WO2021258797A1 - Image information input method, electronic device, and computer readable storage medium - Google Patents

Image information input method, electronic device, and computer readable storage medium

Info

Publication number
WO2021258797A1
WO2021258797A1 PCT/CN2021/083140 CN2021083140W WO2021258797A1 WO 2021258797 A1 WO2021258797 A1 WO 2021258797A1 CN 2021083140 W CN2021083140 W CN 2021083140W WO 2021258797 A1 WO2021258797 A1 WO 2021258797A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
image
classification result
processed
label
Prior art date
Application number
PCT/CN2021/083140
Other languages
French (fr)
Chinese (zh)
Inventor
唐吴全
王斌
张腾
秦佳美
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2021258797A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/55 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques

Definitions

  • This application relates to the field of terminals, and in particular to an image information input method, an electronic device, and a computer-readable storage medium.
  • Information input is an important function of electronic devices. Whether searching for information on the Internet or sending emails and messages, users need to enter the relevant information on an electronic device.
  • The visual input function is gradually being added to information input methods.
  • When a user uses the visual input function, the user inputs an image into an electronic device, and the electronic device recognizes the semantic information contained in the image and uses the recognized semantic information as the input information.
  • The visual input function can automatically "guess" the semantic information that the user wants to express based on the image input by the user, which makes information input more convenient.
  • The existing visual input function can only perform simple semantic recognition on the image input by the user. For example, when performing semantic recognition on an image that contains text, only the text is segmented from the image and used as the semantic information of the image; the part of the image without text cannot be semantically recognized.
  • In the existing methods, since only simple semantic recognition can be performed on the image input by the user, the recognized semantic information does not contain all the semantics in the image. As a result, the existing visual input function cannot accurately "express" the information that the user wants to input, and the accuracy of the input information cannot be guaranteed.
  • This application provides an image information input method, an electronic device, and a computer storage medium, which can improve the accuracy of visual input information.
  • In a first aspect, an embodiment of the present application provides an image information input method. The method includes: acquiring a to-be-processed image; classifying the to-be-processed image to obtain a first classification result; selecting a corresponding classification model according to the first classification result, and inputting the to-be-processed image into the classification model to obtain a second classification result output by the classification model; and inputting the second classification result as the information label of the to-be-processed image.
  • The to-be-processed image is classified twice, and the result obtained after two classifications is more accurate than the result of a single classification. Furthermore, because the classification model used for the second classification is selected according to the first classification result, the second classification is equivalent to reclassifying the first classification result; that is, the second classification is finer-grained than the first. In other words, the second classification result is more accurate than the first classification result, which improves the accuracy of semantic recognition of the to-be-processed image. When the second classification result is then input as the information label of the to-be-processed image, the accuracy of the visual input information improves, and the method has strong ease of use and practicality.
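  • As an illustration only, the two-stage flow described above can be sketched in Python. The names `coarse_classifier`, `FINE_MODELS`, and `classify_image`, the dictionary used as a stand-in for an image, and the fallback to the first classification result when no fine-grained model is registered are all assumptions made for this sketch; the application does not specify them.

```python
from typing import Callable, Dict

# Hypothetical stand-in: an "image" is a dict carrying precomputed hints,
# and a classifier is any callable that maps an image to a label.
Classifier = Callable[[dict], str]


def coarse_classifier(image: dict) -> str:
    """First classification: return a coarse category label."""
    return image["coarse_hint"]


# Assumed registry: one fine-grained model per coarse label.
FINE_MODELS: Dict[str, Classifier] = {
    "plant": lambda image: image["fine_hint"],
    "animal": lambda image: image["fine_hint"],
}


def classify_image(image: dict) -> str:
    """Run the first classification, then the model selected by its result."""
    first_result = coarse_classifier(image)
    fine_model = FINE_MODELS.get(first_result)
    if fine_model is None:
        # Assumption: with no matching model, keep the coarse label.
        return first_result
    # Second classification: finer-grained label used as the information label.
    return fine_model(image)
```

  • In this sketch, the label returned by `classify_image` plays the role of the information label that is input for the image.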
  • The first classification result includes at least one first category label.
  • Selecting a corresponding classification model according to the first classification result, inputting the to-be-processed image into the classification model, and obtaining the second classification result output by the classification model includes: extracting, from the to-be-processed image, the sub-image corresponding to each first category label in the first classification result, and obtaining the classification model corresponding to each first category label in the first classification result;
  • inputting the sub-image corresponding to the i-th first category label into the classification model corresponding to the i-th first category label to obtain the sub-label of the i-th first category label, where i is a positive integer less than or equal to N, and N is the number of first category labels in the first classification result; and using the sub-labels of the first category labels in the first classification result as the second classification result.
  • Each first category label in the first classification result corresponds to a classification model, and the classification model corresponding to each first category label is used to classify the sub-image corresponding to that label. This finer-grained classification of the first classification result is equivalent to dividing a large class into small classes, which improves the accuracy of semantic recognition of the to-be-processed image and has strong ease of use and practicability.
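  • A minimal sketch of the per-label step follows, assuming that the first classification also yields a bounding box for each first category label and that an image is a nested list of pixel values. The `Box` layout, `crop`, and `second_classification` are hypothetical names introduced here for illustration.

```python
from typing import Callable, Dict, List, Tuple

Image = List[List[int]]
Box = Tuple[int, int, int, int]  # hypothetical layout: (top, left, bottom, right)


def crop(image: Image, box: Box) -> Image:
    """Extract the sub-image covered by box (bottom and right are exclusive)."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def second_classification(
    image: Image,
    first_result: Dict[str, Box],
    models: Dict[str, Callable[[Image], str]],
) -> Dict[str, str]:
    """Classify each label's sub-image with the model registered for that label."""
    sub_labels: Dict[str, str] = {}
    for label, box in first_result.items():
        sub_image = crop(image, box)                 # sub-image of the i-th label
        sub_labels[label] = models[label](sub_image)  # sub-label of the i-th label
    return sub_labels
```

  • The returned mapping collects one sub-label per first category label, which together correspond to the second classification result.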
  • After inputting the to-be-processed image into the classification model and obtaining the second classification result output by the classification model, the method further includes: acquiring, according to the first classification result and/or the second classification result, extended information corresponding to the first classification result and/or the second classification result, where the extended information is information related to the first classification result and/or the second classification result.
  • Inputting the second classification result as the information label of the to-be-processed image includes: inputting the second classification result and/or the extended information as the information label of the to-be-processed image.
  • The extended information related to the first classification result and/or the second classification result is also input as the information label of the to-be-processed image, so that more semantic information can be "guessed" from the to-be-processed image. This increases the richness of the semantic recognition results of the to-be-processed image and ensures their completeness, thereby improving the intelligence of the visual input method.
  • Acquiring the extended information corresponding to the first classification result and/or the second classification result includes: inputting the first classification result and/or the second classification result into a preset instruction detection model to obtain an information query instruction output by the instruction detection model; and querying, according to the information query instruction, the extended information corresponding to the first classification result and/or the second classification result.
  • The preset instruction detection model can reflect the user's information query habits. Extended information queried according to the information query instruction output by the instruction detection model is therefore closer to the semantic information that the user wants to express.
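  • The query flow above might be sketched as follows. The application does not describe how the instruction detection model or the information source is implemented, so the lookup tables `habits` and `knowledge_base`, the default instruction `"definition"`, and both function names are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def detect_query_instruction(label: str, habits: Dict[str, str]) -> str:
    """Hypothetical instruction detection model: pick the kind of query the
    user usually issues for this label, defaulting to a definition lookup."""
    return habits.get(label, "definition")


def query_extended_info(
    labels: List[str],
    habits: Dict[str, str],
    knowledge_base: Dict[Tuple[str, str], str],
) -> Dict[str, str]:
    """Query extended information for each classification label."""
    extended: Dict[str, str] = {}
    for label in labels:
        instruction = detect_query_instruction(label, habits)
        answer = knowledge_base.get((label, instruction))
        if answer is not None:  # labels with no matching entry are skipped
            extended[label] = answer
    return extended
```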
  • Obtaining the to-be-processed image includes: obtaining a to-be-processed video and extracting image information from the to-be-processed video, where the image information includes at least one frame; and using at least one frame in the image information as the to-be-processed image.
  • After obtaining the to-be-processed video, the method further includes: extracting audio information from the to-be-processed video; and performing speech recognition on the audio information to obtain an information label of the audio information.
  • Inputting the second classification result as the information label of the to-be-processed image includes: inputting the information label of the audio information and the second classification result as the information label of the to-be-processed image.
  • The second classification result includes at least one second category label. Inputting the information label of the audio information and the second classification result as the information label of the to-be-processed image includes: if identical second category labels exist in the second classification result, deduplicating the second classification result; and inputting the information label of the audio information and the deduplicated second classification result as the information label of the to-be-processed image.
  • Inputting the information label of the audio information and the deduplicated second classification result as the information label of the to-be-processed image includes: if a first target label exists in the deduplicated second classification result, inputting the first target label as the information label of the to-be-processed image, where the first target label is a second category label in the deduplicated second classification result that matches the information label of the audio information; and if no first target label exists in the deduplicated second classification result, searching the extended information of the second category labels in the second classification result until a second target label is found, and inputting the second target label as the information label of the to-be-processed image, where the second target label is the extended information, of a second category label, that matches the information label of the audio information.
  • The recognized image information labels and audio information labels may be partly the same and partly different, and the shared content is often the semantic information that the user wants to express. Therefore, the part of the second classification result that matches the information label of the audio information is input as the information label of the to-be-processed image; that is, the same or similar semantic information contained in the image and the audio is extracted, which is equivalent to screening the recognized semantic information. As a result, the information label of the to-be-processed image is closer to the semantic information that the user wants to express, and the accuracy of the visual input information can be improved.
  • The first target label or the second target label may also be displayed to the user as the first-push label among the information labels of the to-be-processed image.
  • The information labels of the to-be-processed image are displayed to the user as a sequence, and the first-push label is located at the beginning of the sequence.
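  • The deduplication and matching steps can be sketched as below. Treating "matches" as exact string equality is an assumption of this sketch (the text also allows similar semantics), and `deduplicate` and `order_labels` are hypothetical names.

```python
from typing import Dict, List


def deduplicate(labels: List[str]) -> List[str]:
    """Remove repeated labels while keeping first-seen order."""
    return list(dict.fromkeys(labels))


def order_labels(
    second_result: List[str],
    audio_labels: List[str],
    extended_info: Dict[str, List[str]],
) -> List[str]:
    """Return the label sequence with the first-push label at the front."""
    labels = deduplicate(second_result)
    audio = set(audio_labels)
    # First target label: a second category label that matches an audio label.
    for label in labels:
        if label in audio:
            return [label] + [other for other in labels if other != label]
    # Otherwise search the extended information for a second target label.
    for label in labels:
        for item in extended_info.get(label, []):
            if item in audio:
                return [item] + labels
    # No match found: keep the deduplicated labels in their original order.
    return labels
```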
  • In a second aspect, an embodiment of the present application provides an image information input device. The device includes: an acquisition unit, configured to acquire a to-be-processed image; a first classification unit, configured to classify the to-be-processed image to obtain a first classification result; a second classification unit, configured to select a corresponding classification model according to the first classification result, input the to-be-processed image into the classification model, and obtain a second classification result output by the classification model; and an information input unit, configured to input the second classification result as the information label of the to-be-processed image.
  • In a third aspect, an embodiment of the present application provides an electronic device. The electronic device includes a processor, and the processor is configured to run a computer program stored in a memory, so as to implement the method provided in any possible implementation of the first aspect.
  • In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium including computer instructions. When the computer instructions run on a computer or processor, they cause the computer or processor to execute the method provided in any possible implementation of the first aspect.
  • In a fifth aspect, an embodiment of the present application provides a computer program product. When the computer program product runs on a computer or processor, the computer or processor executes the method provided in any possible implementation of the first aspect.
  • The electronic device described in the third aspect, the computer storage medium described in the fourth aspect, and the computer program product described in the fifth aspect are all used to execute the method provided in the first aspect. For the beneficial effects they can achieve, refer to the beneficial effects of the corresponding method; they are not repeated here.
  • FIG. 1 is a schematic structural diagram of an electronic device 100 provided by an embodiment of the present application.
  • FIG. 2 is a block diagram of the software structure of the electronic device 100 provided by an embodiment of the present application.
  • Figures 3(a) to 3(f) are schematic diagrams of application interfaces provided by embodiments of the present application.
  • FIG. 4 is a schematic flowchart of an image information input method provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a to-be-processed image provided by an embodiment of the present application.
  • FIG. 6 is a schematic diagram of segmentation of a to-be-processed image provided by an embodiment of the present application.
  • Figures 7(a) and 7(b) are schematic diagrams of interaction between a user and an electronic device provided by an embodiment of the present application.
  • FIG. 8 is a schematic flowchart of an image information input method provided by another embodiment of the present application.
  • FIG. 9 is a structural block diagram of an image information input device provided by an embodiment of the present application.
  • The electronic device may be a mobile phone, a tablet computer, a wearable device, a vehicle-mounted device, an augmented reality (AR)/virtual reality (VR) device, a notebook computer, an ultra-mobile personal computer (UMPC), a personal digital assistant (PDA), or the like.
  • FIG. 1 is a schematic structural diagram of an electronic device 100 according to an embodiment of the present application.
  • The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone jack 170D, a sensor module 180, buttons 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and the like.
  • The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
  • The structure illustrated in the embodiment of the present application does not constitute a specific limitation on the electronic device 100.
  • The electronic device 100 may include more or fewer components than shown, combine certain components, split certain components, or arrange the components differently.
  • The illustrated components can be implemented in hardware, software, or a combination of software and hardware.
  • The processor 110 may include one or more processing units.
  • For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU).
  • The processor is configured to execute the image information input method provided in the embodiment of the present application. For example, the processor executes the following steps S401-S404 or steps S901-S906.
  • The controller may be the nerve center and command center of the electronic device 100.
  • The controller can generate operation control signals according to the instruction operation code and timing signals, to control instruction fetching and execution.
  • A memory may also be provided in the processor 110 to store instructions and data.
  • The memory in the processor 110 is a cache memory.
  • The memory can store instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs the instructions or data again, it can call them directly from the memory. This avoids repeated accesses, reduces the waiting time of the processor 110, and improves system efficiency.
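  • The benefit of such a cache can be illustrated with a small software model. This is only a sketch: the least-recently-used eviction policy, the `SmallCache` class, and the dictionary used as backing store are assumptions, and real processor caches are implemented in hardware.

```python
from collections import OrderedDict
from typing import Dict


class SmallCache:
    """Toy cache in front of a slower backing store (LRU eviction assumed)."""

    def __init__(self, capacity: int, backing_store: Dict[int, str]) -> None:
        self.capacity = capacity
        self.backing_store = backing_store
        self.lines: "OrderedDict[int, str]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, address: int) -> str:
        if address in self.lines:
            # Recently used data is served directly from the cache.
            self.hits += 1
            self.lines.move_to_end(address)
            return self.lines[address]
        # Miss: fetch from the slower store and keep a copy for later reads.
        self.misses += 1
        value = self.backing_store[address]
        self.lines[address] = value
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used entry
        return value
```

  • Every hit counted by `hits` is an access to the slower store that was avoided, which is the waiting-time reduction the text refers to.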
  • The processor 110 may include one or more interfaces.
  • The interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface.
  • The I2C interface is a bidirectional synchronous serial bus, which includes a serial data line (SDA) and a serial clock line (SCL).
  • The processor 110 may include multiple sets of I2C buses.
  • The processor 110 may be coupled to the touch sensor 180K, the charger, the flash, the camera 193, and so on through different I2C bus interfaces.
  • For example, the processor 110 may be coupled to the touch sensor 180K through an I2C interface, so that the processor 110 and the touch sensor 180K communicate through the I2C bus interface to implement the touch function of the electronic device 100.
  • The I2S interface can be used for audio communication.
  • The processor 110 may include multiple sets of I2S buses.
  • The processor 110 may be coupled to the audio module 170 through an I2S bus to implement communication between the processor 110 and the audio module 170.
  • The audio module 170 may transmit audio signals to the wireless communication module 160 through an I2S interface, so as to answer calls through a Bluetooth headset.
  • The PCM interface can also be used for audio communication, and for sampling, quantizing, and encoding analog signals.
  • The audio module 170 and the wireless communication module 160 may be coupled through a PCM bus interface.
  • The audio module 170 may also transmit audio signals to the wireless communication module 160 through the PCM interface, so as to answer calls through a Bluetooth headset. Both the I2S interface and the PCM interface can be used for audio communication.
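  • The sample, quantize, and encode steps named above for PCM can be illustrated with a short sketch. It assumes a uniform quantizer over the range [-1, 1] and unsigned integer codes; the function name and parameters are hypothetical and do not describe the interface hardware.

```python
from typing import Callable, List


def pcm_encode(
    signal: Callable[[float], float],
    num_samples: int,
    sample_rate_hz: float,
    bits: int,
) -> List[int]:
    """Sample an analog signal, quantize each sample, and emit integer codes."""
    levels = 2 ** bits
    codes: List[int] = []
    for i in range(num_samples):
        t = i / sample_rate_hz                  # sampling instant in seconds
        x = max(-1.0, min(1.0, signal(t)))      # sample, clamped to [-1, 1]
        # Uniform quantization: map [-1, 1] onto the codes 0 .. levels - 1.
        code = min(levels - 1, int((x + 1.0) / 2.0 * levels))
        codes.append(code)
    return codes
```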
  • The UART interface is a universal serial data bus used for asynchronous communication.
  • The bus can be a two-way communication bus. It converts the data to be transmitted between serial communication and parallel communication.
  • The UART interface is generally used to connect the processor 110 and the wireless communication module 160.
  • For example, the processor 110 communicates with the Bluetooth module in the wireless communication module 160 through the UART interface to implement the Bluetooth function.
  • The audio module 170 may transmit audio signals to the wireless communication module 160 through a UART interface, so as to play music through a Bluetooth headset.
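  • The conversion between parallel and serial data can be illustrated with the common 8N1 frame format (one start bit, eight data bits sent least significant bit first, one stop bit). The choice of 8N1 and the function names are assumptions for this sketch; the text does not state which frame format the device uses.

```python
from typing import List


def uart_serialize(byte: int) -> List[int]:
    """Turn one parallel byte into a serial 8N1 frame of ten bits."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte out of range")
    data_bits = [(byte >> i) & 1 for i in range(8)]  # least significant bit first
    return [0] + data_bits + [1]                     # start bit, data, stop bit


def uart_deserialize(frame: List[int]) -> int:
    """Rebuild the parallel byte from a serial 8N1 frame."""
    if len(frame) != 10 or frame[0] != 0 or frame[9] != 1:
        raise ValueError("framing error")
    return sum(bit << i for i, bit in enumerate(frame[1:9]))
```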
  • The MIPI interface can be used to connect the processor 110 with peripheral devices such as the display screen 194 and the camera 193.
  • The MIPI interface includes a camera serial interface (CSI), a display serial interface (DSI), and so on.
  • The processor 110 and the camera 193 communicate through a CSI interface to implement the shooting function of the electronic device 100.
  • The processor 110 and the display screen 194 communicate through a DSI interface to implement the display function of the electronic device 100.
  • The GPIO interface can be configured through software.
  • The GPIO interface can be configured as a control signal or as a data signal.
  • The GPIO interface can be used to connect the processor 110 with the camera 193, the display screen 194, the wireless communication module 160, the audio module 170, the sensor module 180, and so on.
  • The GPIO interface can also be configured as an I2C interface, an I2S interface, a UART interface, a MIPI interface, and so on.
  • The USB interface 130 is an interface that complies with the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type C interface, and so on.
  • The USB interface 130 can be used to connect a charger to charge the electronic device 100, and can also be used to transfer data between the electronic device 100 and peripheral devices. It can also be used to connect earphones and play audio through them.
  • The interface can also be used to connect other electronic devices, such as AR devices.
  • The interface connection relationship between the modules illustrated in the embodiment of the present application is merely a schematic description, and does not constitute a structural limitation on the electronic device 100.
  • The electronic device 100 may also adopt interface connection modes different from those in the foregoing embodiments, or a combination of multiple interface connection modes.
  • The charging management module 140 is used to receive charging input from the charger.
  • The charger can be a wireless charger or a wired charger.
  • The charging management module 140 may receive the charging input of a wired charger through the USB interface 130.
  • The charging management module 140 may receive wireless charging input through the wireless charging coil of the electronic device 100. While charging the battery 142, the charging management module 140 can also supply power to the electronic device through the power management module 141.
  • The power management module 141 is used to connect the battery 142, the charging management module 140, and the processor 110.
  • The power management module 141 receives input from the battery 142 and/or the charging management module 140, and supplies power to the processor 110, the internal memory 121, the external memory, the display screen 194, the camera 193, and the wireless communication module 160.
  • The power management module 141 can also be used to monitor parameters such as battery capacity, battery cycle count, and battery health status (leakage, impedance).
  • The power management module 141 may also be provided in the processor 110.
  • The power management module 141 and the charging management module 140 may also be provided in the same device.
  • The wireless communication function of the electronic device 100 can be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, and the baseband processor.
  • The antenna 1 and the antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in the electronic device 100 can be used to cover one or more communication frequency bands. Different antennas can also be multiplexed to improve antenna utilization.
  • For example, the antenna 1 can be multiplexed as a diversity antenna of a wireless local area network.
  • The antenna can be used in combination with a tuning switch.
  • The mobile communication module 150 may provide wireless communication solutions, including 2G/3G/4G/5G, applied to the electronic device 100.
  • The mobile communication module 150 may include at least one filter, a switch, a power amplifier, a low noise amplifier (LNA), and the like.
  • The mobile communication module 150 can receive electromagnetic waves through the antenna 1, perform processing such as filtering and amplification on the received electromagnetic waves, and transmit them to the modem processor for demodulation.
  • The mobile communication module 150 can also amplify the signal modulated by the modem processor and convert it into electromagnetic waves that are radiated through the antenna 1.
  • At least part of the functional modules of the mobile communication module 150 may be provided in the processor 110.
  • At least part of the functional modules of the mobile communication module 150 and at least part of the modules of the processor 110 may be provided in the same device.
  • The modem processor may include a modulator and a demodulator.
  • The modulator is used to modulate the low-frequency baseband signal to be sent into a medium- or high-frequency signal.
  • The demodulator is used to demodulate the received electromagnetic wave signal into a low-frequency baseband signal.
  • The demodulator then transmits the demodulated low-frequency baseband signal to the baseband processor for processing.
  • The low-frequency baseband signal is processed by the baseband processor and then passed to the application processor.
  • The application processor outputs a sound signal through an audio device (not limited to the speaker 170A, the receiver 170B, etc.), or displays an image or video through the display screen 194.
  • The modem processor may be an independent device.
  • The modem processor may be independent of the processor 110 and be provided in the same device as the mobile communication module 150 or other functional modules.
  • The wireless communication module 160 can provide wireless communication solutions applied to the electronic device 100, including wireless local area network (WLAN) (such as a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR) technology.
  • The wireless communication module 160 may be one or more devices integrating at least one communication processing module.
  • The wireless communication module 160 receives electromagnetic waves via the antenna 2, performs frequency modulation and filtering on the electromagnetic wave signals, and sends the processed signals to the processor 110.
  • The wireless communication module 160 may also receive a signal to be sent from the processor 110, perform frequency modulation on it, amplify it, and convert it into electromagnetic waves that are radiated through the antenna 2.
  • The antenna 1 of the electronic device 100 is coupled with the mobile communication module 150, and the antenna 2 is coupled with the wireless communication module 160, so that the electronic device 100 can communicate with networks and other devices through wireless communication technology.
  • The wireless communication technology may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology.
  • The GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the Beidou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or satellite-based augmentation systems (SBAS).
  • the electronic device 100 implements a display function through a GPU, a display screen 194, an application processor, and the like.
  • the GPU is a microprocessor for image processing, connected to the display 194 and the application processor.
  • the GPU is used to perform mathematical and geometric calculations and is used for graphics rendering.
  • the processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
  • the display screen 194 is used to display images, videos, and the like.
  • the display screen 194 includes a display panel.
  • the display panel can adopt a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), Mini-LED, Micro-LED, Micro-OLED, a quantum dot light-emitting diode (QLED), etc.
  • the electronic device 100 may include one or N display screens 194, and N is a positive integer greater than one.
  • the electronic device 100 can realize a shooting function through an ISP, a camera 193, a video codec, a GPU, a display screen 194, and an application processor.
  • the ISP is used to process the data fed back from the camera 193. For example, when taking a picture, the shutter is opened, light is transmitted through the lens to the photosensitive element of the camera, and the light signal is converted into an electrical signal. The photosensitive element then transfers the electrical signal to the ISP, which processes it and converts it into an image visible to the naked eye.
  • ISP can also optimize the image noise, brightness, and skin color. ISP can also optimize the exposure, color temperature and other parameters of the shooting scene.
  • the ISP may be provided in the camera 193.
  • the camera 193 is used to capture still images or videos.
  • an optical image of the object is generated through the lens and projected onto the photosensitive element.
  • the photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the optical signal into an electrical signal, and then transfers the electrical signal to the ISP to convert it into a digital image signal.
  • ISP outputs digital image signals to DSP for processing.
  • DSP converts digital image signals into standard RGB, YUV and other formats of image signals.
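  • As a rough illustration of the format conversion mentioned above, the sketch below converts a single RGB pixel to YUV with the commonly used BT.601 luma weights. The embodiment does not state which conversion the DSP applies, so the coefficients and the function name `rgb_to_yuv` are assumptions chosen for illustration only.

```python
def rgb_to_yuv(r: float, g: float, b: float) -> tuple[float, float, float]:
    """Convert one RGB pixel to YUV using BT.601 weights (an assumed choice)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b  # luma: weighted sum of R, G, B
    u = 0.492 * (b - y)                    # blue-difference chroma
    v = 0.877 * (r - y)                    # red-difference chroma
    return y, u, v
```

  • For a neutral gray or white pixel (equal R, G, and B), both chroma components come out as approximately zero, which is a quick sanity check for any such conversion.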
  • the electronic device 100 may include one or N cameras 193, and N is a positive integer greater than one.
  • the camera is used to obtain the image to be processed in the image information input method provided in the embodiment of the present application, or the image in the video to be processed.
  • Digital signal processors are used to process digital signals. In addition to digital image signals, they can also process other digital signals. For example, when the electronic device 100 selects a frequency point, the digital signal processor is used to perform Fourier transform on the energy of the frequency point.
  • Video codecs are used to compress or decompress digital video.
  • the electronic device 100 may support one or more video codecs. In this way, the electronic device 100 can play or record videos in multiple encoding formats, such as: moving picture experts group (MPEG) 1, MPEG2, MPEG3, MPEG4, and so on.
  • NPU is a neural-network (NN) computing processor.
  • the NPU can realize applications such as intelligent cognition of the electronic device 100, such as image recognition, face recognition, voice recognition, text understanding, and so on.
  • the external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100.
  • the external memory card communicates with the processor 110 through the external memory interface 120 to realize the data storage function. For example, save music, video and other files in an external memory card.
  • the internal memory 121 may be used to store computer executable program code, where the executable program code includes instructions.
  • the processor 110 executes various functional applications and data processing of the electronic device 100 by running instructions stored in the internal memory 121.
  • the internal memory 121 may include a storage program area and a storage data area.
  • the storage program area can store an operating system and an application program required by at least one function (such as a sound playback function, an image playback function, etc.).
  • the storage data area can store data (such as audio data, phone book, etc.) created during the use of the electronic device 100.
  • the internal memory 121 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, a universal flash storage (UFS), and the like.
  • the electronic device 100 can implement audio functions through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, and the application processor. For example, music playback, recording, etc.
  • the audio module 170 is used to convert digital audio information into an analog audio signal for output, and is also used to convert an analog audio input into a digital audio signal.
  • the audio module 170 can also be used to encode and decode audio signals.
  • the audio module 170 may be provided in the processor 110, or part of the functional modules of the audio module 170 may be provided in the processor 110.
  • the speaker 170A, also called a “loudspeaker”, is used to convert audio electrical signals into sound signals.
  • the user can listen to music or to a hands-free call through the speaker 170A of the electronic device 100.
  • the receiver 170B, also called a “handset”, is used to convert audio electrical signals into sound signals.
  • when the electronic device 100 answers a call or plays a voice message, the user can hear the voice by bringing the receiver 170B close to the ear.
  • the microphone 170C, also called a “mic”, is used to convert sound signals into electrical signals.
  • the user can speak with the mouth close to the microphone 170C to input a sound signal into the microphone 170C.
  • the electronic device 100 may be provided with at least one microphone 170C.
  • the electronic device 100 may be provided with two microphones 170C, which can implement noise reduction functions in addition to collecting sound signals.
  • the electronic device 100 may also be provided with three, four or more microphones 170C to collect sound signals, reduce noise, identify sound sources, and realize directional recording functions.
  • the microphone may be used to collect the audio of the video to be processed in the image information input method provided in the embodiment of the present application.
  • the earphone interface 170D is used to connect wired earphones.
  • the earphone interface 170D may be the USB interface 130, a 3.5 mm open mobile terminal platform (OMTP) standard interface, or a cellular telecommunications industry association of the USA (CTIA) standard interface.
  • the pressure sensor 180A is used to sense the pressure signal and can convert the pressure signal into an electrical signal.
  • the pressure sensor 180A may be provided on the display screen 194.
  • the capacitive pressure sensor may include at least two parallel plates with conductive materials.
  • the electronic device 100 determines the intensity of the pressure according to the change in capacitance.
  • the electronic device 100 detects the intensity of the touch operation according to the pressure sensor 180A.
  • the electronic device 100 may also calculate the touched position according to the detection signal of the pressure sensor 180A.
  • touch operations that act on the same touch position but have different touch operation intensities can correspond to different operation instructions. For example: when a touch operation whose intensity is less than the first pressure threshold is applied to the short message application icon, an instruction to view the short message is executed. When a touch operation with a touch operation intensity greater than or equal to the first pressure threshold acts on the short message application icon, an instruction to create a new short message is executed.
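  • The short message example above can be sketched as a simple threshold comparison. The numeric threshold, the normalized intensity scale, and the instruction names below are hypothetical; the embodiment only specifies the comparison against the first pressure threshold.

```python
# Hypothetical first pressure threshold on a normalized 0.0-1.0 intensity scale.
FIRST_PRESSURE_THRESHOLD = 0.5


def instruction_for_touch(intensity: float,
                          threshold: float = FIRST_PRESSURE_THRESHOLD) -> str:
    """Pick the instruction for a touch on the short message application icon."""
    if intensity < threshold:
        # Lighter press: view the short message.
        return "view_short_message"
    # Intensity greater than or equal to the threshold: create a new short message.
    return "create_short_message"
```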
  • the gyro sensor 180B may be used to determine the movement posture of the electronic device 100.
  • the angular velocity of the electronic device 100 around three axes (i.e., the x, y, and z axes) can be determined by the gyro sensor 180B.
  • the gyro sensor 180B can be used for image stabilization.
  • the gyro sensor 180B detects the shake angle of the electronic device 100, calculates the distance that the lens module needs to compensate according to the angle, and allows the lens to counteract the shake of the electronic device 100 through reverse movement to achieve anti-shake.
  • the gyro sensor 180B can also be used for navigation and somatosensory game scenes.
  • the air pressure sensor 180C is used to measure air pressure.
  • the electronic device 100 calculates the altitude based on the air pressure value measured by the air pressure sensor 180C to assist positioning and navigation.
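  • The embodiment does not say how altitude is derived from the measured air pressure. One common approximation is the international barometric formula, sketched below; the function name, the units (hPa and meters), and the default sea-level pressure of 1013.25 hPa are assumptions made for illustration.

```python
def altitude_from_pressure(pressure_hpa: float,
                           sea_level_hpa: float = 1013.25) -> float:
    """Estimate altitude in meters from air pressure in hPa.

    Uses the international barometric formula, which assumes a standard
    atmosphere; real devices may calibrate the sea-level reference.
    """
    return 44330.0 * (1.0 - (pressure_hpa / sea_level_hpa) ** (1.0 / 5.255))
```

  • Under this approximation, a reading equal to the sea-level reference gives an altitude of zero, and lower pressure readings give higher altitudes.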
  • the magnetic sensor 180D includes a Hall sensor.
  • the electronic device 100 can use the magnetic sensor 180D to detect the opening and closing of a flip leather case.
  • the electronic device 100 can also detect the opening and closing of a flip cover according to the magnetic sensor 180D. Then, according to the detected opening and closing state of the leather case or of the flip cover, features such as automatic unlocking when the flip cover is opened are set.
  • the acceleration sensor 180E can detect the magnitude of the acceleration of the electronic device 100 in various directions (generally three axes). When the electronic device 100 is stationary, the magnitude and direction of gravity can be detected. It can also be used to identify the posture of electronic devices, and be used in applications such as horizontal and vertical screen switching, pedometers and so on.
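  • As a minimal sketch of how gravity readings could drive horizontal and vertical screen switching, the function below compares the acceleration components along the two screen axes of a stationary device. The axis convention (x along the short edge, y along the long edge) and the function name are assumptions; the embodiment does not describe the actual decision rule.

```python
def screen_orientation(ax: float, ay: float) -> str:
    """Guess the screen orientation of a stationary device.

    ax, ay: acceleration (m/s^2) along the short (x) and long (y) screen
    edges. Gravity mostly along y suggests the device is held upright.
    """
    return "landscape" if abs(ax) > abs(ay) else "portrait"
```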
  • the electronic device 100 can measure the distance by infrared or laser. In some embodiments, when shooting a scene, the electronic device 100 may use the distance sensor 180F to measure the distance to achieve fast focusing.
  • the proximity light sensor 180G may include, for example, a light emitting diode (LED) and a light detector such as a photodiode.
  • the light emitting diode may be an infrared light emitting diode.
  • the electronic device 100 emits infrared light to the outside through the light emitting diode.
  • the electronic device 100 uses a photodiode to detect infrared reflected light from nearby objects. When sufficient reflected light is detected, it can be determined that there is an object near the electronic device 100. When insufficient reflected light is detected, the electronic device 100 can determine that there is no object near the electronic device 100.
  • the electronic device 100 can use the proximity light sensor 180G to detect that the user holds the electronic device 100 close to the ear to talk, so as to automatically turn off the screen to save power.
  • the proximity light sensor 180G can also be used in leather case mode and pocket mode to automatically unlock and lock the screen.
  • the ambient light sensor 180L is used to sense the brightness of the ambient light.
  • the electronic device 100 can adaptively adjust the brightness of the display screen 194 according to the perceived brightness of the ambient light.
  • the ambient light sensor 180L can also be used to automatically adjust the white balance when taking pictures.
  • the ambient light sensor 180L can also cooperate with the proximity light sensor 180G to detect whether the electronic device 100 is in the pocket to prevent accidental touch.
  • the fingerprint sensor 180H is used to collect fingerprints.
  • the electronic device 100 can use the collected fingerprint characteristics to implement fingerprint unlocking, application lock access, fingerprint-triggered photographing, fingerprint-based call answering, and so on.
  • the temperature sensor 180J is used to detect temperature.
  • the electronic device 100 uses the temperature detected by the temperature sensor 180J to execute a temperature processing strategy. For example, when the temperature reported by the temperature sensor 180J exceeds a threshold value, the electronic device 100 reduces the performance of the processor located near the temperature sensor 180J, so as to reduce power consumption and implement thermal protection.
  • when the temperature is lower than another threshold, the electronic device 100 heats the battery 142 to avoid abnormal shutdown of the electronic device 100 due to low temperature.
  • the electronic device 100 boosts the output voltage of the battery 142 to avoid abnormal shutdown caused by low temperature.
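  • The temperature processing strategy described above can be sketched as a small decision function. The threshold values, the action names, and the choice of which low-temperature measure to apply are all hypothetical; the embodiment only states that the behavior depends on comparing the reported temperature with thresholds.

```python
def temperature_strategy(temp_c: float,
                         high_threshold_c: float = 45.0,
                         low_threshold_c: float = 0.0,
                         low_temp_action: str = "heat_battery") -> str:
    """Choose a thermal action from the temperature reported by the sensor.

    low_temp_action selects the low-temperature measure, for example
    "heat_battery" or "boost_battery_output_voltage" (assumed names).
    """
    if temp_c > high_threshold_c:
        # Too hot: reduce processor performance for thermal protection.
        return "reduce_processor_performance"
    if temp_c < low_threshold_c:
        # Too cold: apply the configured measure to avoid abnormal shutdown.
        return low_temp_action
    return "no_action"
```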
  • the touch sensor 180K is also called a “touch panel”.
  • the touch sensor 180K may be provided on the display screen 194, and the touch sensor 180K and the display screen 194 together form a touchscreen, which is also called a “touch screen”.
  • the touch sensor 180K is used to detect touch operations acting on or near it.
  • the touch sensor can pass the detected touch operation to the application processor to determine the type of touch event.
  • the visual output related to the touch operation can be provided through the display screen 194.
  • the touch sensor 180K may also be disposed on the surface of the electronic device 100, which is different from the position of the display screen 194.
  • the bone conduction sensor 180M can acquire vibration signals.
  • the bone conduction sensor 180M can acquire the vibration signal of the bone mass that vibrates when a person speaks.
  • the bone conduction sensor 180M can also contact the human pulse and receive the blood pressure pulse signal.
  • the bone conduction sensor 180M may also be provided in an earphone to form a bone conduction earphone.
  • the audio module 170 can parse a voice signal from the vibration signal of the vocal vibrating bone mass obtained by the bone conduction sensor 180M, to realize the voice function.
  • the application processor can parse heart rate information from the blood pressure pulse signal obtained by the bone conduction sensor 180M, to realize the heart rate detection function.
  • the button 190 includes a power-on button, a volume button, and so on.
  • the button 190 may be a mechanical button. It can also be a touch button.
  • the electronic device 100 may receive key input, and generate key signal input related to user settings and function control of the electronic device 100.
  • the motor 191 can generate vibration prompts.
  • the motor 191 can be used for incoming call vibration notification, and can also be used for touch vibration feedback.
  • touch operations for different applications can correspond to different vibration feedback effects.
  • for touch operations acting on different areas of the display screen 194, the motor 191 can also produce different vibration feedback effects.
  • different application scenarios (for example: time reminders, receiving information, alarm clocks, games, etc.) can also correspond to different vibration feedback effects.
  • the touch vibration feedback effect can also support customization.
  • the indicator 192 may be an indicator light, which may be used to indicate the charging status, power change, or to indicate messages, missed calls, notifications, and so on.
  • the SIM card interface 195 is used to connect to the SIM card.
  • the SIM card can be inserted into the SIM card interface 195 or pulled out from the SIM card interface 195 to achieve contact and separation with the electronic device 100.
  • the electronic device 100 may support 1 or N SIM card interfaces, and N is a positive integer greater than 1.
  • the SIM card interface 195 can support Nano SIM cards, Micro SIM cards, SIM cards, etc.
  • multiple cards can be inserted into the same SIM card interface 195 at the same time. The types of the multiple cards can be the same or different.
  • the SIM card interface 195 can also be compatible with different types of SIM cards.
  • the SIM card interface 195 can also be compatible with external memory cards.
  • the electronic device 100 interacts with the network through the SIM card to implement functions such as call and data communication.
  • the electronic device 100 adopts an eSIM, that is, an embedded SIM card.
  • the eSIM card can be embedded in the electronic device 100 and cannot be separated from the electronic device 100.
  • the software system of the electronic device 100 may adopt a layered architecture, an event-driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture.
  • the embodiment of the present invention takes an Android system with a layered architecture as an example to illustrate the software structure of the electronic device 100.
  • FIG. 2 is a block diagram of the software structure of the electronic device 100 provided by an embodiment of the present application.
  • the layered architecture divides the software into several layers, and each layer has a clear role and division of labor. The layers communicate with each other through software interfaces.
  • the Android system is divided into four layers, which from top to bottom are the application layer, the application framework layer, the system library and Android runtime, and the kernel layer.
  • the application layer can include a series of application packages.
  • the application package can include applications such as camera, gallery, calendar, call, map, navigation, input method, Bluetooth, music, video, short message, etc.
  • the application framework layer provides an application programming interface (API) and a programming framework for applications in the application layer.
  • the application framework layer includes some predefined functions.
  • the application framework layer can include a window manager, a content provider, a view system, a phone manager, a resource manager, and a notification manager.
  • the window manager is used to manage window programs.
  • the window manager can obtain the size of the display screen, determine whether there is a status bar, lock the screen, take a screenshot, etc.
  • the content provider is used to store and retrieve data and make these data accessible to applications.
  • the data may include video, image, audio, phone calls made and received, browsing history and bookmarks, phone book, etc.
  • the view system includes visual controls, such as controls that display text, controls that display pictures, and so on.
  • the view system can be used to build applications.
  • the display interface can be composed of one or more views.
  • a display interface that includes a short message notification icon may include a view that displays text and a view that displays pictures.
  • the phone manager is used to provide the communication function of the electronic device 100. For example, the management of the call status (including connecting, hanging up, etc.).
  • the resource manager provides various resources for the application, such as localized strings, icons, pictures, layout files, video files, and so on.
  • the notification manager enables the application to display notification information in the status bar, which can be used to convey notification-type messages, and it can disappear automatically after a short stay without user interaction.
  • the notification manager is used to notify download completion, message reminders, and so on.
  • a notification may also appear in the status bar at the top of the system in the form of a chart or scroll bar text, such as a notification of an application running in the background, or appear on the screen in the form of a dialog window.
  • for example, text information is prompted in the status bar, a prompt sound is played, the electronic device vibrates, or the indicator light flashes.
  • Android Runtime includes core libraries and virtual machines. Android runtime is responsible for the scheduling and management of the Android system.
  • the core library consists of two parts: one part is the functions that the Java language needs to call, and the other part is the core library of Android.
  • the application layer and the application framework layer run in a virtual machine.
  • the virtual machine executes the Java files of the application layer and the application framework layer as binary files.
  • the virtual machine is used to perform functions such as object life cycle management, stack management, thread management, security and exception management, and garbage collection.
  • the system library can include multiple functional modules, for example: a surface manager, media libraries, a three-dimensional graphics processing library (for example, OpenGL ES), a 2D graphics engine (for example, SGL), etc.
  • the surface manager is used to manage the display subsystem and provides a combination of 2D and 3D layers for multiple applications.
  • the media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files.
  • the media library can support a variety of audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
  • the 3D graphics processing library is used to realize 3D graphics drawing, image rendering, synthesis, and layer processing.
  • the 2D graphics engine is a drawing engine for 2D drawing.
  • the kernel layer is the layer between hardware and software.
  • the kernel layer contains at least display driver, camera driver, audio driver, and sensor driver.
  • the following embodiment of the image information input method can be implemented on a mobile phone with the above hardware structure/software structure.
  • Visual input can recognize the semantic information contained in the visual information from the visual information input by the user.
  • the identified semantic information can be used for information input.
  • Visual information can include static images (such as pictures) or dynamic images (such as videos, etc.).
  • the visual input can be a separate application or a function in an application.
  • the visual input can be a system program that comes with the electronic device or a third-party application installed on the electronic device.
  • when the user needs to use the visual input function for information input, the user first needs to select the visual input application as the tool for information input.
  • the interface of the application may include preset buttons.
  • the preset button is used to activate the visual input function.
  • FIG. 3(a) shows an information input interface 10 of the electronic device 100.
  • the information input interface 10 includes an information input box 101, a virtual keyboard control 102, and an image input control 103, where:
  • the information input box 101 is used to display input information.
  • the information input box can be a search box, short message sending box, query box and other areas where information needs to be input.
  • in this application scenario, the information input box being a search box is taken as an example.
  • the virtual keyboard control 102 is used for the user to input information into the information input box.
  • the image input control 103 is used to activate the visual input function. As shown in Figure 3(a), the image input control can be set on the virtual keyboard control. In another application scenario, the image input control can also be set separately from the virtual keyboard control. For example, the image input control is set on the right, left, or above the virtual keyboard control.
  • the user can enter the information input interface 10 through user operations.
  • the process of entering the information input interface through user operations will be described below in conjunction with the workflow of the software and hardware of the electronic device 100.
  • the kernel layer processes the touch operation into the original input event (including touch coordinates, time stamp of the touch operation, etc.).
  • the original input events are stored in the kernel layer.
  • the application framework layer obtains the original input event from the kernel layer, recognizes that the application corresponding to the input event is an input method application, and then calls the interface of the application framework layer to start the input method application.
  • the input method application starts the display driver by calling the kernel layer to display the virtual keyboard control and the image input control to the user, and starts the sensor driver by calling the kernel layer so that the sensor corresponding to the virtual keyboard control and the image input control can detect the user's touch operations. At this point, the information input interface 10 is displayed on the electronic device 100.
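  • The workflow above, in which a touch operation becomes an original input event and is then matched to a control, can be sketched as follows. The class names, field names, and the rectangle-based hit test are hypothetical and only illustrate the idea; they are not the Android framework's actual interfaces.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class RawInputEvent:
    """Original input event as produced by the kernel layer (assumed fields)."""
    x: int
    y: int
    timestamp_ms: int


@dataclass(frozen=True)
class Control:
    """On-screen control with a rectangular touch area (assumed model)."""
    name: str
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, event: RawInputEvent) -> bool:
        return self.left <= event.x < self.right and self.top <= event.y < self.bottom


def find_target(event: RawInputEvent, controls: Sequence[Control]) -> Optional[str]:
    """Return the name of the first control whose area contains the touch."""
    for control in controls:
        if control.contains(event):
            return control.name
    return None
```

  • In this sketch, a touch whose coordinates fall inside the image input control's rectangle resolves to that control, which is the point at which the visual input function would be started.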
  • the user can enter the mode selection interface 20 through user operations on the information input interface 10.
  • the user operation may be a touch operation (such as a click operation, etc.) of the image input control 103 detected on the information input interface 10.
  • in response to a user's click operation on the image input control 103, the electronic device 100 displays a mode selection interface 20.
  • the mode selection interface 20 may be presented by the virtual keyboard control 102, in which case the virtual keyboard control 102 includes a photo album application control 201 and a camera application control 202, where:
  • the photo album application control 201 is used to start the photo album application.
  • the name of the photo album application control can be "album”, “gallery” or “photo”, etc.
  • the name of the photo album application control is "Gallery”.
  • the camera application control 202 is used to start a camera application.
  • the name of the camera application control can be “camera” or “photograph”, etc. As shown in Figure 3(b), in this application scenario, the name of the camera application control is "camera”.
  • the electronic device 100 displays the photographing interface 40.
  • the gallery interface 30 as shown in FIG. 3(c) may include a return button 301, a confirmation button 302, and multiple pictures 303.
  • the user can select the picture that needs to be input from multiple pictures through user operations.
  • the electronic device 100 displays the mode selection interface 20.
  • the one or more pictures corresponding to the click operation are displayed as selected (for example, the color of the selected picture becomes darker, or a mark is added to the selected picture, etc.).
  • the electronic device 100 can display the information label interface 50.
  • the shooting interface 40 shown in FIG. 3(d) may include a shooting frame 401, a shooting key 402, and a return key 403.
  • the user can obtain the photographed picture through user operation.
  • the camera on the electronic device 100 obtains the image contained in the shooting frame 401, and after the shot picture is obtained, the shooting interface 40 may display a confirmation button.
  • the electronic device 100 may display the information label interface 50.
  • the electronic device 100 displays the mode selection interface 20.
  • the information label interface 50 shown in FIG. 3(e) may include an information input box 101 and a virtual keyboard control 102, wherein the virtual keyboard control 102 includes a plurality of information labels 501.
  • an information label is semantic information recognized from a picture selected by the user or a picture photographed by the user.
  • the user can select at least one information label from the plurality of information labels as input information.
  • the electronic device 100 may display the input result interface 60.
  • the input result interface 60 as shown in FIG. 3(f) may include an information input box 101 and input information 601.
  • the input information 601 is an information label selected by the user.
  • multiple selected information labels can be combined into the input information according to the user's selection order.
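  • Combining the selected labels in selection order can be sketched in a few lines. The function name and the space separator are assumptions; the embodiment only states that the labels are combined according to the order in which the user selected them.

```python
def combine_labels(selected_in_order: list[str], separator: str = " ") -> str:
    """Join the selected information labels in the order the user picked them."""
    return separator.join(selected_in_order)
```

  • For example, selecting "car" first and "animal" second would produce the input information "car animal" under these assumptions.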
  • the electronic device 100 may use the image information input method provided in the embodiments of the present application to obtain the information label of the image to be processed.
  • the gallery may include pictures and videos, and the user can select the pictures or videos in the gallery. Users can also use the camera to capture pictures or video. Among them, the pictures and captured pictures in the gallery are static images, and the videos and captured videos in the gallery are dynamic images.
  • the image information input method provided in the embodiments of the present application will be introduced below for static images and dynamic images respectively.
  • FIG. 4 is a schematic flowchart of an image information input method provided by an embodiment of the present application.
  • the image information input method may include the following steps:
  • S401 Acquire an image to be processed.
  • the electronic device 100 may directly obtain a picture input by the user (ie, a picture selected by the user from a gallery or a photographed picture obtained through a camera application), and record the picture input by the user as an image to be processed.
  • the electronic device 100 may also obtain image data from a picture input by the user, and use the image data as an image to be processed. For example, the electronic device obtains the pixel value information of the picture, converts the pixel value information into a bitmap, and records the bitmap as an image to be processed.
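  • One way to picture the conversion of pixel value information into a bitmap is to arrange a flat sequence of pixel values into rows of a known width. The sketch below does exactly that; the flat-list input, the row-major layout, and the function name are assumptions, since the embodiment does not specify the data format.

```python
def pixels_to_bitmap(pixels: list[int], width: int) -> list[list[int]]:
    """Arrange a flat, row-major list of pixel values into rows of `width`."""
    if width <= 0 or len(pixels) % width != 0:
        raise ValueError("pixel count must be a positive multiple of width")
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]
```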
  • the image to be processed can also be a picture directly input by the user.
  • an information input interface 10 is displayed on the electronic device 100, and the information input interface 10 may include an image input box. Users can copy pictures from web pages/chat messages, and then paste the copied pictures into the image input box. After detecting the input event in the image input box, the electronic device 100 obtains the picture input in the image input box, and records the picture as an image to be processed.
  • preprocessing may be performed on the image to be processed, which specifically includes cropping the image to be processed. For example, the image to be processed is cropped into a 200×200 image. After the cropped image is obtained, it can be recorded as the image to be processed. Cropping unifies the size of the images to be processed, which facilitates subsequent image processing.
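  • The cropping step can be sketched as below. The embodiment gives the target size (200×200) but not the crop position, so cropping around the image center is an assumption, as are the function name and the nested-list image representation.

```python
def center_crop(image: list[list], size: int = 200) -> list[list]:
    """Crop a row-major image (list of rows) to size x size around its center."""
    height = len(image)
    width = len(image[0]) if height else 0
    if height < size or width < size:
        raise ValueError("image is smaller than the requested crop size")
    top = (height - size) // 2    # rows skipped above the crop window
    left = (width - size) // 2    # columns skipped left of the crop window
    return [row[left:left + size] for row in image[top:top + size]]
```

  • An image smaller than the target size is rejected here; a real implementation might instead scale or pad such images, which the embodiment does not address.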
  • S402 Perform classification processing on the image to be processed to obtain a first classification result.
  • the first classification result includes at least one first category label.
  • the category label can be used to represent the semantic information contained in the image to be processed.
  • FIG. 5 is a schematic diagram of an image to be processed provided in an embodiment of the present application.
  • the image to be processed as shown in FIG. 5 contains animals and cars.
  • the first classification result of the image to be processed includes two first category labels, namely "animal" and "car".
  • one way of classifying the image to be processed may be: obtaining a pre-trained classifier; inputting the image to be processed into the classifier for classification, and obtaining at least one first category label output by the classifier.
  • each sample image can be manually annotated with its semantic information (that is, its first category label), and the annotated sample images can be input into the classifier for training. Part of the annotated sample images are then input into the classifier for testing. When the classification accuracy of the classifier reaches a preset accuracy, the training is completed.
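  • The stopping condition for training can be sketched as an accuracy check on annotated test samples. Everything named here is hypothetical: `classify` stands for any trained classifier that returns a set of labels, a prediction counts as correct only when it matches the annotation exactly, and the preset accuracy of 0.9 is an arbitrary example value.

```python
from typing import Callable, Iterable, Set, Tuple


def classification_accuracy(classify: Callable[[str], Set[str]],
                            test_samples: Iterable[Tuple[str, Set[str]]]) -> float:
    """Fraction of test samples whose predicted labels equal the annotation."""
    samples = list(test_samples)
    if not samples:
        raise ValueError("at least one test sample is required")
    correct = sum(1 for image, labels in samples
                  if set(classify(image)) == set(labels))
    return correct / len(samples)


def training_complete(classify: Callable[[str], Set[str]],
                      test_samples: Iterable[Tuple[str, Set[str]]],
                      preset_accuracy: float = 0.9) -> bool:
    """Training is considered complete once accuracy reaches the preset value."""
    return classification_accuracy(classify, test_samples) >= preset_accuracy
```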
  • the construction method of the classifier can be a statistical method, a machine learning method or a neural network method. Since the neural network has the advantages of fast calculation speed and high accuracy of results, it is preferable that the classifier is a neural network.
  • the first classification result further includes a probability value corresponding to each first category label.
  • the greater the probability value, the more likely it is that the corresponding first category label represents the semantic information contained in the image to be processed. Therefore, the first classification result can be preliminarily screened based on the probability values. Specifically, the first category labels whose probability values are less than a preset value are deleted from the first classification result, and only the first category labels whose probability values are greater than or equal to the preset value are retained. This is equivalent to excluding the less likely semantic information.
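  • the screening by probability value can be sketched as follows. The dictionary layout, the function name `filter_labels`, the preset value 0.5 and the sample probabilities are assumptions made for illustration only.

```python
def filter_labels(classification_result, preset_value=0.5):
    """Keep only the labels whose probability is >= preset_value."""
    return {
        label: probability
        for label, probability in classification_result.items()
        if probability >= preset_value
    }


# Hypothetical first classification result: label -> probability value.
first_result = {"animal": 0.92, "car": 0.81, "plant": 0.07}
kept = filter_labels(first_result, preset_value=0.5)
```

  • in this example the label "plant" is dropped, and only "animal" and "car" are passed on to the second classification.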
  • Using the trained classifier to classify the image to be processed can improve the efficiency of the classification process. And since the classification accuracy of the trained classifier is high, the accuracy of the first classification result obtained by using the classifier is also high.
  • the foregoing process of classifying the image to be processed is actually a process of roughly classifying the image to be processed.
  • Coarse classification is defined relative to fine classification.
  • the higher the degree of refinement of image classification the smaller the granularity; on the contrary, the lower the degree of refinement of image classification, the greater the granularity.
  • the granularity of coarse classification is larger than that of fine classification.
  • the degree of refinement of the classification result obtained by the rough classification is lower than the degree of refinement of the classification result obtained by the fine classification.
  • the first classification result obtained by classifying the image to be processed reflects semantic information of a relatively broad range contained in the image to be processed. However, such broad semantic information often cannot reflect the semantic information that the user wants to express. In order to make the identified semantic information closer to the content that the user wants to express, the first classification result can be classified again, that is, a fine-grained classification can be performed on the basis of the first classification result. The specific steps are as follows.
  • S403 Select a corresponding classification model according to the first classification result, input the to-be-processed image into the classification model, and obtain a second classification result output by the classification model.
  • the corresponding second classification result may be gender, name, etc.
  • the corresponding second classification result may be text information, picture information, or network address information corresponding to the two-dimensional code.
  • the corresponding second classification result may be the name of the plant, the type of the plant, and so on.
  • One way of selecting the corresponding classification model according to the first classification result is: selecting the classification model corresponding to each first category label in the first classification result.
  • the first classification result obtained from the image to be processed in Fig. 5 includes two first category labels, namely "animal” and "car".
  • the animal classification model corresponding to the first category label "animal" is acquired, and the vehicle classification model corresponding to the first category label "car" is acquired.
  • the classification model corresponding to each first category label may be pre-trained. In this way, when recognizing the image to be processed, the recognition time can be saved and the recognition accuracy can be ensured.
  • each first category label may correspond to at least one classification model, and different classification models output different classification results. Therefore, the second classification result may include at least one second category label.
  • the category range of the second category label is smaller than the category range of the first category label.
  • the classification model may be a multi-label classification model, and its output result may include multiple category labels.
  • the second classification result obtained by using the vehicle model includes two second category labels, namely "car” and "brand A".
  • each classification model can be a single-label classification model, that is, each classification model outputs only one category label.
  • the first category label "car" of the image to be processed as shown in FIG. 5 can correspond to the brand model (the brand model can identify the brand information of the car) and the vehicle type model (the vehicle type model can identify the type information of the car).
  • a second category label obtained by using a brand model is "brand A”
  • a second category label obtained by using a vehicle type model is "car”. Therefore, the second classification result includes two second category labels, namely "car” and "brand A".
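  • the single-label variant described above can be sketched as a lookup from first category label to a list of models. All model functions here are hypothetical stand-ins that return fixed labels; a real embodiment would call trained classification models, and the animal label "cat" is invented for the example.

```python
def brand_model(image):
    # Hypothetical stand-in for a trained single-label brand classifier.
    return "brand A"


def vehicle_type_model(image):
    # Hypothetical stand-in for a trained single-label vehicle type classifier.
    return "car"


def animal_model(image):
    # Hypothetical stand-in; the returned label is invented for the example.
    return "cat"


# Each first category label corresponds to at least one classification model.
MODELS_BY_FIRST_LABEL = {
    "car": [brand_model, vehicle_type_model],
    "animal": [animal_model],
}


def second_classification(image, first_labels):
    """Run every model registered for each first label and collect the outputs."""
    second_labels = []
    for first_label in first_labels:
        for model in MODELS_BY_FIRST_LABEL.get(first_label, []):
            second_labels.append(model(image))
    return second_labels


result = second_classification(image=None, first_labels=["car"])
```

  • with a multi-label model the inner list would contain a single model that returns both labels at once; the combined second classification result would be the same.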
  • results obtained by the above two methods can be the same, but the number of classification models is different.
  • the samples used by each classification model are also different.
  • one way to obtain the second classification result is to directly input the to-be-processed image into the classification model corresponding to each first class label to obtain the second classification result output by the classification model.
  • each classification model can actually only recognize the image corresponding to the corresponding first category label.
  • the classification model corresponding to "animal” is an animal model
  • the classification model corresponding to "car” is a vehicle model.
  • the animal model can only identify the part of the image that contains "animals", but cannot identify the part of the image that contains "cars.”
  • the vehicle model can only recognize the part of the image that contains the "car", and cannot recognize the part of the image that contains the "animal". Therefore, if the whole image to be processed is input into the animal model or the vehicle model, part of the input is invalid information for that model, and the invalid information will interfere with the effective information, thereby affecting the classification result of the classification model.
  • another method for obtaining the second classification result is provided in the embodiment of the present application, and only valid information in the image to be processed may be input into the classification model for classification.
  • the specific steps include:
  • the sub-image corresponding to the i-th first category label is input into the classification model corresponding to the i-th first category label, and the sub-label of the i-th first category label is obtained, where i is a positive integer less than or equal to N, and N is the number of first category labels in the first classification result; the sub-labels of all first category labels in the first classification result are used as the second classification result.
  • FIG. 6 is a schematic diagram of segmentation of an image to be processed provided in an embodiment of the present application.
  • the first classification result obtained contains two first category labels, which are "two-dimensional code” and "text” respectively.
  • each first category label in the first classification result corresponds to a classification model, and the classification model corresponding to a first category label is used to classify the sub-image corresponding to that first category label. This performs a finer-grained classification on the basis of the first classification result, which is equivalent to dividing small classes within a large class, thereby improving the accuracy of semantic recognition of the image to be processed.
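  • the routing of sub-images to their own classification models can be sketched as follows. The sketch assumes that the sub-images have already been located for each first category label by a segmentation step that is not shown; the function name, the string "sub-images" and the returned sub-labels are hypothetical.

```python
def classify_sub_images(sub_images, models):
    """Feed each sub-image to the model of its own first category label.

    sub_images: list of (first_label, sub_image) pairs, one per first label.
    models: dict mapping a first category label to a callable that takes a
            sub-image and returns the sub-label for it.
    """
    second_result = []
    for first_label, sub_image in sub_images:
        model = models.get(first_label)
        if model is None:
            continue  # no model registered for this label; skip it
        second_result.append(model(sub_image))
    return second_result


# Hypothetical stand-ins: the sub-images are plain strings and the models
# return fixed sub-labels instead of running a trained network.
models = {
    "two-dimensional code": lambda region: "web page link",
    "text": lambda region: "name and phone number",
}
sub_images = [("two-dimensional code", "<qr region>"), ("text", "<text region>")]
second_result = classify_sub_images(sub_images, models)
```

  • because each model only sees the region that belongs to its own label, the invalid information from the other regions never reaches it.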
  • classification at smaller granularity may be continued multiple times. For example, after the second classification result is obtained, a third classification process is performed. Specifically, the corresponding classification model is selected according to the second classification result, the image to be processed is input into the classification model, and the third classification result output by the classification model is obtained.
  • for the process of each classification after the second classification result, reference may be made to the example in S403, which will not be repeated here.
  • the number of classifications can be preset according to actual needs, and there is no specific limitation here. The more rounds of classification, the smaller the granularity of classification, and the narrower the range of the classification results obtained.
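  • repeated classification with a preset number of rounds can be sketched as a loop in which the labels of one round select the models of the next round. The names `cascade_classify` and `rounds`, and the label "model X", are hypothetical; `rounds` counts the rounds performed after the first classification.

```python
def cascade_classify(image, first_labels, models, rounds):
    """Refine labels round by round and return the labels of every round.

    models: dict mapping a label to a callable that takes the image and
            returns a list of finer-grained labels.
    rounds: preset number of refinement rounds after the first classification.
    """
    labels = list(first_labels)
    history = [labels]
    for _ in range(rounds):
        next_labels = []
        for label in labels:
            model = models.get(label)
            if model is not None:
                next_labels.extend(model(image))
        if not next_labels:
            break  # no finer-grained model is available; stop early
        labels = next_labels
        history.append(labels)
    return history


# Hypothetical models: each returns fixed finer-grained labels.
models = {
    "car": lambda image: ["brand A"],
    "brand A": lambda image: ["model X"],
}
history = cascade_classify("image", ["car"], models, rounds=2)
```

  • each entry of `history` holds the labels of one round, from the first classification result to the finest result reached.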
  • Fig. 7(a) and Fig. 7(b) are schematic diagrams of interaction between a user and an electronic device provided by an embodiment of the present application.
  • the electronic device 100 can first display the first classification result to the user as the information label of the image to be processed, so that the user selects at least one input label from the first classification result (the input label is any one of the first category labels).
  • after the user selects the input label, the electronic device 100, in response to the detected input label, selects the corresponding classification model according to the input label, inputs the image to be processed into the classification model, obtains the second classification result output by the classification model, and displays the second classification result to the user as the information labels of the image to be processed, so that the user selects at least one information label from the information labels of the image to be processed as input information.
  • the electronic device 100 displays the input information in the information input box in response to the detected input information.
  • when the classifier used in S402 and the classification model used in S403 are integrated, after the classifier in S402 obtains the first classification result, the electronic device 100 directly selects the corresponding classification model according to the first classification result, inputs the image to be processed into the classification model, and obtains the second classification result output by the classification model.
  • the second classification result is displayed to the user as the information labels of the image to be processed, so that the user can select at least one information label from the information labels of the image to be processed as input information.
  • the electronic device 100 displays the input information in the information input box in response to the detected input information.
  • the electronic device 100 can input all the second category labels in the second classification result as the information label of the image to be processed into the information input box. However, in this way, there is more information input into the information input box, and not every second category label can represent the semantic information that the user wants to express.
  • the second classification result also includes the probability value corresponding to each second category label.
  • the electronic device 100 may display each second category label in the second classification result to the user, and the user selects at least one of the second category labels as the information label of the image to be processed; the electronic device 100 then, in response to the detected information label selected by the user, enters the detected information label into the information input box.
  • both the first classification result and the second classification result may be input as the information label of the image to be processed.
  • for the specific method, refer to the above-mentioned method of inputting the second classification result as the information label of the image to be processed, which will not be repeated here.
  • the extended information corresponding to the first classification result and/or the second classification result can be obtained according to the first classification result and/or the second classification result, and the second classification result and/or the extended information are input as the information label of the image to be processed.
  • the extended information is information related to the first classification result and/or the second classification result.
  • the “relevant” here may refer to information related to all category labels in the first classification result and/or the second classification result. For example, assuming that the second category labels in the second classification result are "Brand A" and "Car” respectively, the obtained extended information may be the introduction information of a brand A car.
  • the “relevant” here may also refer to information related to any category label in the first classification result and/or the second classification result. For example, assuming that the second category labels in the second classification result are "Brand A" and "Car” respectively, the obtained extended information may be brand information of brand A and introduction information of cars.
  • one way of obtaining the extended information corresponding to the first classification result and/or the second classification result is: using a search engine to search the Internet for the extended information corresponding to the first classification result and/or the second classification result.
  • search engine refers to a retrieval technology that uses specific strategies to retrieve information from the Internet according to user needs and feeds the information back to users.
  • Search engines rely on a variety of technologies, such as web crawler technology, search ranking technology, web page processing technology, big data processing technology, natural language processing technology, etc.
  • any existing search engine can be used to search for information, and there is no specific limitation.
  • the extended information searched by the above method is usually too complicated, the content is large, and the correlation between the information is poor.
  • another method of obtaining the extended information corresponding to the first classification result and/or the second classification result is provided in the embodiment of this application.
  • the method specifically includes: inputting the first classification result and/or the second classification result into a preset instruction detection model to obtain an information query instruction output by the instruction detection model; and querying, according to the information query instruction, the extended information corresponding to the first classification result and/or the second classification result.
  • the instruction detection model can be pre-trained.
  • the instruction detection model is equivalent to the correspondence between the category label in the first classification result and/or the second classification result and the information query instruction.
  • Each category label can correspond to one or more information query instructions.
  • Information query instructions may include: query parameters, type matching, keyword query, translation, and so on.
  • the electronic device can use the search engine to query the parameter information of the A brand car from the Internet, and the queried parameter information is used as the extended information of the category label "A brand car".
  • the electronic device can use existing matching rules to obtain the type information of the QR code (such as business card, official account, web page link, etc.). The QR code can then be further parsed according to its type information to obtain parsed information (such as the information in the business card), and the type information and/or parsed information of the QR code can be used as the extended information of the category label "QR code".
  • the electronic device can use the search engine to query the encyclopedia information of the rose from the Internet (such as a brief description of the rose, or a link to a web page that introduces roses, etc.), and use the encyclopedia information as the extended information of the category label "rose".
  • the output information query instruction is translation.
  • the electronic device can use a translation application, or query the Internet, to obtain the Chinese definition corresponding to "beautiful", and use the Chinese definition as the extended information of the category label "beautiful".
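  • since the instruction detection model is described as equivalent to a correspondence between category labels and information query instructions, its interface can be sketched with a plain lookup table. This is only an illustration of the interface, not a trained model; the table contents follow the examples above, and pairing "rose" with the keyword query instruction is an assumption.

```python
# Hypothetical correspondence between category labels and query instructions.
INSTRUCTIONS_BY_LABEL = {
    "A brand car": ["query parameters"],
    "QR code": ["type matching"],
    "rose": ["keyword query"],  # assumed pairing, not stated in the text
    "beautiful": ["translation"],
}


def detect_instructions(labels):
    """Return (label, instruction) pairs for every label with a known instruction."""
    pairs = []
    for label in labels:
        for instruction in INSTRUCTIONS_BY_LABEL.get(label, []):
            pairs.append((label, instruction))
    return pairs


pairs = detect_instructions(["QR code", "beautiful", "unknown label"])
```

  • each returned pair tells the electronic device which query to run for which label; labels without a known instruction produce no query.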
  • the instruction detection model can be obtained by pre-training according to actual needs. For example: the user's historical search information and historical input information can be collected, and the historical search information and historical input information can be used as training data to train the instruction detection model.
  • the trained instruction detection model can reflect the user's information query habits: it can "guess" the information query action that the user wants to perform, and the extended information is then queried based on the "guessed" action. This makes the acquired extended information closer to the semantic information that the user wants to express, thereby making the image information input method more intelligent.
  • the second classification result and/or the extended information can be input as the information label of the image to be processed.
  • the electronic device 100 uses the image information input method provided in the embodiments of this application to obtain the first classification result and the second classification result of the image to be processed, and inputs the first classification result and/or the second classification result into the preset instruction detection model. After the information query instruction output by the instruction detection model is obtained, the instruction control corresponding to each information query instruction is displayed on the extended information query interface 70, so that the user selects a target instruction from the information query instructions and clicks the instruction control corresponding to the target instruction.
  • the electronic device 100 queries the extended information corresponding to the first classification result and/or the second classification result according to the information query instruction corresponding to the instruction control clicked by the user, and displays the second classification result and the extended information as information labels of the image to be processed in the information label interface 50 as shown in FIG. 3(e), so that the user can select the information to be input from the multiple information labels.
  • the image information input method provided by the embodiment of the present application is introduced.
  • the following takes the visual information as a dynamic image as an example to introduce the image information input method provided in the embodiment of the present application.
  • Fig. 8 is a schematic flowchart of an image information input method provided by another embodiment of the present application.
  • the image information input method may include the following steps:
  • the electronic device 100 may directly obtain the video input by the user (that is, the video selected by the user from the gallery or the captured video obtained through the camera application), and record the video input by the user as a video to be processed.
  • the electronic device 100 may also obtain video information from a video input by the user, and use the video information as a video to be processed. For example, the electronic device obtains the pixel value information of each frame of image in the video, converts the pixel value information into a bitmap, and records the bitmap as a video to be processed.
  • S802 Extract image information from the video to be processed, and use at least one picture in the image information as the image to be processed.
  • Video is composed of image information and audio information.
  • the image information includes at least one frame of picture.
  • Each frame of picture in the image information can be recorded as a to-be-processed image, and then each to-be-processed image is classified and processed separately.
  • the information contained in adjacent frames of a video is usually the same or very similar. Therefore, in order to reduce the amount of calculation, the image information can also be sampled, that is, one picture is taken every few frames and recorded as an image to be processed.
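  • the frame sampling can be sketched as taking every `step`-th frame. The function name `sample_frames`, the parameter `step` and the string frames are hypothetical; the embodiment does not fix the sampling interval.

```python
def sample_frames(frames, step):
    """Return every `step`-th frame, starting with the first one."""
    if step < 1:
        raise ValueError("step must be at least 1")
    return frames[::step]


# Ten hypothetical frames; with step=4 only three of them are classified.
frames = [f"frame_{i}" for i in range(10)]
sampled = sample_frames(frames, step=4)
```

  • a larger `step` reduces the amount of calculation further, at the cost of possibly missing content that appears only briefly.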
  • S803 Perform classification processing on the image to be processed to obtain a first classification result.
  • S804 Select a corresponding classification model according to the first classification result, input the to-be-processed image into the classification model, and obtain a second classification result output by the classification model.
  • Steps S803-S804 are the same as steps S402-S403 in the embodiment of FIG. 4, and for details, please refer to the description of steps S402-S403, which will not be repeated here.
  • S805 Extract audio information from the video to be processed, perform voice recognition processing on the audio information, and obtain an information tag of the audio information.
  • the existing automatic speech recognition (ASR) technology can be used to recognize audio information, obtain text information contained in the audio information, and use the recognized text information as the information label of the audio information.
  • the recognized complete sentence can be used as the label of the audio information; it is also possible to extract keywords with grammatical meaning from the recognized complete sentence according to grammatical characteristics, and use the keywords as the label of the audio information. For example, if the recognized complete sentence is "A brand car is a popular car at the moment", the grammatically meaningful keyword extracted from this sentence is "A brand car", while words without grammatical meaning, such as prepositions and auxiliary words, are ignored.
  • steps S803-S804 are the process of obtaining the semantic information contained in the image information, and step S805 is the process of obtaining the semantic information contained in the audio information. These two processes can be performed in parallel or one after the other, which is not specifically limited here.
  • the method further includes:
  • S806 Input the information label of the audio information and the second classification result as the information label of the image to be processed.
  • the second classification result may include multiple second category labels, and identical second category labels, that is, duplicate semantic information, may exist among them.
  • the second classification result can therefore be deduplicated first. The specific steps can include:
  • the second classification result is deduplicated; the information label of the audio information and the deduplicated second classification result are input as the information label of the image to be processed.
  • the de-duplication processing means that only any one second category label in the same second category label is retained.
  • the second classification result namely "car", “car” and “car”.
  • the final second classification result after deduplication includes two second category labels, namely "car” and "car”.
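  • deduplication of identical labels can be sketched as follows. The sketch keeps the first occurrence of each label and preserves the original order; it covers only exactly identical labels, not labels that merely share the same semantics, and the sample labels are chosen for illustration.

```python
def deduplicate(labels):
    """Keep the first occurrence of each label, preserving the original order."""
    seen = set()
    unique = []
    for label in labels:
        if label not in seen:
            seen.add(label)
            unique.append(label)
    return unique


# Three second category labels, two of which are identical.
second_labels = ["car", "brand A", "car"]
deduplicated = deduplicate(second_labels)
```

  • the three labels are reduced to two, because only one of the two identical labels is retained.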
  • the de-duplication processing can also mean that only any one of the second-category tags with the same semantics is retained.
  • the second classification result recognized for the image to be processed in Fig. 6 includes three second category labels, namely "Zhang San#55555555555#", "Name Zhang San" and "Phone 55555555555". The labels "Name Zhang San" and "Zhang San#55555555555#" both contain the semantic information of the name Zhang San, so only one of the two needs to be kept; the labels "Phone 55555555555" and "Zhang San#55555555555#" both contain the semantic information of the phone number 55555555555, so only one of the two needs to be kept. Therefore, the second classification result after deduplication processing may include only the one second category label "Zhang San#55555555555#", or may include the two second category labels "Name Zhang San" and "Phone 55555555555".
  • the information label of the audio information and each second category label in the second classification result after the deduplication processing can be used as the information label of the image to be processed.
  • the information tags of the image to be processed obtained in this way are complicated, and they may contain multiple information tags that are invalid or that cannot reflect the semantic information that the user wants to express.
  • the part that can express the same semantic information among the information label of the audio information and the second classification result after deduplication can be used as the information label of the image to be processed.
  • the specific methods are introduced in the following two situations.
  • Case 1 If there is a first target label in the second classification result after de-duplication processing.
  • the first target label is a second category label that matches the information label of the audio information in the second classification result after deduplication processing.
  • in this case, between the information label of the audio information and the information label of the image information (that is, the second classification result after deduplication), there is a part (that is, the first target label) that can express the same semantic information. Therefore, the first target label is input as the information label of the image to be processed.
  • matching can mean the same, or it can mean the same semantic information can be expressed.
  • the information tag of the audio information includes "brand A car", and the second category tags in the second classification result after deduplication processing include "brand A car"; the two tags are the same, so "brand A car" is the first target label.
  • the information tag of the audio information includes "rose", and the second category tags in the second classification result after deduplication processing include "rose". Because the two tags represent the same semantic information, the two tags match, and the second category tag "rose" is recorded as the first target tag.
  • Case 2: If there is no first target label in the second classification result after de-duplication processing, then between the information label of the audio information and the information label of the image information there is no part that can express the same semantic information.
  • the extended information of the second category label in the second classification result after deduplication can be searched until the second target label is searched out, and the second target label is input as the information label of the image to be processed.
  • the second target tag is the extended information of the second category tag that matches the information tag of the audio information.
  • the information tag of the audio information includes "XX Official Account", the second category tags of the second classification result after deduplication processing include "QR code", and the extended information found for "QR code" includes "XX Official Account" and "XX Company". The "XX Official Account" in the extended information is the same as the "XX Official Account" in the information tag of the audio information, so "XX Official Account" is recorded as the second target tag.
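  • the two cases can be sketched together. The sketch treats "matching" as exact equality only; matching of labels that express the same semantics in different words is not implemented, and the function name and data layout are hypothetical.

```python
def find_target_labels(audio_labels, second_labels, extended_info):
    """Look for labels shared by the audio labels and the image labels.

    Returns ("first", labels) when second category labels match audio labels
    directly (Case 1), ("second", labels) when only the extended information
    of a second category label matches (Case 2), and (None, []) otherwise.
    extended_info: dict mapping a second category label to its extended information.
    """
    audio = set(audio_labels)
    first_targets = [label for label in second_labels if label in audio]
    if first_targets:
        return "first", first_targets
    second_targets = [
        info
        for label in second_labels
        for info in extended_info.get(label, [])
        if info in audio
    ]
    if second_targets:
        return "second", second_targets
    return None, []


case_1 = find_target_labels(["brand A car"], ["brand A car"], {})
case_2 = find_target_labels(
    ["XX Official Account"],
    ["QR code"],
    {"QR code": ["XX Official Account", "XX Company"]},
)
```

  • the extended information is consulted only when no direct match exists, which mirrors the order of the two cases.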
  • by searching the extended information of the second category labels in the second classification result, it is possible to find the part that expresses the same semantic information in both the information label of the audio information and the information label of the image information.
  • for the method of searching for the extended information of the second category labels in the second classification result, refer to the description of "obtain the extended information corresponding to the first classification result and/or the second classification result" in an embodiment of step S404, which will not be repeated here.
  • the part of the second classification result that matches the information label of the audio information is input as the information label of the image to be processed, that is, the same or similar semantic information contained in the image and the audio is extracted, which is equivalent to screening the identified semantic information once.
  • the information label of the image to be processed is closer to the semantic information that the user wants to express, and the accuracy of the visual input information can be improved.
  • the first target label or the second target label can be directly input into the information input box as the information label of the image to be processed.
  • the first target label or the second target label may also be displayed to the user as the top-recommended label among the information labels of the image to be processed.
  • the information labels of the image to be processed may include the second classification result, the information label of the audio information, and the first target label or the second target label.
  • among them, the first target label or the second target label is regarded as the top-recommended label.
  • the top-recommended label is a label that is displayed so as to be clearly distinguished from the other information labels of the image to be processed.
  • the information labels of the image to be processed are displayed to the user in the form of a sequence, with the top-recommended label at the beginning of the sequence.
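  • placing the first target label or the second target label at the beginning of the displayed sequence can be sketched as follows. The function name `order_labels` is hypothetical, and the other labels simply keep their original relative order.

```python
def order_labels(labels, target_label):
    """Put the target label first and keep the other labels in their original order."""
    rest = [label for label in labels if label != target_label]
    return [target_label] + rest


labels = ["QR code", "XX Company", "XX Official Account"]
ordered = order_labels(labels, "XX Official Account")
```

  • the user then sees the label most likely to match the intended meaning before the remaining labels.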
  • FIG. 9 is a structural block diagram of an image information input device provided in an embodiment of the present application. For ease of description, only parts related to the embodiment of the present application are shown.
  • the device includes:
  • the image acquisition unit 91 is used to acquire an image to be processed.
  • the first classification unit 92 is configured to perform classification processing on the to-be-processed image to obtain a first classification result.
  • the second classification unit 93 is configured to select a corresponding classification model according to the first classification result, input the to-be-processed image into the classification model, and obtain a second classification result output by the classification model.
  • the information input unit 94 is configured to input the second classification result as the information label of the image to be processed.
  • when acquiring a captured picture through the camera application, the image acquisition unit 91 first starts the camera application, then registers the camera button and/or focus callback function to acquire the image data captured by the camera application, then converts the image data into a bitmap, and records the bitmap as the image to be processed.
  • the image acquisition unit 91 When acquiring a picture from the gallery, the image acquisition unit 91 first starts the photo album application, and then registers the callback function of the selected picture to acquire the data of the selected picture, and then converts the data of the selected picture into a bitmap, and then transfers the bitmap to the selected picture.
  • the image is marked as the image to be processed.
  • the image acquiring unit 91 After the image acquiring unit 91 acquires the image to be processed, the image acquiring unit 91 passes the image to be processed (ie, bitmap) to the first classification unit 92 through a focus callback function in a parameter manner.
  • the first classification unit 92 registers the first classification result callback function.
  • the first classification unit 92 transmits the first classification result to the second classification unit 93 through the first classification result callback function.
  • the second classification unit 93 registers the second classification result callback function.
  • the second classification unit 93 transmits the second classification result to the information input unit 94 through the second classification result callback function.
  • the information input unit 94 inserts the information label of the image to be processed at the cursor position on the current interface.
  • the first classification result includes at least one first category label.
  • the second classification unit 93 is also used for:
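  • extract, from the image to be processed, the sub-image corresponding to each first category label in the first classification result, and obtain the classification model corresponding to each first category label; input the sub-image corresponding to the i-th first category label into the classification model corresponding to the i-th first category label to obtain the sub-label of the i-th first category label, where i is a positive integer less than or equal to N, and N is the number of first category labels in the first classification result; and use the sub-labels of the first category labels in the first classification result as the second classification result.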
  • the device 9 further includes:
  • the extended information acquiring unit is configured to, after the to-be-processed image is input into the classification model and the second classification result output by the classification model is obtained, acquire, according to the first classification result and/or the second classification result, the extended information corresponding to the first classification result and/or the second classification result, wherein the extended information is information related to the first classification result and/or the second classification result.
  • the information input unit 94 is further configured to input the second classification result and/or the extended information as the information label of the image to be processed.
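  • the extended information acquiring unit is also used to: input the first classification result and/or the second classification result into a preset instruction detection model to obtain an information query instruction output by the instruction detection model; and query, according to the information query instruction, the extended information corresponding to the first classification result and/or the second classification result.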
  • the image acquisition unit 91 includes:
  • the image information acquisition module is used to acquire a video to be processed and extract image information from the video to be processed, wherein the image information includes at least one frame of picture; and to use at least one frame of picture in the image information as the image to be processed.
  • the image acquisition unit 91 further includes:
  • the audio information acquisition module is configured to extract audio information from the to-be-processed video after the acquisition of the to-be-processed video; perform voice recognition processing on the audio information to obtain the information tag of the audio information.
  • the information input unit 94 is further configured to input the information label of the audio information and the second classification result as the information label of the image to be processed.
  • the second classification result includes at least one second category label.
  • the information input unit 94 is also used to:
  • if identical second category labels exist in the second classification result, deduplicate the second classification result; and input the information label of the audio information and the deduplicated second classification result as the information label of the image to be processed.
  • the information input unit 94 is also used to:
  • if a first target label exists in the deduplicated second classification result, input the first target label as the information label of the image to be processed, where the first target label is a second category label in the deduplicated second classification result that matches the information label of the audio information.
  • if the first target label does not exist in the deduplicated second classification result, search the extended information of the second category labels in the deduplicated second classification result until a second target label is found, and input the second target label as the information label of the image to be processed, where the second target label is extended information of a second category label that matches the information label of the audio information.
  • the embodiments of the present application also provide a computer-readable storage medium, including computer instructions which, when run on a computer or a processor, cause the computer or the processor to execute the steps in each of the above-mentioned image information input method embodiments.
  • the embodiments of the present application provide a computer program product.
  • when the computer program product runs on a computer or a processor, the computer or the processor is caused to execute the steps in the foregoing image information input method embodiments.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted through the computer-readable storage medium.
  • the computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or a data center integrated with one or more available media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)), etc.
  • An embodiment of the present application further provides a chip system, wherein the chip system includes a processor, the processor is coupled with a memory, and the processor executes a computer program stored in the memory to implement the steps in the above-mentioned image information input method embodiments.
  • the chip system may be a single chip or a chip module composed of multiple chips.
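As a non-limiting illustration of how units 91 to 94 might hand results to one another through registered callback functions, the following Python sketch wires three stand-in units together. The class names, method names, the `Bitmap` placeholder, and the toy classifiers are all assumptions made for this sketch; the embodiments do not prescribe any particular programming interface.

```python
from typing import Callable, Dict, List, Tuple

Bitmap = bytes  # placeholder for decoded image data (assumption)


class InformationInputUnit:  # plays the role of unit 94
    def __init__(self) -> None:
        self.inserted: List[str] = []

    def on_second_result(self, labels: List[str]) -> None:
        # Stand-in for inserting the labels at the cursor on the current interface.
        self.inserted.extend(labels)


class SecondClassificationUnit:  # plays the role of unit 93
    def __init__(self, models: Dict[str, Callable[[Bitmap], str]],
                 on_result: Callable[[List[str]], None]) -> None:
        self.models = models
        self.on_result = on_result  # registered second-classification-result callback

    def on_first_result(self, bitmap: Bitmap, first_labels: List[str]) -> None:
        # Select the model corresponding to each first category label.
        second = [self.models[label](bitmap)
                  for label in first_labels if label in self.models]
        self.on_result(second)


class FirstClassificationUnit:  # plays the role of unit 92
    def __init__(self, coarse: Callable[[Bitmap], List[str]],
                 on_result: Callable[[Bitmap, List[str]], None]) -> None:
        self.coarse = coarse
        self.on_result = on_result  # registered first-classification-result callback

    def on_image(self, bitmap: Bitmap) -> None:
        self.on_result(bitmap, self.coarse(bitmap))


def build_pipeline() -> Tuple[FirstClassificationUnit, InformationInputUnit]:
    # Toy classifiers: every image is a "flower", and every flower is a "rose".
    sink = InformationInputUnit()
    second = SecondClassificationUnit({"flower": lambda b: "rose"},
                                      sink.on_second_result)
    first = FirstClassificationUnit(lambda b: ["flower"], second.on_first_result)
    return first, sink
```

Calling `first.on_image(...)` on the object returned by `build_pipeline()` pushes the bitmap through both classification stages, and the resulting label ends up in `sink.inserted`. Each unit only knows the callback it was given, which mirrors the loose coupling described for the device.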

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Library & Information Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

An image information input method, an electronic device, and a computer readable storage medium, which are applicable to the technical field of terminals. The image information input method comprises: obtaining an image to be processed (S401); performing classification processing on said image, and obtaining a first classification result (S402); selecting a corresponding classification model according to the first classification result, inputting said image into the classification model, and obtaining a second classification result outputted by the classification model (S403); and using the second classification result as the information label input for said image (S404). In the described method, complete and accurate semantic recognition can be performed on an image during visual input, thereby improving the accuracy of visual input information.

Description

Image information input method, electronic device, and computer-readable storage medium
This application claims priority to Chinese Patent Application No. 202010589458.0, filed with the China National Intellectual Property Administration on June 24, 2020 and entitled "Image Information Input Method, Electronic Device, and Computer-Readable Storage Medium", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of terminals, and in particular to an image information input method, an electronic device, and a computer-readable storage medium.
Background
Information input is an important function of electronic devices. Whether looking up material on the Internet or sending emails and messages, users need to enter relevant information on an electronic device. With the development of visual processing technology, visual input functions have gradually been added to information input methods. When using a visual input function, the user inputs an image into the electronic device, and the electronic device recognizes the semantic information contained in the image and uses the recognized semantic information as the input information. Unlike the traditional approach in which the user enters text directly, the visual input function can automatically "infer" the semantic information the user wants to express from the image the user inputs, which makes information input more convenient.
However, existing visual input functions can only perform simple semantic recognition on the image input by the user. For example, when performing semantic recognition on an image that contains text, they merely segment the text out of the image and use the segmented text as the semantic information of the image; they cannot perform semantic recognition on the parts of the image that contain no text. Because existing methods can only perform simple semantic recognition on the input image, the recognized semantic information does not cover all the semantics in the image. As a result, existing visual input functions cannot accurately "express" the information the user wants to input, and the accuracy of the input information cannot be guaranteed.
Summary
This application provides an image information input method, an electronic device, and a computer storage medium that can improve the accuracy of visual input information.
To achieve the foregoing objective, this application adopts the following technical solutions:
In a first aspect, an embodiment of this application provides an image information input method. The method includes: acquiring an image to be processed; performing classification processing on the image to be processed to obtain a first classification result; selecting a corresponding classification model according to the first classification result, inputting the image to be processed into the classification model, and obtaining a second classification result output by the classification model; and inputting the second classification result as the information label of the image to be processed.
In the foregoing image information input method, the image to be processed is classified twice, which yields a more precise classification result than classifying it only once. Furthermore, because the classification model used in the second classification is selected according to the first classification result, the second classification amounts to reclassifying the first classification result; that is, the second classification works at a finer granularity than the first. In other words, the second classification result is more precise than the first classification result, which improves the accuracy of semantic recognition of the image to be processed. Inputting the second classification result as the information label of the image to be processed therefore improves the accuracy of the visual input information, and the method has strong ease of use and practicability.
With reference to the first aspect, in some embodiments, the first classification result includes at least one first category label. The selecting a corresponding classification model according to the first classification result, inputting the image to be processed into the classification model, and obtaining a second classification result output by the classification model includes: extracting, from the image to be processed, the sub-image corresponding to each first category label in the first classification result, and obtaining the classification model corresponding to each first category label in the first classification result; inputting the sub-image corresponding to the i-th first category label in the first classification result into the classification model corresponding to the i-th first category label to obtain the sub-label of the i-th first category label, where i is a positive integer less than or equal to N, and N is the number of first category labels in the first classification result; and using the sub-labels of the first category labels in the first classification result as the second classification result.
In the foregoing second classification, each first category label in the first classification result corresponds to its own classification model, and the sub-image corresponding to each first category label is classified with that label's model. That is, the first classification result is classified at a finer granularity, which is equivalent to dividing a broad category into subcategories. This improves the precision of semantic recognition of the image to be processed and has strong ease of use and practicability.
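The second classification described above can be sketched as follows. This is a minimal illustration only: the toy image representation, the bounding-box format, the `crop` helper, and the two lambda "models" are assumptions introduced for the example and stand in for whatever segmentation and classification models an actual embodiment uses.

```python
from typing import Callable, Dict, List, Tuple

Image = List[List[int]]           # toy grayscale image as rows of pixels (assumption)
Box = Tuple[int, int, int, int]   # (top, left, bottom, right), end-exclusive (assumption)


def crop(image: Image, box: Box) -> Image:
    """Extract the sub-image that corresponds to one first category label."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def second_classification(
    image: Image,
    first_result: List[Tuple[str, Box]],
    models: Dict[str, Callable[[Image], str]],
) -> List[str]:
    """Return one sub-label per first category label (i = 1..N)."""
    sub_labels: List[str] = []
    for label, box in first_result:
        model = models[label]                 # model selected by the first category label
        sub_labels.append(model(crop(image, box)))
    return sub_labels


# Toy data: the left half is dark, the right half is bright.
demo_image: Image = [[0, 0, 9, 9], [0, 0, 9, 9]]
demo_first_result = [("animal", (0, 0, 2, 2)), ("plant", (0, 2, 2, 4))]
demo_models: Dict[str, Callable[[Image], str]] = {
    "animal": lambda sub: "cat" if sum(map(sum, sub)) == 0 else "dog",
    "plant": lambda sub: "rose" if sum(map(sum, sub)) > 0 else "tulip",
}
demo_second_result = second_classification(demo_image, demo_first_result, demo_models)
```

With the toy data, `demo_second_result` holds one sub-label for "animal" and one for "plant", each produced by the model registered for that first category label on that label's own sub-image.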
With reference to the first aspect, in some embodiments, after the inputting the image to be processed into the classification model and obtaining the second classification result output by the classification model, the method further includes: acquiring, according to the first classification result and/or the second classification result, extended information corresponding to the first classification result and/or the second classification result, where the extended information is information related to the first classification result and/or the second classification result. Correspondingly, the inputting the second classification result as the information label of the image to be processed includes: inputting the second classification result and/or the extended information as the information label of the image to be processed.
The extended information related to the first classification result and/or the second classification result is also input as the information label of the image to be processed, so that more semantic information can be "inferred" from the image. This enriches the semantic recognition result of the image to be processed, ensures its completeness, and makes the visual input method more intelligent.
With reference to the first aspect, in some embodiments, the acquiring extended information corresponding to the first classification result and/or the second classification result includes: inputting the first classification result and/or the second classification result into a preset instruction detection model to obtain an information query instruction output by the instruction detection model; and querying, according to the information query instruction, the extended information corresponding to the first classification result and/or the second classification result.
The preset instruction detection model can reflect the user's information query habits. Querying the extended information according to the information query instruction output by the instruction detection model is equivalent to querying it according to the user's query habits, so the retrieved extended information is closer to the semantic information the user wants to express.
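A minimal sketch of this lookup is given below, assuming that the instruction detection model can be treated as a function from a label to a query instruction and that the queried source is a simple in-memory table. The names `get_extended_info`, `detect_instruction`, and `knowledge_base` are hypothetical; the application does not specify how the model or the query backend is implemented.

```python
from typing import Callable, Dict, List, Tuple


def get_extended_info(
    labels: List[str],
    detect_instruction: Callable[[str], str],
    knowledge_base: Dict[Tuple[str, str], str],
) -> Dict[str, str]:
    """Map each label to the extended information found for it, if any."""
    extended: Dict[str, str] = {}
    for label in labels:
        # The instruction detection model outputs an information query instruction,
        # e.g. which attribute of the label this user typically looks up.
        instruction = detect_instruction(label)
        answer = knowledge_base.get((label, instruction))
        if answer is not None:
            extended[label] = answer
    return extended
```

For example, with a stub `detect_instruction` that always returns "flower language" and a table that contains an entry only for ("rose", "flower language"), the labels ["rose", "cat"] yield extended information for "rose" alone; labels without a matching entry are simply skipped.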
With reference to the first aspect, in some embodiments, the acquiring an image to be processed includes: acquiring a video to be processed, and extracting image information from the video to be processed, where the image information includes at least one frame of picture; and using at least one frame of picture in the image information as the image to be processed.
With reference to the first aspect, in some embodiments, after the acquiring a video to be processed, the method further includes: extracting audio information from the video to be processed; and performing speech recognition processing on the audio information to obtain the information label of the audio information. Correspondingly, the inputting the second classification result as the information label of the image to be processed includes: inputting the information label of the audio information and the second classification result as the information label of the image to be processed.
When the user inputs a video, image information and audio information can be extracted from the video. Image recognition is performed on the image information to obtain the information label of the image, and speech recognition is performed on the audio information to obtain the information label of the audio; both are then input as the information label of the image. Because the final information label of the image to be processed combines image information and audio information, the information labels become more diverse. This enriches the visual input information, makes the visual input method more intelligent, and has strong ease of use and practicability.
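The video branch can be illustrated with the following sketch. The `Video` container, the `frame_step` sampling parameter, and the two callables are assumptions for illustration; real frame decoding, image classification, and speech recognition are deliberately replaced by stand-ins supplied by the caller.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Video:
    frames: List[str]   # stand-in for decoded pictures (assumption)
    audio: str          # stand-in for the extracted audio track (assumption)


def labels_for_video(
    video: Video,
    classify_frame: Callable[[str], List[str]],
    recognize_speech: Callable[[str], List[str]],
    frame_step: int = 1,
) -> List[str]:
    """Combine audio labels with the labels of every frame_step-th frame."""
    image_labels: List[str] = []
    for frame in video.frames[::frame_step]:
        # Each sampled frame is treated as an image to be processed.
        image_labels.extend(classify_frame(frame))
    audio_labels = recognize_speech(video.audio)
    return audio_labels + image_labels
```

Here `classify_frame` represents the two-stage classification of a single picture, and `recognize_speech` represents the speech recognition that yields the information label of the audio information. Placing the audio labels first is an arbitrary choice of this sketch.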
With reference to the first aspect, in some embodiments, the second classification result includes at least one second category label. The inputting the information label of the audio information and the second classification result as the information label of the image to be processed includes: if identical second category labels exist in the second classification result, performing deduplication processing on the second classification result; and inputting the information label of the audio information and the deduplicated second classification result as the information label of the image to be processed.
With reference to the first aspect, in some embodiments, the inputting the information label of the audio information and the deduplicated second classification result as the information label of the image to be processed includes: if a first target label exists in the deduplicated second classification result, inputting the first target label as the information label of the image to be processed, where the first target label is a second category label in the deduplicated second classification result that matches the information label of the audio information; and if the first target label does not exist in the deduplicated second classification result, searching the extended information of the second category labels in the deduplicated second classification result until a second target label is found, and inputting the second target label as the information label of the image to be processed, where the second target label is extended information of a second category label that matches the information label of the audio information.
The recognized information labels of the image and of the audio may partly coincide and partly differ, and the content they share is often the semantic information the user wants to express. Therefore, the part of the second classification result that matches the information label of the audio information is input as the information label of the image to be processed; that is, the same or similar semantic information contained in the image and the audio is extracted, which amounts to filtering the recognized semantic information once. This brings the information label of the image to be processed closer to the semantic information the user wants to express and thus improves the accuracy of the visual input information.
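The deduplication and matching steps can be sketched as below. The sketch assumes that "matching" means exact string equality, which is a simplification: the application also allows similar semantic information to match, and an actual embodiment may use a different similarity measure. The function name and the dictionary layout of the extended information are hypothetical.

```python
from typing import Dict, List, Optional


def select_target_label(
    second_labels: List[str],
    audio_labels: List[str],
    extended_info: Dict[str, List[str]],
) -> Optional[str]:
    """Return the first or second target label, or None if nothing matches."""
    # Order-preserving deduplication of the second classification result.
    deduped = list(dict.fromkeys(second_labels))
    audio = set(audio_labels)

    # First target label: a second category label that matches an audio label.
    for label in deduped:
        if label in audio:
            return label

    # Second target label: extended information of a second category label
    # that matches an audio label; the search stops at the first hit.
    for label in deduped:
        for info in extended_info.get(label, []):
            if info in audio:
                return info
    return None
```

When neither search succeeds the function returns `None`; how an embodiment behaves in that case (for example, falling back to inputting all labels) is not settled by this sketch.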
In other embodiments, the first target label or the second target label may also be displayed to the user as the first-preferred label among the information labels of the image to be processed. For example, the information labels of the image to be processed are displayed to the user as a sequence, with the first-preferred label at the start of the sequence. As another example, the font color of the first-preferred label is distinguished from that of the other labels (for example, red for the first-preferred label and black for the others), so that the user can quickly notice the first-preferred label among the information labels of the image to be processed.
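Both display examples can be combined in a short sketch, assuming the labels are plain strings and a color name is enough to describe the styling. The function name and the (label, color) output format are illustrative assumptions; the two examples above are alternatives, and combining them here is only for brevity.

```python
from typing import List, Tuple


def display_sequence(labels: List[str], first_preferred: str) -> List[Tuple[str, str]]:
    """Put the first-preferred label first and mark it in red; others are black."""
    ordered = [first_preferred] + [t for t in labels if t != first_preferred]
    return [(t, "red" if t == first_preferred else "black") for t in ordered]
```

The relative order of the remaining labels is preserved, so only the first-preferred label changes position.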
In a second aspect, an embodiment of this application provides an image information input device. The device includes: an acquisition unit, configured to acquire an image to be processed; a first classification unit, configured to perform classification processing on the image to be processed to obtain a first classification result; a second classification unit, configured to select a corresponding classification model according to the first classification result, input the image to be processed into the classification model, and obtain a second classification result output by the classification model; and an information input unit, configured to input the second classification result as the information label of the image to be processed.
In a third aspect, an embodiment of this application provides an electronic device. The electronic device includes a processor, and the processor is configured to run a computer program stored in a memory to implement the method provided in any possible implementation of the first aspect.
In a fourth aspect, an embodiment of this application provides a computer-readable storage medium including computer instructions which, when run on a computer or a processor, cause the computer or the processor to execute the method provided in any possible implementation of the first aspect.
In a fifth aspect, an embodiment of this application provides a computer program product which, when run on a computer or a processor, causes the computer or the processor to execute the method provided in any possible implementation of the first aspect.
It can be understood that the electronic device of the third aspect, the computer storage medium of the fourth aspect, and the computer program product of the fifth aspect are all used to execute the method provided in the first aspect. For the beneficial effects they can achieve, refer to the beneficial effects of the corresponding method; details are not repeated here.
Brief Description of the Drawings
FIG. 1 is a schematic structural diagram of an electronic device 100 provided by an embodiment of this application;
FIG. 2 is a block diagram of the software structure of the electronic device 100 provided by an embodiment of this application;
FIG. 3(a) to FIG. 3(f) are schematic diagrams of application interfaces provided by an embodiment of this application;
FIG. 4 is a schematic flowchart of an image information input method provided by an embodiment of this application;
FIG. 5 is a schematic diagram of an image to be processed provided by an embodiment of this application;
FIG. 6 is a schematic diagram of the segmentation of an image to be processed provided by an embodiment of this application;
FIG. 7(a) and FIG. 7(b) are schematic diagrams of interaction between a user and an electronic device provided by an embodiment of this application;
FIG. 8 is a schematic flowchart of an image information input method provided by another embodiment of this application;
FIG. 9 is a structural block diagram of an image information input device provided by an embodiment of this application.
具体实施方式detailed description
以下描述中,为了说明而不是为了限定,提出了诸如特定系统结构、技术之类的具体细节,以便透彻理解本申请实施例。然而,本领域的技术人员应当清楚,在没有这些具体细节的其它实施例中也可以实现本申请。在其它情况中,省略对众所周知的系统、装置、电路以及方法的详细说明,以免不必要的细节妨碍本申请的描述。In the following description, for the purpose of illustration rather than limitation, specific details such as a specific system structure and technology are proposed for a thorough understanding of the embodiments of the present application. However, it should be clear to those skilled in the art that the present application can also be implemented in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, devices, circuits, and methods are omitted to avoid unnecessary details from obstructing the description of this application.
应当理解,当在本申请说明书和所附权利要求书中使用时,术语“包括”指示所描述特征、整体、步骤、操作、元素和/或组件的存在,但并不排除一个或多个其它特征、整体、步骤、操作、元素、组件和/或其集合的存在或添加。It should be understood that when used in the specification and appended claims of this application, the term "comprising" indicates the existence of the described features, wholes, steps, operations, elements and/or components, but does not exclude one or more other The existence or addition of features, wholes, steps, operations, elements, components, and/or collections thereof.
还应当理解,在本申请实施例中,“至少一个”是指一个或一个以上。It should also be understood that in the embodiments of the present application, "at least one" refers to one or more than one.
还应当理解,在本申请说明书和所附权利要求书中使用的术语“和/或”是指相关联列出的项中的一个或多个的任何组合以及所有可能组合,并且包括这些组合。It should also be understood that the term "and/or" used in the specification and appended claims of this application refers to any combination of one or more of the items listed in the associated and all possible combinations, and includes these combinations.
如在本申请说明书和所附权利要求书中所使用的那样,术语“若”可以依据上下文被解释为“当...时”或“一旦”或“响应于”。As used in the description of this application and the appended claims, the term "if" can be construed as "when" or "once" or "in response to" depending on the context.
另外,在本申请说明书和所附权利要求书的描述中,术语“第一”、“第二”等仅用于区分描述,而不能理解为指示或暗示相对重要性。In addition, in the description of the specification of this application and the appended claims, the terms "first", "second", etc. are only used to distinguish the description, and cannot be understood as indicating or implying relative importance.
在本申请说明书中描述的参考“一个实施例”或“一些实施例”等意味着在本申请的一个或多个实施例中包括结合该实施例描述的特定特征、结构或特点。由此,在本说明书中的不同之处出现的语句“在一个实施例中”、“在一些实施例中”、“在其他一些实施例中”、“在另外一些实施例中”等不是必然都参考相同的实施例,而是意味着“一个或多个但不是所有的实施例”,除非是以其他方式另外特别强调。术语“包括”、“包含”、“具有”及它们的变形都意味着“包括但不限于”,除非是以其他方式另外特别强调。The reference to "one embodiment" or "some embodiments" described in the specification of this application means that one or more embodiments of this application include a specific feature, structure, or characteristic described in combination with the embodiment. Therefore, the sentences "in one embodiment", "in some embodiments", "in some other embodiments", "in some other embodiments", etc. appearing in different places in this specification are not necessarily All refer to the same embodiment, but mean "one or more but not all embodiments", unless otherwise specifically emphasized. The terms "including", "including", "having" and their variations all mean "including but not limited to" unless otherwise specifically emphasized.
本申请实施例中提供的一种图像信息输入方法中所涉及到的步骤仅仅作为示例,并非所有的步骤均是必须执行的步骤,或者并非各个信息或消息中的内容均是必选的,在使用过程中可以根据需要酌情增加或减少。The steps involved in the image information input method provided in the embodiments of this application are only examples. Not all steps are mandatory steps, or not all information or content in the message is mandatory. During use, it can be increased or decreased as needed.
本申请实施例中同一个步骤或者具有相同功能的步骤或者消息在不同实施例之间可以互相参考借鉴。The same step or steps or messages with the same function in the embodiments of the present application may refer to each other among different embodiments.
本申请实施例描述的业务场景是为了更加清楚的说明本申请实施例的技术方案,并不构成对于本申请实施例提供的技术方案的限定,本领域普通技术人员可知,随着网络架构的演变和新业务场景的出现,本申请实施例提供的技术方案对于类似的技术问题,同样适用。The business scenarios described in the embodiments of this application are intended to illustrate the technical solutions of the embodiments more clearly, and do not constitute a limitation on the technical solutions provided in the embodiments of this application. Those of ordinary skill in the art will know that, as network architectures evolve and new business scenarios emerge, the technical solutions provided in the embodiments of this application are equally applicable to similar technical problems.
为了说明本申请所述的技术方案,下面通过具体实施例来进行说明。In order to illustrate the technical solution described in the present application, specific embodiments are used for description below.
首先介绍本申请实施例涉及的电子设备,该电子设备可以是手机、平板电脑、可穿戴设备、车载设备、增强现实(augmented reality,AR)/虚拟现实(virtual reality,VR)设备、笔记本电脑、超级移动个人计算机(ultra-mobile personal computer,UMPC)、上网本、个人数字助理(personal digital assistant,PDA)等终端设备,本申请实施例对终端设备的具体类型不作任何限制。First, the electronic device involved in the embodiments of this application is introduced. The electronic device may be a terminal device such as a mobile phone, a tablet computer, a wearable device, a vehicle-mounted device, an augmented reality (AR)/virtual reality (VR) device, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, or a personal digital assistant (PDA). The embodiments of this application do not impose any restriction on the specific type of the terminal device.
请参阅图1,图1是本申请实施例提供的一种电子设备100的结构示意图。Please refer to FIG. 1. FIG. 1 is a schematic structural diagram of an electronic device 100 according to an embodiment of the present application.
电子设备100可以包括处理器110,外部存储器接口120,内部存储器121,通用串行总线(universal serial bus,USB)接口130,充电管理模块140,电源管理模块141,电池142,天线1,天线2,移动通信模块150,无线通信模块160,音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,传感器模块180,按键190,马达191,指示器192,摄像头193,显示屏194,以及用户标识模块(subscriber identification module,SIM)卡接口195等。其中传感器模块180可以包括压力传感器180A,陀螺仪传感器180B,气压传感器180C,磁传感器180D,加速度传感器180E,距离传感器180F,接近光传感器180G,指纹传感器180H,温度传感器180J,触摸传感器180K,环境光传感器180L,骨传导传感器180M等。The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, and an antenna 2. , Mobile communication module 150, wireless communication module 160, audio module 170, speaker 170A, receiver 170B, microphone 170C, earphone jack 170D, sensor module 180, buttons 190, motor 191, indicator 192, camera 193, display screen 194, and Subscriber identification module (subscriber identification module, SIM) card interface 195, etc. The sensor module 180 can include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, and ambient light Sensor 180L, bone conduction sensor 180M, etc.
可以理解的是,本申请实施例示意的结构并不构成对电子设备100的具体限定。在本申请另一些实施例中,电子设备100可以包括比图示更多或更少的部件,或者组合某些部件,或者拆分某些部件,或者不同的部件布置。图示的部件可以以硬件,软件或软件和硬件的组合实现。It can be understood that the structure illustrated in the embodiment of the present application does not constitute a specific limitation on the electronic device 100. In other embodiments of the present application, the electronic device 100 may include more or fewer components than shown, or combine certain components, or split certain components, or arrange different components. The illustrated components can be implemented in hardware, software, or a combination of software and hardware.
处理器110可以包括一个或多个处理单元,例如:处理器110可以包括应用处理器(application processor,AP),调制解调处理器,图形处理器(graphics processing unit,GPU),图像信号处理器(image signal processor,ISP),控制器,存储器,视频编解码器,数字信号处理器(digital signal processor,DSP),基带处理器,和/或神经网络处理器(neural-network processing unit,NPU)等。其中,不同的处理单元可以是独立的器件,也可以集成在一个或多个处理器中。示例性的,处理器用于执行本申请实施例提供的图像信息输入方法,例如,处理器执行下述步骤S401-S404或步骤S901-S906。The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU). The different processing units may be independent devices or may be integrated in one or more processors. Exemplarily, the processor is configured to execute the image information input method provided in the embodiments of this application. For example, the processor executes the following steps S401-S404 or steps S901-S906.
其中,控制器可以是电子设备100的神经中枢和指挥中心。控制器可以根据指令操作码和时序信号,产生操作控制信号,完成取指令和执行指令的控制。The controller may be the nerve center and command center of the electronic device 100. The controller can generate operation control signals according to the instruction operation code and timing signals to complete the control of fetching and executing instructions.
处理器110中还可以设置存储器,用于存储指令和数据。在一些实施例中,处理器110中的存储器为高速缓冲存储器。该存储器可以保存处理器110刚用过或循环使用的指令或数据。如果处理器110需要再次使用该指令或数据,可从所述存储器中直接调用。避免了重复存取,减少了处理器110的等待时间,因而提高了系统的效率。A memory may also be provided in the processor 110 to store instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory can store instructions or data that have just been used or recycled by the processor 110. If the processor 110 needs to use the instruction or data again, it can be directly called from the memory. Repeated accesses are avoided, the waiting time of the processor 110 is reduced, and the efficiency of the system is improved.
在一些实施例中,处理器110可以包括一个或多个接口。接口可以包括集成电路(inter-integrated circuit,I2C)接口,集成电路内置音频(inter-integrated circuit sound,I2S)接口,脉冲编码调制(pulse code modulation,PCM)接口,通用异步收发传输器(universal asynchronous receiver/transmitter,UART)接口,移动产业处理器接口(mobile industry processor interface,MIPI),通用输入输出(general-purpose input/output,GPIO)接口,用户标识模块(subscriber identity module,SIM)接口,和/或通用串行总线(universal serial bus,USB)接口等。In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface.
I2C接口是一种双向同步串行总线,包括一根串行数据线(serial data line,SDA)和一根串行时钟线(serial clock line,SCL)。在一些实施例中,处理器110可以包含多组I2C总线。处理器110可以通过不同的I2C总线接口分别耦合触摸传感器180K,充电器,闪光灯,摄像头193等。例如:处理器110可以通过I2C接口耦合触摸传感器180K,使处理器110与触摸传感器180K通过I2C总线接口通信,实现电子设备100的触摸功能。The I2C interface is a bidirectional synchronous serial bus that includes a serial data line (SDA) and a serial clock line (SCL). In some embodiments, the processor 110 may include multiple sets of I2C buses. The processor 110 may be coupled to the touch sensor 180K, a charger, a flash, the camera 193, and the like through different I2C bus interfaces. For example, the processor 110 may be coupled to the touch sensor 180K through an I2C interface, so that the processor 110 and the touch sensor 180K communicate through the I2C bus interface to implement the touch function of the electronic device 100.
I2S接口可以用于音频通信。在一些实施例中,处理器110可以包含多组I2S总线。处理器110可以通过I2S总线与音频模块170耦合,实现处理器110与音频模块170之间的通信。在一些实施例中,音频模块170可以通过I2S接口向无线通信模块160传递音频信号,实现通过蓝牙耳机接听电话的功能。The I2S interface can be used for audio communication. In some embodiments, the processor 110 may include multiple sets of I2S buses. The processor 110 may be coupled with the audio module 170 through an I2S bus to implement communication between the processor 110 and the audio module 170. In some embodiments, the audio module 170 may transmit audio signals to the wireless communication module 160 through an I2S interface, so as to realize the function of answering calls through a Bluetooth headset.
PCM接口也可以用于音频通信,将模拟信号抽样,量化和编码。在一些实施例中,音频模块170与无线通信模块160可以通过PCM总线接口耦合。在一些实施例中,音频模块170也可以通过PCM接口向无线通信模块160传递音频信号,实现通过蓝牙耳机接听电话的功能。所述I2S接口和所述PCM接口都可以用于音频通信。The PCM interface can also be used for audio communication to sample, quantize and encode analog signals. In some embodiments, the audio module 170 and the wireless communication module 160 may be coupled through a PCM bus interface. In some embodiments, the audio module 170 may also transmit audio signals to the wireless communication module 160 through the PCM interface, so as to realize the function of answering calls through the Bluetooth headset. Both the I2S interface and the PCM interface can be used for audio communication.
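As a minimal sketch of the sampling, quantization, and encoding that the PCM description above refers to, the following Python example turns a continuous signal into integer codes. The function name `pcm_encode`, the 8 kHz sample rate, the 8-bit depth, and the 1 kHz test tone are all illustrative assumptions and are not taken from this application.

```python
import math

def pcm_encode(signal, sample_rate_hz, num_samples, bits=8):
    """Sample signal(t), quantize each sample uniformly, and return integer codes.

    Hypothetical helper for illustration only; the signal is assumed to lie in [-1, 1].
    """
    levels = 2 ** bits
    codes = []
    for n in range(num_samples):
        t = n / sample_rate_hz                      # sampling instant
        x = max(-1.0, min(1.0, signal(t)))          # clip to the assumed input range
        code = min(levels - 1, int((x + 1.0) / 2.0 * levels))  # uniform quantization
        codes.append(code)                          # the integer code is the encoded sample
    return codes

def tone(t):
    # 1 kHz sine wave standing in for an analog audio input (assumed test signal)
    return math.sin(2 * math.pi * 1000 * t)

codes = pcm_encode(tone, sample_rate_hz=8000, num_samples=8, bits=8)
```

Each code fits in one byte here; a real audio path would typically use more bits per sample and a hardware codec rather than software like this.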
UART接口是一种通用串行数据总线,用于异步通信。该总线可以为双向通信总线。它将要传输的数据在串行通信与并行通信之间转换。在一些实施例中,UART接口通常被用于连接处理器110与无线通信模块160。例如:处理器110通过UART接口与无线通信模块160中的蓝牙模块通信,实现蓝牙功能。在一些实施例中,音频模块170可以通过UART接口向无线通信模块160传递音频信号,实现通过蓝牙耳机播放音乐的功能。The UART interface is a universal serial data bus used for asynchronous communication. The bus can be a two-way communication bus. It converts the data to be transmitted between serial communication and parallel communication. In some embodiments, the UART interface is generally used to connect the processor 110 and the wireless communication module 160. For example, the processor 110 communicates with the Bluetooth module in the wireless communication module 160 through the UART interface to realize the Bluetooth function. In some embodiments, the audio module 170 may transmit audio signals to the wireless communication module 160 through a UART interface, so as to realize the function of playing music through a Bluetooth headset.
MIPI接口可以被用于连接处理器110与显示屏194,摄像头193等外围器件。MIPI接口包括摄像头串行接口(camera serial interface,CSI),显示屏串行接口(display serial interface,DSI)等。在一些实施例中,处理器110和摄像头193通过CSI接口通信,实现电子设备100的拍摄功能。处理器110和显示屏194通过DSI接口通信,实现电子设备100的显示功能。The MIPI interface can be used to connect the processor 110 with the display screen 194, the camera 193 and other peripheral devices. The MIPI interface includes camera serial interface (camera serial interface, CSI), display serial interface (display serial interface, DSI), etc. In some embodiments, the processor 110 and the camera 193 communicate through a CSI interface to implement the shooting function of the electronic device 100. The processor 110 and the display screen 194 communicate through a DSI interface to realize the display function of the electronic device 100.
GPIO接口可以通过软件配置。GPIO接口可以被配置为控制信号,也可被配置为数据信号。在一些实施例中,GPIO接口可以用于连接处理器110与摄像头193,显示屏194,无线通信模块160,音频模块170,传感器模块180等。GPIO接口还可以被配置为I2C接口,I2S接口,UART接口,MIPI接口等。The GPIO interface can be configured through software. The GPIO interface can be configured as a control signal or as a data signal. In some embodiments, the GPIO interface can be used to connect the processor 110 with the camera 193, the display screen 194, the wireless communication module 160, the audio module 170, the sensor module 180, and so on. The GPIO interface can also be configured as an I2C interface, I2S interface, UART interface, MIPI interface, etc.
USB接口130是符合USB标准规范的接口,具体可以是Mini USB接口,Micro USB接口,USB Type C接口等。USB接口130可以用于连接充电器为电子设备100充电,也可以用于电子设备100与外围设备之间传输数据。也可以用于连接耳机,通过耳机播放音频。该接口还可以用于连接其他电子设备,例如AR设备等。The USB interface 130 is an interface that complies with the USB standard specification, and specifically may be a Mini USB interface, a Micro USB interface, a USB Type C interface, and so on. The USB interface 130 can be used to connect a charger to charge the electronic device 100, and can also be used to transfer data between the electronic device 100 and peripheral devices. It can also be used to connect earphones and play audio through earphones. The interface can also be used to connect other electronic devices, such as AR devices.
可以理解的是,本申请实施例示意的各模块间的接口连接关系,只是示意性说明,并不构成对电子设备100的结构限定。在本申请另一些实施例中,电子设备100也可以采用上述实施例中不同的接口连接方式,或多种接口连接方式的组合。It can be understood that the interface connection relationship between the modules illustrated in the embodiment of the present application is merely a schematic description, and does not constitute a structural limitation of the electronic device 100. In other embodiments of the present application, the electronic device 100 may also adopt different interface connection modes in the foregoing embodiments, or a combination of multiple interface connection modes.
充电管理模块140用于从充电器接收充电输入。其中,充电器可以是无线充电器,也可以是有线充电器。在一些有线充电的实施例中,充电管理模块140可以通过USB接口130接收有线充电器的充电输入。在一些无线充电的实施例中,充电管理模块140可以通过电子设备100的无线充电线圈接收无线充电输入。充电管理模块140为电池142充电的同时,还可以通过电源管理模块141为电子设备供电。The charging management module 140 is used to receive charging input from the charger. Among them, the charger can be a wireless charger or a wired charger. In some wired charging embodiments, the charging management module 140 may receive the charging input of the wired charger through the USB interface 130. In some embodiments of wireless charging, the charging management module 140 may receive the wireless charging input through the wireless charging coil of the electronic device 100. While the charging management module 140 charges the battery 142, it can also supply power to the electronic device through the power management module 141.
电源管理模块141用于连接电池142,充电管理模块140与处理器110。电源管理模块141接收电池142和/或充电管理模块140的输入,为处理器110,内部存储器121,外部存储器,显示屏194,摄像头193,和无线通信模块160等供电。电源管理模块141还可以用于监测电池容量,电池循环次数,电池健康状态(漏电,阻抗)等参数。在其他一些实施例中,电源管理模块141也可以设置于处理器110中。在另一些实施例中,电源管理模块141和充电管理模块140也可以设置于同一个器件中。The power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110. The power management module 141 receives input from the battery 142 and/or the charging management module 140, and supplies power to the processor 110, the internal memory 121, the external memory, the display screen 194, the camera 193, and the wireless communication module 160. The power management module 141 can also be used to monitor parameters such as battery capacity, battery cycle times, and battery health status (leakage, impedance). In some other embodiments, the power management module 141 may also be provided in the processor 110. In other embodiments, the power management module 141 and the charging management module 140 may also be provided in the same device.
电子设备100的无线通信功能可以通过天线1,天线2,移动通信模块150,无线通信模块160,调制解调处理器以及基带处理器等实现。The wireless communication function of the electronic device 100 can be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, and the baseband processor.
天线1和天线2用于发射和接收电磁波信号。电子设备100中的每个天线可用于覆盖单个或多个通信频带。不同的天线还可以复用,以提高天线的利用率。例如:可以将天线1复用为无线局域网的分集天线。在另外一些实施例中,天线可以和调谐开关结合使用。The antenna 1 and the antenna 2 are used to transmit and receive electromagnetic wave signals. Each antenna in the electronic device 100 can be used to cover a single or multiple communication frequency bands. Different antennas can also be reused to improve antenna utilization. For example, antenna 1 can be multiplexed as a diversity antenna of a wireless local area network. In other embodiments, the antenna can be used in combination with a tuning switch.
移动通信模块150可以提供应用在电子设备100上的包括2G/3G/4G/5G等无线通信的解决方案。移动通信模块150可以包括至少一个滤波器,开关,功率放大器,低噪声放大器(low noise amplifier,LNA)等。移动通信模块150可以由天线1接收电磁波,并对接收的电磁波进行滤波,放大等处理,传送至调制解调处理器进行解调。移动通信模块150还可以对经调制解调处理器调制后的信号放大,经天线1转为电磁波辐射出去。在一些实施例中,移动通信模块150的至少部分功能模块可以被设置于处理器110中。在一些实施例中,移动通信模块150的至少部分功能模块可以与处理器110的至少部分模块被设置在同一个器件中。The mobile communication module 150 may provide wireless communication solutions applied to the electronic device 100, including 2G/3G/4G/5G. The mobile communication module 150 may include at least one filter, a switch, a power amplifier, a low noise amplifier (LNA), and the like. The mobile communication module 150 may receive electromagnetic waves through the antenna 1, perform processing such as filtering and amplification on the received electromagnetic waves, and transmit them to the modem processor for demodulation. The mobile communication module 150 may also amplify the signal modulated by the modem processor and convert it into electromagnetic waves for radiation through the antenna 1. In some embodiments, at least part of the functional modules of the mobile communication module 150 may be provided in the processor 110. In some embodiments, at least part of the functional modules of the mobile communication module 150 and at least part of the modules of the processor 110 may be provided in the same device.
调制解调处理器可以包括调制器和解调器。其中,调制器用于将待发送的低频基带信号调制成中高频信号。解调器用于将接收的电磁波信号解调为低频基带信号。随后解调器将解调得到的低频基带信号传送至基带处理器处理。低频基带信号经基带处理器处理后,被传递给应用处理器。应用处理器通过音频设备(不限于扬声器170A,受话器170B等)输出声音信号,或通过显示屏194显示图像或视频。在一些实施例中,调制解调处理器可以是独立的器件。在另一些实施例中,调制解调处理器可以独立于处理器110,与移动通信模块150或其他功能模块设置在同一个器件中。The modem processor may include a modulator and a demodulator. The modulator is used to modulate the low-frequency baseband signal to be sent into a medium- or high-frequency signal. The demodulator is used to demodulate the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then transmits the demodulated low-frequency baseband signal to the baseband processor for processing. After being processed by the baseband processor, the low-frequency baseband signal is passed to the application processor. The application processor outputs a sound signal through an audio device (not limited to the speaker 170A, the receiver 170B, etc.), or displays an image or video through the display screen 194. In some embodiments, the modem processor may be an independent device. In other embodiments, the modem processor may be independent of the processor 110 and be provided in the same device as the mobile communication module 150 or other functional modules.
无线通信模块160可以提供应用在电子设备100上的包括无线局域网(wireless local area networks,WLAN)(如无线保真(wireless fidelity,Wi-Fi)网络),蓝牙(bluetooth,BT),全球导航卫星系统(global navigation satellite system,GNSS),调频(frequency modulation,FM),近距离无线通信技术(near field communication,NFC),红外技术(infrared,IR)等无线通信的解决方案。无线通信模块160可以是集成至少一个通信处理模块的一个或多个器件。无线通信模块160经由天线2接收电磁波,将电磁波信号调频以及滤波处理,将处理后的信号发送到处理器110。无线通信模块160还可以从处理器110接收待发送的信号,对其进行调频,放大,经天线2转为电磁波辐射出去。The wireless communication module 160 may provide wireless communication solutions applied to the electronic device 100, including wireless local area networks (WLAN) (such as a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR) technology. The wireless communication module 160 may be one or more devices integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, performs frequency modulation and filtering on the electromagnetic wave signals, and sends the processed signals to the processor 110. The wireless communication module 160 may also receive a signal to be sent from the processor 110, perform frequency modulation and amplification on it, and convert it into electromagnetic waves for radiation through the antenna 2.
在一些实施例中,电子设备100的天线1和移动通信模块150耦合,天线2和无线通信模块160耦合,使得电子设备100可以通过无线通信技术与网络以及其他设备通信。所述无线通信技术可以包括全球移动通讯系统(global system for mobile communications,GSM),通用分组无线服务(general packet radio service,GPRS),码分多址接入(code division multiple access,CDMA),宽带码分多址(wideband code division multiple access,WCDMA),时分码分多址(time-division code division multiple access,TD-SCDMA),长期演进(long term evolution,LTE),BT,GNSS,WLAN,NFC,FM,和/或IR技术等。所述GNSS可以包括全球卫星定位系统(global positioning system,GPS),全球导航卫星系统(global navigation satellite system,GLONASS),北斗卫星导航系统(beidou navigation satellite system,BDS),准天顶卫星系统(quasi-zenith satellite system,QZSS)和/或星基增强系统(satellite based augmentation systems,SBAS)。In some embodiments, the antenna 1 of the electronic device 100 is coupled with the mobile communication module 150, and the antenna 2 is coupled with the wireless communication module 160, so that the electronic device 100 can communicate with the network and other devices through wireless communication technology. The wireless communication technology may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), broadband Code division multiple access (wideband code division multiple access, WCDMA), time-division code division multiple access (time-division code division multiple access, TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC , FM, and/or IR technology, etc. The GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the Beidou navigation satellite system (BDS), and the quasi-zenith satellite system (quasi). -zenith satellite system, QZSS) and/or satellite-based augmentation systems (SBAS).
电子设备100通过GPU,显示屏194,以及应用处理器等实现显示功能。GPU为图像处理的微处理器,连接显示屏194和应用处理器。GPU用于执行数学和几何计算,用于图形渲染。处理器110可包括一个或多个GPU,其执行程序指令以生成或改变显示信息。The electronic device 100 implements a display function through a GPU, a display screen 194, an application processor, and the like. The GPU is a microprocessor for image processing, connected to the display 194 and the application processor. The GPU is used to perform mathematical and geometric calculations and is used for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
显示屏194用于显示图像,视频等。显示屏194包括显示面板。显示面板可以采用液晶显示屏(liquid crystal display,LCD),有机发光二极管(organic light-emitting diode,OLED),有源矩阵有机发光二极体或主动矩阵有机发光二极体(active-matrix organic light emitting diode,AMOLED),柔性发光二极管(flex light-emitting diode,FLED),Miniled,MicroLed,Micro-oLed,量子点发光二极管(quantum dot light emitting diodes,QLED)等。在一些实施例中,电子设备100可以包括1个或N个显示屏194,N为大于1的正整数。The display screen 194 is used to display images, videos, and the like. The display screen 194 includes a display panel. The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), or the like. In some embodiments, the electronic device 100 may include one or N display screens 194, where N is a positive integer greater than one.
电子设备100可以通过ISP,摄像头193,视频编解码器,GPU,显示屏194以及应用处理器等实现拍摄功能。The electronic device 100 can realize a shooting function through an ISP, a camera 193, a video codec, a GPU, a display screen 194, and an application processor.
ISP用于处理摄像头193反馈的数据。例如,拍照时,打开快门,光线通过镜头被传递到摄像头感光元件上,光信号转换为电信号,摄像头感光元件将所述电信号传递给ISP处理,转化为肉眼可见的图像。ISP还可以对图像的噪点,亮度,肤色进行算法优化。ISP还可以对拍摄场景的曝光,色温等参数优化。在一些实施例中,ISP可以设置在摄像头193中。The ISP is used to process the data fed back from the camera 193. For example, when a picture is taken, the shutter is opened, light is transmitted through the lens to the photosensitive element of the camera, the light signal is converted into an electrical signal, and the photosensitive element transfers the electrical signal to the ISP, which processes it into an image visible to the naked eye. The ISP can also algorithmically optimize the noise, brightness, and skin tone of the image, as well as parameters such as the exposure and color temperature of the shooting scene. In some embodiments, the ISP may be provided in the camera 193.
摄像头193用于捕获静态图像或视频。物体通过镜头生成光学图像投射到感光元件。感光元件可以是电荷耦合器件(charge coupled device,CCD)或互补金属氧化物半导体(complementary metal-oxide-semiconductor,CMOS)光电晶体管。感光元件把光信号转换成电信号,之后将电信号传递给ISP转换成数字图像信号。ISP将数字图像信号输出到DSP加工处理。DSP将数字图像信号转换成标准的RGB,YUV等格式的图像信号。在一些实施例中,电子设备100可以包括1个或N个摄像头193,N为大于1的正整数。示例性的,摄像头用于获取本申请实施例提供的图像信息输入方法中的待处理图像,或者待处理视频中的图像。The camera 193 is used to capture still images or videos. The object generates an optical image through the lens and is projected to the photosensitive element. The photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, and then transfers the electrical signal to the ISP to convert it into a digital image signal. ISP outputs digital image signals to DSP for processing. DSP converts digital image signals into standard RGB, YUV and other formats of image signals. In some embodiments, the electronic device 100 may include one or N cameras 193, and N is a positive integer greater than one. Exemplarily, the camera is used to obtain the image to be processed in the image information input method provided in the embodiment of the present application, or the image in the video to be processed.
数字信号处理器用于处理数字信号,除了可以处理数字图像信号,还可以处理其他数字信号。例如,当电子设备100在频点选择时,数字信号处理器用于对频点能量进行傅里叶变换等。Digital signal processors are used to process digital signals. In addition to digital image signals, they can also process other digital signals. For example, when the electronic device 100 selects a frequency point, the digital signal processor is used to perform Fourier transform on the energy of the frequency point.
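To illustrate what computing the energy at a frequency point with a Fourier transform can look like, the sketch below evaluates a single discrete Fourier transform bin directly from its definition. The function name `bin_energy`, the normalization by the sample count, and the test signal are assumptions made for illustration; the application does not specify how the DSP performs this step.

```python
import cmath
import math

def bin_energy(samples, k):
    """Return the energy of DFT bin k for a real-valued sample sequence.

    Hypothetical helper: computes one DFT coefficient from the definition
    and returns |X[k]|^2 / N.
    """
    n = len(samples)
    coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(samples))
    return abs(coeff) ** 2 / n

# Assumed test signal: a cosine with exactly 3 cycles over 32 samples.
samples = [math.cos(2 * math.pi * 3 * i / 32) for i in range(32)]
energy_at_tone = bin_energy(samples, 3)   # bin that contains the tone
energy_elsewhere = bin_energy(samples, 5) # bin without signal content
```

Comparing the two values shows how a frequency point carrying signal energy stands out from an empty one, which is the kind of information a frequency selection step could use.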
视频编解码器用于对数字视频压缩或解压缩。电子设备100可以支持一种或多种视频编解码器。这样,电子设备100可以播放或录制多种编码格式的视频,例如:动态图像专家组(moving picture experts group,MPEG)1,MPEG2,MPEG3,MPEG4等。Video codecs are used to compress or decompress digital video. The electronic device 100 may support one or more video codecs. In this way, the electronic device 100 can play or record videos in multiple encoding formats, such as: moving picture experts group (MPEG) 1, MPEG2, MPEG3, MPEG4, and so on.
NPU为神经网络(neural-network,NN)计算处理器,通过借鉴生物神经网络结构,例如借鉴人脑神经元之间传递模式,对输入信息快速处理,还可以不断的自学习。通过NPU可以实现电子设备100的智能认知等应用,例如:图像识别,人脸识别,语音识别,文本理解等。NPU is a neural-network (NN) computing processor. By drawing on the structure of biological neural networks, for example, the transfer mode between human brain neurons, it can quickly process input information, and it can also continuously self-learn. The NPU can realize applications such as intelligent cognition of the electronic device 100, such as image recognition, face recognition, voice recognition, text understanding, and so on.
外部存储器接口120可以用于连接外部存储卡,例如Micro SD卡,实现扩展电子设备100的存储能力。外部存储卡通过外部存储器接口120与处理器110通信,实现数据存储功能。例如将音乐,视频等文件保存在外部存储卡中。The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100. The external memory card communicates with the processor 110 through the external memory interface 120 to realize the data storage function. For example, save music, video and other files in an external memory card.
内部存储器121可以用于存储计算机可执行程序代码,所述可执行程序代码包括指令。处理器110通过运行存储在内部存储器121的指令,从而执行电子设备100的各种功能应用以及数据处理。内部存储器121可以包括存储程序区和存储数据区。其中,存储程序区可存储操作系统,至少一个功能所需的应用程序(比如声音播放功能,图像播放功能等)等。存储数据区可存储电子设备100使用过程中所创建的数据(比如音频数据,电话本等)等。此外,内部存储器121可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件,闪存器件,通用闪存存储器(universal flash storage,UFS)等。The internal memory 121 may be used to store computer-executable program code, which includes instructions. The processor 110 executes various functional applications and data processing of the electronic device 100 by running the instructions stored in the internal memory 121. The internal memory 121 may include a program storage area and a data storage area. The program storage area can store an operating system and the application programs required by at least one function (such as a sound playback function or an image playback function). The data storage area can store data created during the use of the electronic device 100 (such as audio data and a phone book). In addition, the internal memory 121 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or a universal flash storage (UFS).
电子设备100可以通过音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,以及应用处理器等实现音频功能。例如音乐播放,录音等。The electronic device 100 can implement audio functions through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, and the application processor. For example, music playback, recording, etc.
音频模块170用于将数字音频信息转换成模拟音频信号输出,也用于将模拟音频输入转换为数字音频信号。音频模块170还可以用于对音频信号编码和解码。在一些实施例中,音频模块170可以设置于处理器110中,或将音频模块170的部分功能模块设置于处理器110中。The audio module 170 is used to convert digital audio information into an analog audio signal for output, and is also used to convert an analog audio input into a digital audio signal. The audio module 170 can also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be provided in the processor 110, or part of the functional modules of the audio module 170 may be provided in the processor 110.
扬声器170A,也称“喇叭”,用于将音频电信号转换为声音信号。电子设备100可以通过扬声器170A收听音乐,或收听免提通话。The speaker 170A, also called a "loudspeaker", is used to convert audio electrical signals into sound signals. The electronic device 100 can be used to listen to music or to a hands-free call through the speaker 170A.
受话器170B,也称“听筒”,用于将音频电信号转换成声音信号。当电子设备100接听电话或语音信息时,可以通过将受话器170B靠近人耳接听语音。The receiver 170B, also called an "earpiece", is used to convert audio electrical signals into sound signals. When the electronic device 100 answers a call or a voice message, the voice can be heard by bringing the receiver 170B close to the ear.
麦克风170C,也称“话筒”,“传声器”,用于将声音信号转换为电信号。当拨打电话或发送语音信息时,用户可以通过人嘴靠近麦克风170C发声,将声音信号输入到麦克风170C。电子设备100可以设置至少一个麦克风170C。在另一些实施例中,电子设备100可以设置两个麦克风170C,除了采集声音信号,还可以实现降噪功能。在另一些实施例中,电子设备100还可以设置三个,四个或更多麦克风170C,实现采集声音信号,降噪,还可以识别声音来源,实现定向录音功能等。示例性的,麦克风可以用于采集本申请实施例提供的图像信息输入方法中待处理视频的音频。The microphone 170C, also called a "mic" or "sound transducer", is used to convert sound signals into electrical signals. When making a call or sending a voice message, the user can speak with the mouth close to the microphone 170C to input the sound signal into the microphone 170C. The electronic device 100 may be provided with at least one microphone 170C. In other embodiments, the electronic device 100 may be provided with two microphones 170C, which can implement noise reduction in addition to collecting sound signals. In still other embodiments, the electronic device 100 may be provided with three, four, or more microphones 170C to collect sound signals, reduce noise, identify sound sources, and implement directional recording. Exemplarily, the microphone may be used to collect the audio of the video to be processed in the image information input method provided in the embodiments of this application.
耳机接口170D用于连接有线耳机。耳机接口170D可以是USB接口130,也可以是3.5mm的开放移动电子设备平台(open mobile terminal platform,OMTP)标准接口,美国蜂窝电信工业协会(cellular telecommunications industry association of the USA,CTIA)标准接口。The earphone interface 170D is used to connect wired earphones. The earphone interface 170D may be the USB interface 130, a 3.5mm open mobile terminal platform (OMTP) standard interface, or a cellular telecommunications industry association of the USA (CTIA) standard interface.
压力传感器180A用于感受压力信号,可以将压力信号转换成电信号。在一些实施例中,压力传感器180A可以设置于显示屏194。压力传感器180A的种类很多,如电阻式压力传感器,电感式压力传感器,电容式压力传感器等。电容式压力传感器可以是包括至少两个具有导电材料的平行板。当有力作用于压力传感器180A,电极之间的电容改变。电子设备100根据电容的变化确定压力的强度。当有触摸操作作用于显示屏194,电子设备100根据压力传感器180A检测所述触摸操作强度。电子设备100也可以根据压力传感器180A的检测信号计算触摸的位置。在一些实施例中,作用于相同触摸位置,但不同触摸操作强度的触摸操作,可以对应不同的操作指令。例如:当有触摸操作强度小于第一压力阈值的触摸操作作用于短消息应用图标时,执行查看短消息的指令。当有触摸操作强度大于或等于第一压力阈值的触摸操作作用于短消息应用图标时,执行新建短消息的指令。The pressure sensor 180A is used to sense the pressure signal and can convert the pressure signal into an electrical signal. In some embodiments, the pressure sensor 180A may be provided on the display screen 194. There are many types of pressure sensors 180A, such as resistive pressure sensors, inductive pressure sensors, capacitive pressure sensors and so on. The capacitive pressure sensor may include at least two parallel plates with conductive materials. When a force is applied to the pressure sensor 180A, the capacitance between the electrodes changes. The electronic device 100 determines the intensity of the pressure according to the change in capacitance. When a touch operation acts on the display screen 194, the electronic device 100 detects the intensity of the touch operation according to the pressure sensor 180A. The electronic device 100 may also calculate the touched position according to the detection signal of the pressure sensor 180A. In some embodiments, touch operations that act on the same touch position but have different touch operation intensities can correspond to different operation instructions. For example: when a touch operation whose intensity is less than the first pressure threshold is applied to the short message application icon, an instruction to view the short message is executed. When a touch operation with a touch operation intensity greater than or equal to the first pressure threshold acts on the short message application icon, an instruction to create a new short message is executed.
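The threshold rule in the short message example above can be sketched as a small dispatch function. The names `FIRST_PRESSURE_THRESHOLD`, `instruction_for_touch`, and the instruction strings, as well as the normalized threshold value of 0.5, are hypothetical and chosen only for illustration.

```python
FIRST_PRESSURE_THRESHOLD = 0.5  # assumed normalized intensity; not specified in the source

def instruction_for_touch(target, intensity, threshold=FIRST_PRESSURE_THRESHOLD):
    """Map a touch on the short message icon to an instruction by touch intensity."""
    if target != "short_message_icon":
        return None                     # other targets are outside this sketch
    if intensity < threshold:
        return "view_short_message"     # intensity below the first pressure threshold
    return "create_short_message"       # intensity greater than or equal to the threshold

light_press = instruction_for_touch("short_message_icon", 0.2)
firm_press = instruction_for_touch("short_message_icon", 0.5)
```

Note that an intensity exactly equal to the threshold selects the "create" instruction, matching the "greater than or equal to" wording of the description.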
The gyroscope sensor 180B may be used to determine the motion posture of the electronic device 100. In some embodiments, the angular velocity of the electronic device 100 around three axes (that is, the x, y, and z axes) can be determined by the gyroscope sensor 180B. The gyroscope sensor 180B may be used for image stabilization during shooting. For example, when the shutter is pressed, the gyroscope sensor 180B detects the angle by which the electronic device 100 shakes, the distance that the lens module needs to compensate is calculated from that angle, and the lens counteracts the shake of the electronic device 100 through reverse movement, thereby achieving stabilization. The gyroscope sensor 180B may also be used in navigation and motion-sensing game scenarios.
The barometric pressure sensor 180C is used to measure air pressure. In some embodiments, the electronic device 100 calculates the altitude from the air pressure value measured by the barometric pressure sensor 180C to assist positioning and navigation.
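The application does not say how altitude is derived from pressure. One common approximation, shown here purely as an assumption, is the international barometric formula for the lower atmosphere; the function name and the default sea-level reference pressure are illustrative.

```python
# Standard sea-level pressure in pascals (assumed reference value).
SEA_LEVEL_PRESSURE_PA = 101325.0


def altitude_from_pressure(pressure_pa: float,
                           sea_level_pa: float = SEA_LEVEL_PRESSURE_PA) -> float:
    """Approximate altitude in metres using the international barometric formula."""
    if pressure_pa <= 0 or sea_level_pa <= 0:
        raise ValueError("pressures must be positive")
    # h = 44330 * (1 - (p / p0) ** (1 / 5.255))
    return 44330.0 * (1.0 - (pressure_pa / sea_level_pa) ** (1.0 / 5.255))
```

A real device would normally calibrate the reference pressure against local weather data, since the sea-level value varies from day to day.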
The magnetic sensor 180D includes a Hall sensor. The electronic device 100 may use the magnetic sensor 180D to detect the opening and closing of a flip leather case. In some embodiments, when the electronic device 100 is a flip phone, the electronic device 100 may detect the opening and closing of the flip cover by using the magnetic sensor 180D. Features such as automatic unlocking upon opening the flip cover are then set according to the detected open or closed state of the leather case or of the flip cover.
The acceleration sensor 180E can detect the magnitude of the acceleration of the electronic device 100 in various directions (generally along three axes). When the electronic device 100 is stationary, the magnitude and direction of gravity can be detected. The acceleration sensor 180E may also be used to identify the posture of the electronic device, and is applied in landscape/portrait switching, pedometers, and similar applications.
The distance sensor 180F is used to measure distance. The electronic device 100 may measure distance by infrared or laser. In some embodiments, in a shooting scenario, the electronic device 100 may use the distance sensor 180F to measure distance in order to achieve fast focusing.
The proximity light sensor 180G may include, for example, a light-emitting diode (LED) and a light detector such as a photodiode. The light-emitting diode may be an infrared light-emitting diode. The electronic device 100 emits infrared light outward through the light-emitting diode and uses the photodiode to detect infrared light reflected from nearby objects. When sufficient reflected light is detected, the electronic device 100 can determine that there is an object nearby. When insufficient reflected light is detected, the electronic device 100 can determine that there is no object nearby. The electronic device 100 may use the proximity light sensor 180G to detect that the user is holding the electronic device 100 close to the ear during a call, so that the screen is turned off automatically to save power. The proximity light sensor 180G may also be used for automatic unlocking and screen locking in leather case mode and pocket mode.
The ambient light sensor 180L is used to sense the brightness of ambient light. The electronic device 100 may adaptively adjust the brightness of the display screen 194 according to the sensed ambient light brightness. The ambient light sensor 180L may also be used to automatically adjust the white balance when taking photos. The ambient light sensor 180L may further cooperate with the proximity light sensor 180G to detect whether the electronic device 100 is in a pocket, so as to prevent accidental touches.
The fingerprint sensor 180H is used to collect fingerprints. The electronic device 100 may use the collected fingerprint characteristics to implement fingerprint unlocking, access to an application lock, fingerprint-triggered photographing, fingerprint-based call answering, and so on.
The temperature sensor 180J is used to detect temperature. In some embodiments, the electronic device 100 executes a temperature handling policy based on the temperature detected by the temperature sensor 180J. For example, when the temperature reported by the temperature sensor 180J exceeds a threshold, the electronic device 100 reduces the performance of a processor located near the temperature sensor 180J, so as to reduce power consumption and implement thermal protection. In other embodiments, when the temperature is lower than another threshold, the electronic device 100 heats the battery 142 to prevent the low temperature from causing an abnormal shutdown of the electronic device 100. In still other embodiments, when the temperature is lower than yet another threshold, the electronic device 100 boosts the output voltage of the battery 142 to avoid an abnormal shutdown caused by the low temperature.
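The three threshold-based behaviours described above can be combined into a single policy function for illustration. The text presents them as separate embodiments, so combining them is an assumption, as are the threshold values and the action names.

```python
def thermal_actions(temp_c: float,
                    high_threshold: float = 45.0,
                    heat_threshold: float = 0.0,
                    boost_threshold: float = -10.0) -> list:
    """Return the protective actions for a reported temperature (degrees Celsius)."""
    actions = []
    if temp_c > high_threshold:
        # Lower processor performance to cut power draw (thermal protection).
        actions.append("throttle_processor")
    if temp_c < heat_threshold:
        # Warm the battery to avoid a low-temperature shutdown.
        actions.append("heat_battery")
    if temp_c < boost_threshold:
        # Boost the battery output voltage at even lower temperatures.
        actions.append("boost_battery_output_voltage")
    return actions
```

At ordinary temperatures the function returns an empty list, meaning no policy action is taken.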
The touch sensor 180K is also called a "touch panel". The touch sensor 180K may be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 together form a touchscreen, also called a "touch screen". The touch sensor 180K is used to detect touch operations acting on or near it. The touch sensor may pass a detected touch operation to the application processor to determine the type of touch event. Visual output related to the touch operation may be provided through the display screen 194. In other embodiments, the touch sensor 180K may also be disposed on a surface of the electronic device 100 at a position different from that of the display screen 194.
The bone conduction sensor 180M can acquire vibration signals. In some embodiments, the bone conduction sensor 180M can acquire the vibration signal of the bone mass that vibrates with the human vocal part. The bone conduction sensor 180M may also be placed in contact with the human pulse to receive the blood pressure pulse signal. In some embodiments, the bone conduction sensor 180M may also be disposed in a headset to form a bone conduction headset. The audio module 170 may parse out a voice signal from the vibration signal of the vocal-part bone mass acquired by the bone conduction sensor 180M, thereby implementing a voice function. The application processor may parse heart rate information from the blood pressure pulse signal acquired by the bone conduction sensor 180M, thereby implementing a heart rate detection function.
The keys 190 include a power key, volume keys, and so on. The keys 190 may be mechanical keys or touch keys. The electronic device 100 may receive key input and generate key signal input related to user settings and function control of the electronic device 100.
The motor 191 can generate vibration prompts. The motor 191 may be used for incoming call vibration prompts and also for touch vibration feedback. For example, touch operations acting on different applications (such as photographing and audio playback) may correspond to different vibration feedback effects. For touch operations acting on different areas of the display screen 194, the motor 191 may also produce different vibration feedback effects. Different application scenarios (for example, time reminders, receiving messages, alarm clocks, and games) may also correspond to different vibration feedback effects. The touch vibration feedback effect may also be customized.
The indicator 192 may be an indicator light, which may be used to indicate the charging status and changes in battery level, or to indicate messages, missed calls, notifications, and so on.
The SIM card interface 195 is used to connect a SIM card. A SIM card can be brought into contact with or separated from the electronic device 100 by being inserted into or removed from the SIM card interface 195. The electronic device 100 may support 1 or N SIM card interfaces, where N is a positive integer greater than 1. The SIM card interface 195 may support Nano SIM cards, Micro SIM cards, SIM cards, and so on. Multiple cards may be inserted into the same SIM card interface 195 at the same time, and the types of those cards may be the same or different. The SIM card interface 195 may also be compatible with different types of SIM cards and with external memory cards. The electronic device 100 interacts with the network through the SIM card to implement functions such as calls and data communication. In some embodiments, the electronic device 100 uses an eSIM, that is, an embedded SIM card. The eSIM card may be embedded in the electronic device 100 and cannot be separated from it.
The software system of the electronic device 100 may adopt a layered architecture, an event-driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture. The embodiments of the present invention take an Android system with a layered architecture as an example to illustrate the software structure of the electronic device 100.
FIG. 2 is a block diagram of the software structure of the electronic device 100 provided by an embodiment of the present application.
The layered architecture divides the software into several layers, each with a clear role and division of labor. The layers communicate with each other through software interfaces. In some embodiments, the Android system is divided into four layers, which from top to bottom are the application layer, the application framework layer, the system libraries and Android runtime, and the kernel layer.
The application layer may include a series of application packages.
As shown in FIG. 2, the application packages may include applications such as Camera, Gallery, Calendar, Phone, Maps, Navigation, Input Method, Bluetooth, Music, Video, and Messages.
The application framework layer provides an application programming interface (API) and a programming framework for the applications in the application layer. The application framework layer includes some predefined functions.
As shown in FIG. 2, the application framework layer may include a window manager, a content provider, a view system, a phone manager, a resource manager, a notification manager, and so on.
The window manager is used to manage window programs. The window manager can obtain the size of the display screen, determine whether there is a status bar, lock the screen, capture the screen, and so on.
The content provider is used to store and retrieve data and to make the data accessible to applications. The data may include videos, images, audio, calls made and received, browsing history and bookmarks, the phone book, and so on.
The view system includes visual controls, such as controls that display text and controls that display pictures. The view system may be used to build applications. A display interface may consist of one or more views. For example, a display interface that includes a short message notification icon may include a view that displays text and a view that displays pictures.
The phone manager is used to provide the communication functions of the electronic device 100, for example, management of the call status (including connecting, hanging up, and so on).
The resource manager provides applications with various resources, such as localized strings, icons, pictures, layout files, and video files.
The notification manager enables an application to display notification information in the status bar. It can be used to convey notification-type messages, which can disappear automatically after a short stay without user interaction. For example, the notification manager is used to announce that a download is complete, to give message reminders, and so on. A notification may also appear in the status bar at the top of the system in the form of a chart or scroll-bar text, such as a notification from an application running in the background, or appear on the screen in the form of a dialog window. For example, text information is shown in the status bar, a prompt sound is played, the electronic device vibrates, or the indicator light flashes.
The Android runtime includes core libraries and a virtual machine. The Android runtime is responsible for the scheduling and management of the Android system.
The core libraries consist of two parts: one part is the functions that the Java language needs to call, and the other part is the core libraries of Android.
The application layer and the application framework layer run in the virtual machine. The virtual machine executes the Java files of the application layer and the application framework layer as binary files. The virtual machine is used to perform functions such as object life cycle management, stack management, thread management, security and exception management, and garbage collection.
The system libraries may include multiple functional modules, for example, a surface manager, media libraries, a three-dimensional graphics processing library (for example, OpenGL ES), and a 2D graphics engine (for example, SGL).
The surface manager is used to manage the display subsystem and provides the blending of 2D and 3D layers for multiple applications.
The media libraries support playback and recording of a variety of commonly used audio and video formats, as well as still image files. The media libraries may support a variety of audio, video, and image encoding formats, for example, MPEG4, H.264, MP3, AAC, AMR, JPG, and PNG.
The three-dimensional graphics processing library is used to implement three-dimensional graphics drawing, image rendering, compositing, layer processing, and so on.
The 2D graphics engine is a drawing engine for 2D drawing.
The kernel layer is the layer between hardware and software. The kernel layer contains at least a display driver, a camera driver, an audio driver, and a sensor driver.
The embodiments of the image information input method described below may be implemented on a mobile phone having the above hardware structure and software structure.
The visual input involved in the embodiments of the present application is introduced below. Visual input recognizes the semantic information contained in visual information input by the user. The recognized semantic information can then be used for information input. Visual information may include static images (such as pictures) or dynamic images (such as videos).
Visual input may be a standalone application or a function within an application.
When visual input is a standalone application, it may be a system program that comes with the electronic device or a third-party application installed on the electronic device. When the user needs to use the visual input function to input information, the user first needs to select the visual input application as the information input tool.
When visual input is a function within an application, the interface of the application may include a preset button, which is used to start the visual input function.
An application scenario of visual input is introduced below, taking visual input as a function within an application as an example. In this application scenario, the application is an input method application. Refer to FIG. 3(a) to FIG. 3(f), which are schematic diagrams of an application interface provided by an embodiment of the present application. FIG. 3(a) shows an information input interface 10 of the electronic device 100. The information input interface 10 includes an information input box 101, a virtual keyboard control 102, and an image input control 103, where:
The information input box 101 is used to display input information. The information input box may be any area in which information needs to be entered, such as a search box, the message sending box of a short message, or a query box. In this application scenario, the information input box is a search box.
The virtual keyboard control 102 is used by the user to enter information into the information input box.
The image input control 103 is used to start the visual input function. As shown in FIG. 3(a), the image input control may be disposed on the virtual keyboard control. In another application scenario, the image input control may be disposed separately from the virtual keyboard control, for example, to the right of, to the left of, or above the virtual keyboard control.
The user can enter the information input interface 10 through a user operation. The process of entering the information input interface through a user operation is described below with reference to the software and hardware workflow of the electronic device 100. For example, after the user taps any position in the information input box, the touch sensor 180K receives the touch operation (that is, the tap), and a corresponding hardware interrupt is sent to the kernel layer. The kernel layer processes the touch operation into an original input event (including information such as the touch coordinates and the timestamp of the touch operation). The original input event is stored in the kernel layer. The application framework layer obtains the original input event from the kernel layer, recognizes that the application corresponding to the input event is the input method application, and then calls an interface of the application framework layer to start the input method application. The display driver is then started by calling the kernel layer, so that the virtual keyboard control and the image input control are displayed to the user; and the sensor driver is started by calling the kernel layer, so that the information input by the user is obtained through the sensor corresponding to the virtual keyboard control and the user's touch operation is obtained through the image input control. At this point, the information input interface 10 is displayed on the electronic device 100.
From the information input interface 10, the user can enter a mode selection interface 20 through a user operation. Specifically, the user operation may be a touch operation (such as a tap) on the image input control 103 detected on the information input interface 10. As shown in FIG. 3(a) and FIG. 3(b), in response to the user tapping the image input control 103, the electronic device 100 displays the mode selection interface 20. The mode selection interface 20 may include the virtual keyboard control 102, and the virtual keyboard control 102 includes an album application control 201 and a camera application control 202, where:
The album application control 201 is used to start the album application. The name of the album application control may be "Album", "Gallery", "Photos", or the like. As shown in FIG. 3(b), in this application scenario, the name of the album application control is "Gallery". When it is detected that the user taps the album application control 201, the electronic device 100 may display a gallery interface 30.
The camera application control 202 is used to start the camera application. The name of the camera application control may be "Camera", "Take Photo", or the like. As shown in FIG. 3(b), in this application scenario, the name of the camera application control is "Camera". When it is detected that the user taps the camera application control 202, the electronic device 100 displays a shooting interface 40.
The gallery interface 30 shown in FIG. 3(c) may include a return key 301, a confirmation key 302, and multiple pictures 303. The user can select the pictures to be input from the multiple pictures through user operations. For example, when it is detected that the user taps the return key 301 on the gallery interface 30, the electronic device 100 displays the mode selection interface 20. When it is detected that the user taps one or more pictures on the gallery interface 30, the tapped pictures are displayed in a selected state (for example, a selected picture is darkened or a mark is added to it). After it is detected that the user taps the confirmation key 302 on the gallery interface 30, the electronic device 100 may display an information label interface 50.
The shooting interface 40 shown in FIG. 3(d) may include a shooting frame 401, a shooting key 402, and a return key 403. The user can obtain a captured picture through user operations. For example, when it is detected that the user taps the shooting key 402 on the shooting interface 40, the camera of the electronic device 100 captures the image contained in the shooting frame 401; after the captured picture is obtained, the shooting interface 40 may display a confirmation key. When it is detected that the user taps the confirmation key on the shooting interface 40, the electronic device 100 may display the information label interface 50. When it is detected that the user taps the return key 403 on the shooting interface 40, the electronic device 100 displays the mode selection interface 20.
The information label interface 50 shown in FIG. 3(e) may include the information input box 101 and the virtual keyboard control 102, and the virtual keyboard control 102 includes multiple information labels 501. An information label is semantic information recognized from the picture selected by the user or from the picture captured by the user. The user can select at least one of the information labels as input information. When it is detected that the user taps an information label on the information label interface 50, the electronic device 100 may display an input result interface 60.
The input result interface 60 shown in FIG. 3(f) may include the information input box 101 and input information 601. The input information 601 is the information label selected by the user. When the user selects multiple information labels, the selected labels may be combined into the input information in the order in which the user selected them.
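Combining the selected labels in selection order can be sketched as below. The application does not specify how the labels are joined, so the space separator and the function name `combine_labels` are assumptions.

```python
def combine_labels(selected_in_order, separator=" "):
    """Join the selected information labels in the order the user tapped them."""
    # The list is assumed to already reflect the tap order.
    return separator.join(selected_in_order)
```

For example, tapping "animal" and then "car" yields "animal car", while tapping them in the opposite order yields "car animal".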
In the above application scenario of visual input, after the user selects a picture on the gallery interface shown in FIG. 3(c) and taps the confirmation key, or after the user taps the confirmation key on the shooting interface shown in FIG. 3(d), the electronic device 100 may use the image information input method provided in the embodiments of the present application to obtain the information labels of the image to be processed.
In the above application scenario of visual input, the gallery may include pictures and videos, and the user can select either from the gallery. The user can also capture a picture or record a video with the camera. Pictures in the gallery and captured pictures are static images, whereas videos in the gallery and recorded videos are dynamic images. The image information input method provided in the embodiments of the present application is introduced below for static images and dynamic images respectively.
First, the image information input method provided in the embodiments of the present application is introduced by taking the case in which the visual information is a static image. Refer to FIG. 4, which is a schematic flowchart of the image information input method provided by an embodiment of the present application. As shown in FIG. 4, by way of example and not limitation, the image information input method may include the following steps:
S401: Acquire an image to be processed.
In the embodiments of the present application, the electronic device 100 may directly obtain the picture input by the user (that is, a picture selected by the user from the gallery or a captured picture obtained through the camera application) and record it as the image to be processed. The electronic device 100 may also obtain image data from the picture input by the user and use the image data as the image to be processed. For example, the electronic device obtains the pixel value information of the picture, converts the pixel value information into a bitmap, and records the bitmap as the image to be processed.
In one application scenario, the image to be processed may also be a picture that the user inputs directly. For example, after the input method application is started, the information input interface 10 is displayed on the electronic device 100, and the information input interface 10 may include an image input box. The user can copy a picture from a web page or a chat message and then paste the copied picture into the image input box. After detecting an input event in the image input box, the electronic device 100 obtains the picture entered in the image input box and records it as the image to be processed.
Optionally, after the image to be processed is acquired, it may also be preprocessed, which specifically includes cropping the image to be processed, for example, to a 200×200 image. The cropped image may then be recorded as the image to be processed. Cropping unifies the size of the images to be processed, which facilitates subsequent image processing.
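The cropping step can be sketched as follows. The application only states that the image is cropped to 200×200; taking the crop from the center, representing the image as a list of pixel rows, and rejecting images smaller than the target size are all assumptions made for this illustration.

```python
def center_crop(pixels, size=200):
    """Return the central size x size region of an image given as rows of pixels."""
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    if height < size or width < size:
        raise ValueError("image is smaller than the requested crop size")
    # Offsets that centre the crop window inside the image.
    top = (height - size) // 2
    left = (width - size) // 2
    return [row[left:left + size] for row in pixels[top:top + size]]
```

An implementation that must accept smaller pictures would need to scale or pad them first, which this sketch deliberately leaves out.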
S402: Perform classification processing on the image to be processed to obtain a first classification result.
In this embodiment of the present application, the first classification result includes at least one first category label.
A category label represents semantic information contained in the image to be processed. For example, refer to FIG. 5, which is a schematic diagram of an image to be processed provided in an embodiment of the present application. The image shown in FIG. 5 contains an animal and a vehicle; correspondingly, its first classification result contains two first category labels, "animal" and "vehicle".
Optionally, one way of classifying the image to be processed is: obtain a pre-trained classifier, input the image to be processed into the classifier, and obtain at least one first category label output by the classifier.
To train the classifier, a large number of sample images can be collected and each one manually annotated with its semantic information (that is, its first category label). The annotated sample images are input into the classifier for training. Some annotated sample images are then input into the classifier for testing; training is complete when the classification accuracy of the classifier reaches a preset accuracy.
The classifier may be constructed using a statistical method, a machine learning method, a neural network method, or the like. Because neural networks offer fast computation and high accuracy, the classifier is preferably a neural network.
Optionally, the first classification result further includes a probability value for each first category label. The larger the probability value, the more likely it is that the corresponding first category label represents semantic information contained in the image to be processed. The first classification result can therefore be preliminarily screened by probability value. Specifically, first category labels whose probability value is less than a preset value are deleted from the first classification result, and only first category labels whose probability value is greater than or equal to the preset value are retained. This excludes the less likely semantic information.
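The screening rule above can be sketched in a few lines. The function name `filter_labels`, the dictionary representation of the classification result, the example probabilities, and the preset value of 0.5 are assumptions made for illustration; the text does not specify them.

```python
def filter_labels(result, threshold=0.5):
    """Keep only labels whose probability is >= the preset value."""
    return {label: prob for label, prob in result.items() if prob >= threshold}


# Hypothetical first classification result: label -> probability value.
first_result = {"animal": 0.92, "vehicle": 0.81, "plant": 0.07}
kept = filter_labels(first_result, threshold=0.5)
print(kept)  # {'animal': 0.92, 'vehicle': 0.81}
```

Note that a label whose probability equals the preset value is retained, matching the "greater than or equal to" condition in the text.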
Using the trained classifier to classify the image to be processed improves the efficiency of classification. In addition, because the trained classifier has high classification accuracy, the first classification result it produces is also highly accurate.
In this embodiment of the present application, the classification of the image to be processed described above is in fact a coarse classification. Coarse classification is defined relative to fine classification. In the field of image processing, the more refined the classification of an image, the smaller the granularity; conversely, the less refined the classification, the larger the granularity. The granularity of coarse classification is therefore larger than that of fine classification. In other words, a coarse classification result is less refined than a fine classification result. For example, coarse classification of the image to be processed in FIG. 5 yields "animal" and "vehicle", whereas fine classification of the same image yields "dog" and "car". The category "vehicle" contains "car", and the category "animal" contains "dog".
The first classification result obtained by classifying the image to be processed reflects the semantic information of the image only in broad terms. Such broad semantic information often fails to capture what the user wants to express. To bring the recognized semantic information closer to the user's intent, the first classification result can be classified again, that is, subjected to fine-grained classification. The specific steps are as follows.
S403: Select a corresponding classification model according to the first classification result, input the image to be processed into the classification model, and obtain a second classification result output by the classification model.
This step amounts to fine-grained classification of the first classification result. For example, if the first classification result is a person, the corresponding second classification result may be a gender, a name, and so on. If the first classification result is a QR code, the corresponding second classification result may be the text information, picture information, or network address information encoded by the QR code. If the first classification result is a plant, the corresponding second classification result may be the name of the plant, the type of the plant, and so on.
One way of selecting the corresponding classification model according to the first classification result is to select the classification model corresponding to each first category label in the first classification result. Continuing the example of FIG. 5, the first classification result of the image to be processed includes two first category labels, "animal" and "vehicle". The animal classification model corresponding to the first category label "animal" is obtained, and the vehicle classification model corresponding to the first category label "vehicle" is obtained.
The classification model corresponding to each first category label may be pre-trained. This both saves time when the image to be processed is recognized and ensures recognition accuracy.
In addition, each first category label may correspond to at least one classification model, and different classification models output different classification results. The second classification result may therefore contain at least one second category label. The category range of a second category label is narrower than that of a first category label.
When a first category label corresponds to only one classification model, that model may be a multi-label classification model whose output can include multiple category labels. For example, the classification model corresponding to the first category label "vehicle" of the image to be processed shown in FIG. 5 is a vehicle model, which can identify not only the type of the vehicle but also its brand or other information about it. The second classification result obtained with the vehicle model then includes two second category labels, "car" and "brand A".
When a first category label corresponds to multiple classification models, each model may be a single-label classification model, that is, each model outputs only one category label. For example, the first category label "vehicle" of the image to be processed shown in FIG. 5 may correspond to a brand model (which identifies the brand of the vehicle) and a vehicle type model (which identifies the type of the vehicle). The brand model yields the second category label "brand A", and the vehicle type model yields the second category label "car". The second classification result therefore includes two second category labels, "car" and "brand A".
The two approaches can produce the same result; they differ only in the number of classification models. Correspondingly, the samples used to train the classification models also differ between the two approaches.
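The two arrangements can be compared with a small sketch. All functions below are hypothetical stand-ins that return fixed labels in place of trained models, and the registry names `MULTI_LABEL` and `SINGLE_LABEL` are assumptions; the point is only that one multi-label model and several single-label models can yield the same second classification result.

```python
def vehicle_model(image):
    """Stand-in for one multi-label model: returns several labels at once."""
    return ["car", "brand A"]


def vehicle_type_model(image):
    """Stand-in for a single-label model that identifies the vehicle type."""
    return "car"


def brand_model(image):
    """Stand-in for a single-label model that identifies the brand."""
    return "brand A"


# Hypothetical registries: first category label -> classification models.
MULTI_LABEL = {"vehicle": [vehicle_model]}
SINGLE_LABEL = {"vehicle": [vehicle_type_model, brand_model]}


def second_labels(registry, first_label, image):
    """Collect the second category labels from every model of a first label."""
    labels = []
    for model in registry[first_label]:
        output = model(image)
        labels.extend(output if isinstance(output, list) else [output])
    return labels


print(second_labels(MULTI_LABEL, "vehicle", None))   # ['car', 'brand A']
print(second_labels(SINGLE_LABEL, "vehicle", None))  # ['car', 'brand A']
```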
Optionally, one way of obtaining the second classification result is to input the image to be processed directly into the classification model corresponding to each first category label and obtain the second classification result output by those models.
The classification model differs for each first category label, and each classification model can in practice only recognize the part of the image that corresponds to its own first category label. For example, of the two first category labels of the image to be processed shown in FIG. 5, "animal" corresponds to an animal model and "vehicle" corresponds to a vehicle model. The animal model can only recognize the part of the image that contains the animal and cannot recognize the part that contains the vehicle; the vehicle model can only recognize the part that contains the vehicle and cannot recognize the part that contains the animal. Consequently, if the whole image to be processed is input into the animal model or the vehicle model, some invalid information is input into the classification model. The invalid information interferes with the valid information and thus affects the classification result.
To solve this problem, this embodiment of the present application optionally provides another way of obtaining the second classification result, in which only the valid information in the image to be processed is input into the classification model. The specific steps include:
Extract from the image to be processed the sub-image corresponding to each first category label in the first classification result, and obtain the classification model corresponding to each first category label in the first classification result; input the sub-image corresponding to the i-th first category label into the classification model corresponding to the i-th first category label to obtain the sub-labels of the i-th first category label, where i is a positive integer less than or equal to N and N is the number of first category labels in the first classification result; and use the sub-labels of every first category label in the first classification result as the second classification result.
As an example, the extraction of sub-images is illustrated in FIG. 6, which is a schematic diagram of the segmentation of an image to be processed provided in an embodiment of the present application. After the classification processing of S402 is performed on the image to be processed shown in (a) of FIG. 6, the first classification result contains two first category labels, "QR code" and "text". The part corresponding to "QR code" (the region enclosed by the dashed line 610 in (a) of FIG. 6) is extracted from the image to be processed to obtain the sub-image for the first category label "QR code" (shown in (b) of FIG. 6). The part corresponding to "text" (the region enclosed by the dashed line 620 in (a) of FIG. 6) is extracted to obtain the sub-image for the first category label "text" (shown in (c) of FIG. 6).
Next, the sub-image shown in (b) of FIG. 6 is input into the classification model corresponding to "QR code" (suppose the resulting classification result, that is, the sub-label of "QR code", is "Zhang San#55555555555#"), and the sub-image shown in (c) of FIG. 6 is input into the classification model corresponding to "text" (suppose the resulting classification result, that is, the sub-labels of "text", are "Name Zhang San" and "Phone 55555555555"). Finally, "Zhang San#55555555555#", "Name Zhang San", and "Phone 55555555555" are used as the second classification result.
In the second classification described above, each first category label in the first classification result has its own classification model, and that model is used to classify the sub-image corresponding to the same first category label. The first classification result is thus classified at a finer granularity, which amounts to dividing each broad category into subcategories and improves the accuracy of semantic recognition of the image to be processed.
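The per-label routing described above can be sketched as follows. This is an illustration under stated assumptions: `classify_sub_images`, `extract`, and the lambda "models" are hypothetical stand-ins (real models and real region extraction are not specified here), and the returned labels simply reproduce the FIG. 6 example.

```python
def classify_sub_images(image, first_labels, extract, models):
    """Route the sub-image of each first category label to that label's model.

    extract(image, label) returns the sub-image for a label;
    models[label](sub_image) returns the list of sub-labels for that label.
    """
    second_result = []
    for label in first_labels:  # i = 1 .. N over the first category labels
        sub_image = extract(image, label)
        second_result.extend(models[label](sub_image))
    return second_result


# Hypothetical stand-ins mirroring the FIG. 6 example.
image = {"QR code": "<region 610>", "text": "<region 620>"}


def extract(image, label):
    return image[label]


models = {
    "QR code": lambda sub: ["Zhang San#55555555555#"],
    "text": lambda sub: ["Name Zhang San", "Phone 55555555555"],
}

second_result = classify_sub_images(image, ["QR code", "text"], extract, models)
print(second_result)
```

Because each model sees only its own sub-image, the other regions of the picture cannot interfere with its output.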
Optionally, to narrow the category range of the classification result further, classification at progressively finer granularity may continue after the second classification result is obtained. For example, a third classification is performed after the second classification result is obtained: a corresponding classification model is selected according to the second classification result, the image to be processed is input into that classification model, and a third classification result output by the classification model is obtained. Each classification after the second classification result can follow the example in S403 and is not described again here. The number of classifications can be preset according to actual needs and is not specifically limited here. The more classifications are performed, the finer the granularity and the narrower the category range of the resulting classification result.
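Repeated refinement can be sketched as a loop that feeds each level's labels into the next level's models. The function name `cascade`, the label-keyed model table, and the example labels (including "husky") are hypothetical and chosen only for illustration; the stand-in models return fixed labels.

```python
def cascade(image, top_classifier, models, rounds):
    """Run the top classifier, then up to rounds - 1 refinement passes.

    models maps a label to the classification model that refines it.
    Refinement stops early when no label of the current level has a model.
    """
    results = [top_classifier(image)]
    for _ in range(rounds - 1):
        next_labels = []
        for label in results[-1]:
            model = models.get(label)
            if model is not None:
                next_labels.extend(model(image))
        if not next_labels:
            break
        results.append(next_labels)
    return results


# Hypothetical stand-in models returning fixed labels.
models = {
    "animal": lambda image: ["dog"],
    "dog": lambda image: ["husky"],
}
levels = cascade(None, lambda image: ["animal"], models, rounds=3)
print(levels)  # [['animal'], ['dog'], ['husky']]
```

The `rounds` argument plays the role of the preset number of classifications mentioned in the text.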
The classifier used in S402 and the classification model used in S403 may be deployed separately or integrated. Two interaction scenarios are described below. Refer to FIG. 7(a) and FIG. 7(b), which are schematic diagrams of the interaction between a user and the electronic device provided in an embodiment of the present application.
As shown in FIG. 7(a), when the classifier used in S402 and the classification model used in S403 are deployed separately, after the classifier in S402 obtains the first classification result, the electronic device 100 may first display the first classification result to the user as information labels of the image to be processed, so that the user selects at least one input label from the first classification result (an input label is any one of the first category labels). After the user selects an input label, the electronic device 100, in response to the detected input label, selects the corresponding classification model according to the input label, inputs the image to be processed into that classification model, obtains the second classification result output by the classification model, and displays the second classification result to the user as information labels of the image to be processed, so that the user selects at least one of these information labels as input information. After the user selects the input information, the electronic device 100, in response to the detected input information, displays the input information in the information input box.
As shown in FIG. 7(b), when the classifier used in S402 and the classification model used in S403 are integrated, after the classifier in S402 obtains the first classification result, the electronic device 100 directly selects the corresponding classification model according to the first classification result, inputs the image to be processed into that classification model, obtains the second classification result output by the classification model, and displays the second classification result to the user as information labels of the image to be processed, so that the user selects at least one of these information labels as input information. After the user selects the input information, the electronic device 100, in response to the detected input information, displays the input information in the information input box.
S404: Input the second classification result as the information label of the image to be processed.
One way of inputting the second classification result as the information label of the image to be processed is for the electronic device 100 to input all second category labels in the second classification result into the information input box as information labels of the image to be processed. However, this places a large amount of information in the information input box, and not every second category label represents the semantic information the user wants to express.
To solve this problem, the second classification result optionally further includes a probability value for each second category label. The larger the probability value, the more likely it is that the corresponding second category label represents semantic information contained in the image to be processed. Another way of inputting the second classification result as the information label of the image to be processed is therefore for the electronic device 100 to select the second category label with the largest probability value in the second classification result as the information label of the image to be processed and input that information label into the information input box.
The above method of determining the information label of the image to be processed amounts to filtering information on the user's behalf, and the information label obtained in this way is often not the semantic information the user wants to express. Preferably, therefore, a further way of inputting the second classification result as the information label of the image to be processed is as described in the application scenarios of the embodiments of FIG. 3(a) to FIG. 3(f): the electronic device 100 displays each second category label in the second classification result to the user, and the user selects at least one of them as the information label of the image to be processed; the electronic device 100 then, in response to detecting the information label selected by the user, inputs the detected information label into the information input box.
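The three ways of turning the second classification result into input can be placed side by side in a short sketch. The function name `labels_to_input`, the strategy names, and the probability values are assumptions for illustration; in the preferred variant the `chosen` argument stands in for the labels the user taps on the device.

```python
def labels_to_input(second_result, strategy, chosen=()):
    """Pick which second category labels go into the information input box.

    second_result maps each second category label to its probability value.
    """
    if strategy == "all":
        # Input every second category label.
        return list(second_result)
    if strategy == "top":
        # Input only the label with the largest probability value.
        return [max(second_result, key=second_result.get)]
    if strategy == "user":
        # Input only the labels the user selected from the displayed list.
        return [label for label in second_result if label in chosen]
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical second classification result with probability values.
second_result = {"car": 0.88, "brand A": 0.64}
print(labels_to_input(second_result, "all"))                     # ['car', 'brand A']
print(labels_to_input(second_result, "top"))                     # ['car']
print(labels_to_input(second_result, "user", chosen={"brand A"}))  # ['brand A']
```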
Optionally, both the first classification result and the second classification result may be input as information labels of the image to be processed. For the specific method, refer to the ways of inputting the second classification result as the information label of the image to be processed described above; details are not repeated here.
In one embodiment, to "infer" more semantic information from the image to be processed, extended information corresponding to the first classification result and/or the second classification result may optionally be obtained according to the first classification result and/or the second classification result, and the second classification result and/or the extended information may be input as the information label of the image to be processed.
The extended information is information related to the first classification result and/or the second classification result. "Related" here may mean related to all category labels in the first classification result and/or the second classification result. For example, if the second category labels in the second classification result are "brand A" and "car", the extended information may be introductory information about brand A cars. "Related" may also mean related to any one category label in the first classification result and/or the second classification result. For example, if the second category labels in the second classification result are "brand A" and "car", the extended information may be brand information about brand A and introductory information about cars.
Optionally, one way of obtaining the extended information corresponding to the first classification result and/or the second classification result is to use a search engine to search the Internet for that extended information. A search engine is a retrieval technology that, according to user needs, applies specific strategies to retrieve information from the Internet and returns it to the user. Search engines rely on a variety of technologies, such as web crawling, retrieval ranking, web page processing, big data processing, and natural language processing. In the embodiments of the present application, any existing search engine may be used for the information search; no specific limitation is imposed.
Extended information found in this way is usually cluttered and voluminous, and the individual pieces of information are only weakly related to one another. To obtain extended information that is strongly related to the semantic information represented by the category labels, this embodiment of the present application optionally provides another way of obtaining the extended information corresponding to the first classification result and/or the second classification result, which specifically includes: inputting the first classification result and/or the second classification result into a preset instruction detection model to obtain an information query instruction output by the instruction detection model; and querying, according to the information query instruction, the extended information corresponding to the first classification result and/or the second classification result.
The instruction detection model may be pre-trained. It is in effect a correspondence between the category labels in the first classification result and/or the second classification result and information query instructions. Each category label may correspond to one or more information query instructions. Information query instructions may include parameter query, type matching, keyword query, translation, and so on.
For example, suppose the category label "brand A car" is input into the instruction detection model and the output information query instruction is parameter query. The electronic device can then use a search engine to query the Internet for the parameter information of brand A cars and use the retrieved parameter information as the extended information of the category label "brand A car".
As another example, suppose the category label "QR code" is input into the instruction detection model and the output information query instruction is type matching. The electronic device can then use existing matching rules to obtain the type information of the QR code (such as business card, official account, or web page link). It can also parse the QR code further according to its type to obtain parsed information (for example, the information in a business card), and use the type information and/or the parsed information as the extended information of the category label "QR code".
As another example, suppose the category label "rose" is input into the instruction detection model and the output information query instruction is keyword query. The electronic device can then use a search engine to query the Internet for encyclopedia information about roses (such as a brief description of roses or a link to a web page introducing roses) and use the encyclopedia information as the extended information of the category label "rose".
As another example, suppose the category label is "beautiful" and the language the user uses on the electronic device is Chinese. The category label is input into the instruction detection model and the output information query instruction is translation. The electronic device can then use a translation application, or query the Internet, to obtain the Chinese meaning of "beautiful" and use that meaning as the extended information of the category label "beautiful".
The above are only examples of information query instructions and do not limit their specific content or function.
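The examples above can be sketched as a lookup followed by a dispatch. This sketch replaces the trained instruction detection model with a fixed table, which is a simplification justified only by the statement that the model is in effect a correspondence between category labels and information query instructions. The names `INSTRUCTIONS`, `HANDLERS`, `detect_instructions`, and `build_queries` are hypothetical, and the handlers merely build descriptive request strings rather than calling any real search or translation service.

```python
# Hypothetical stand-in for the trained instruction detection model:
# category label -> information query instructions.
INSTRUCTIONS = {
    "brand A car": ["parameter query"],
    "QR code": ["type matching"],
    "rose": ["keyword query"],
    "beautiful": ["translation"],
}

# Hypothetical handlers: each builds a description of the query to perform.
HANDLERS = {
    "parameter query": lambda label: f"search: {label} parameters",
    "type matching": lambda label: f"match type of: {label}",
    "keyword query": lambda label: f"search: {label} encyclopedia",
    "translation": lambda label: f"translate: {label}",
}


def detect_instructions(labels, table=INSTRUCTIONS):
    """Return (label, instruction) pairs for every label the table knows."""
    return [(label, instr) for label in labels for instr in table.get(label, [])]


def build_queries(labels):
    """Turn category labels into the queries that would fetch extended information."""
    return [HANDLERS[instr](label) for label, instr in detect_instructions(labels)]


queries = build_queries(["rose", "beautiful", "unknown label"])
print(queries)  # ['search: rose encyclopedia', 'translate: beautiful']
```

Labels without an entry in the table produce no query, so unknown categories are skipped in this sketch.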
The instruction detection model can be pre-trained according to actual needs. For example, the user's historical search information and historical input information can be collected and used as training data for the instruction detection model. The trained model then reflects the user's information query habits and can be used to "infer" the information query action the user wants to perform. Extended information is then queried according to the inferred action, so that the extended information obtained is closer to the semantic information the user wants to express, which makes the image information input method more intelligent.
Correspondingly, after the extended information of the first classification result and/or the second classification result is obtained, the second classification result and/or the extended information can be input as the information label of the image to be processed.
在一个应用场景中,当用户在图3(c)所示的图库界面上选择图片并点击确认键后,或当用户在图3(d)所示的拍摄界面上点击确认键后,电子设备100利用本申请实施例提供的图像信息输入方法获取待处理图像的第一分类结果和第二分类结果,并将第一分类结果和/第二分类结果输入到预设的指令检测模型中,获得所述指令检测模型输出的信息查询指令之后,将信息查询指令对应的指令控件显示到扩展信息查询界面70上,以使用户从信息查询指令中选择目标指令、并点击目标指令对应的指令控件。电子设备100响应于检测到的用户点击的指令控件,根据用户点击的指令控件对应的信息查询指令查询第一分类结果和/或第二分类结果对应的扩展信息,并将第二分类结果和扩展信息作为待处理图像的信息标签显示到如图3(e)所示信息标签界面50中,以使用户从多个信息标签中选择需要输入的信息。In an application scenario, when the user selects a picture on the gallery interface shown in Figure 3(c) and clicks the confirmation button, or when the user clicks the confirmation button on the shooting interface shown in Figure 3(d), the electronic device 100 Use the image information input method provided in the embodiments of the application to obtain the first classification result and the second classification result of the image to be processed, and input the first classification result and/or the second classification result into the preset instruction detection model to obtain After the information query instruction output by the instruction detection model, the instruction control corresponding to the information query instruction is displayed on the extended information query interface 70, so that the user selects the target instruction from the information query instruction and clicks the instruction control corresponding to the target instruction. In response to the detected instruction control clicked by the user, the electronic device 100 queries the first classification result and/or the extended information corresponding to the second classification result according to the information query instruction corresponding to the instruction control clicked by the user, and compares the second classification result with the extension The information is displayed as the information label of the image to be processed in the information label interface 50 as shown in FIG. 3(e), so that the user can select the information to be input from a plurality of information labels.
The embodiment of FIG. 4 describes the image information input method provided by the embodiments of this application using a static image as the visual information. The method is described below using a dynamic image as the visual information. Refer to FIG. 8, which is a schematic flowchart of an image information input method provided by another embodiment of this application. As shown in FIG. 8, by way of example and not limitation, the image information input method may include the following steps:
S801: Obtain a video to be processed.
In this embodiment of the application, the electronic device 100 may directly obtain the video input by the user (that is, a video the user selects from the gallery or a video captured through the camera application) and record it as the video to be processed. The electronic device 100 may also obtain video information from the video input by the user and use that video information as the video to be processed. For example, the electronic device obtains the pixel value information of each frame in the video, converts the pixel value information into bitmaps, and records the bitmaps as the video to be processed.
S802: Extract image information from the video to be processed, and use at least one frame of the image information as the image to be processed.
A video consists of image information and audio information, where the image information includes at least one frame.
Every frame in the image information can be recorded as an image to be processed, and each image to be processed can then be classified separately. However, adjacent frames in a video usually contain the same information. To reduce the amount of computation, the image information can therefore be sampled instead, that is, one frame is taken every few frames and recorded as an image to be processed.
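The sampling described above can be sketched in a few lines of Python. The function name `sample_frames` and the parameter `step` are illustrative assumptions; the application does not specify a sampling interval.

```python
from typing import List, Sequence, TypeVar

Frame = TypeVar("Frame")


def sample_frames(frames: Sequence[Frame], step: int) -> List[Frame]:
    """Keep one frame out of every `step` frames, starting with the first."""
    if step < 1:
        raise ValueError("step must be a positive integer")
    # Slicing with a stride keeps frames 0, step, 2*step, ...
    return list(frames[::step])
```

With `step=1` every frame becomes an image to be processed; larger values trade coverage for less classification work.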
S803: Classify the image to be processed to obtain a first classification result.
S804: Select a corresponding classification model according to the first classification result, input the image to be processed into the classification model, and obtain the second classification result output by the classification model.
Steps S803 and S804 are the same as steps S402 and S403 in the embodiment of FIG. 4. For details, refer to the description of steps S402 and S403, which is not repeated here.
A video consists of image information and audio information, and both contain semantic information. It is therefore necessary to obtain the semantic information in the audio information as well as that in the image information. Accordingly, after the video to be processed is obtained in S801, the method further includes:
S805: Extract audio information from the video to be processed, and perform speech recognition on the audio information to obtain the information label of the audio information.
Existing automatic speech recognition (ASR) technology can be used to recognize the audio information and obtain the text it contains, and the recognized text is used as the information label of the audio information. The complete recognized sentence can be used as the label of the audio information. Alternatively, keywords with grammatical meaning can be extracted from the complete recognized sentence according to grammatical features and used as the label of the audio information. For example, if the complete recognized sentence is "The brand A car is a currently popular car", the grammatically meaningful keyword extracted from it is "brand A car", and words without grammatical meaning, such as prepositions and particles, are ignored.
Steps S803 and S804 above obtain the semantic information contained in the image information, and step S805 obtains the semantic information contained in the audio information. These two processes may run in parallel or one after the other, which is not specifically limited here.
After the second classification result of the image to be processed and the information label of the audio information have been obtained, the method further includes:
S806: Input the information label of the audio information and the second classification result as the information label of the image to be processed.
The second classification result may include multiple second category labels, and some of these may be identical, that is, they carry duplicate semantic information. To avoid inputting the same semantic information repeatedly, the second classification result can first be deduplicated. The specific steps may include:
If identical second category labels exist in the second classification result, deduplicate the second classification result; then input the information label of the audio information and the deduplicated second classification result as the information label of the image to be processed.
Here, deduplication means keeping only one of any set of identical second category labels. For example, suppose the second classification result contains three second category labels: "vehicle", "automobile", and "automobile". "Automobile" appears twice, so only one instance is kept, and the deduplicated second classification result contains two second category labels: "vehicle" and "automobile".
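Deduplication of identical labels can be sketched as below. The function name is an assumption for illustration, and keeping the first occurrence is one possible choice, since the text allows any one of the identical labels to be kept.

```python
from typing import List


def deduplicate_labels(labels: List[str]) -> List[str]:
    """Drop repeated labels while preserving the order of first appearance."""
    # dict keys are unique and keep insertion order (Python 3.7+).
    return list(dict.fromkeys(labels))
```

This handles only exact duplicates. Merging labels that are merely semantically equivalent needs a separate comparison step.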
Deduplication can also mean keeping only one of any set of second category labels that carry the same semantics. Continuing the example of FIG. 6, the second classification result recognized for the image to be processed in FIG. 6 contains three second category labels: "Zhang San#55555555555#", "Name Zhang San", and "Phone 55555555555". Both "Name Zhang San" and "Zhang San#55555555555#" carry the semantic information that the name is Zhang San, so only one of the two needs to be kept. Likewise, both "Phone 55555555555" and "Zhang San#55555555555#" carry the semantic information that the phone number is 55555555555, so only one of the two needs to be kept. The deduplicated second classification result may therefore contain only the single second category label "Zhang San#55555555555#", or it may contain the two second category labels "Name Zhang San" and "Phone 55555555555".
After deduplication, the information label of the audio information and every second category label in the deduplicated second classification result can all be used as information labels of the image to be processed. However, the resulting set of information labels is cluttered and may contain many labels that are invalid or do not reflect the semantic information the user wants to express.
To solve this problem, optionally, the part of the audio information's information label and the deduplicated second classification result that expresses the same semantic information can be used as the information label of the image to be processed. The specific method is described for the following two cases.
Case 1: A first target label exists in the deduplicated second classification result.
The first target label is a second category label in the deduplicated second classification result that matches the information label of the audio information.
In this case, the information label of the audio information and the information label of the image information (that is, the deduplicated second classification result) share a part that expresses the same semantic information, namely the first target label. The first target label is therefore input as the information label of the image to be processed.
"Match" here can mean identical, or it can mean able to express the same semantic information. For example, if the information label of the audio information includes "brand A car" and the second category labels in the deduplicated second classification result include "brand A car", the two labels are identical and "brand A car" is the first target label. As another example, if the information label of the audio information includes "rose flower" and the second category labels in the deduplicated second classification result include "rose", the two labels match because "rose" and "rose flower" express the same semantic information, and the second category label "rose" is recorded as the first target label.
Case 2: No first target label exists in the deduplicated second classification result.
In this case, the information label of the audio information and the information label of the image information (the deduplicated second classification result) share no part that expresses the same semantic information, that is, there is no first target label.
The extended information of the second category labels in the deduplicated second classification result can then be searched until a second target label is found, and the second target label is input as the information label of the image to be processed.
The second target label is extended information of a second category label that matches the information label of the audio information.
For example, suppose the information label of the audio information includes "XX official account" and the second category labels of the deduplicated second classification result include "QR code". The extended information found for "QR code" is "XX official account" and "XX company". Because "XX official account" is identical to the "XX official account" in the information label of the audio information, "XX official account" is recorded as the second target label.
By searching the extended information of the second category labels in the second classification result, the part that expresses the same semantic information in both the audio and image information labels can be found. For the method of searching the extended information of the second category labels, refer to the description of "obtaining the extended information corresponding to the first classification result and/or the second classification result" in an embodiment of step S404, which is not repeated here.
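The two cases can be combined into one selection routine, sketched below in Python. All names are illustrative assumptions. In particular, `labels_match` uses case-insensitive equality or substring containment as a crude stand-in for "expresses the same semantic information", and `lookup_extended` is a hypothetical callback that returns the extended information of a label.

```python
from typing import Callable, Iterable, List, Optional


def labels_match(a: str, b: str) -> bool:
    """Assumed matching rule: equal, or one label contains the other."""
    a, b = a.strip().lower(), b.strip().lower()
    return bool(a) and bool(b) and (a == b or a in b or b in a)


def select_target_label(
    audio_labels: List[str],
    second_labels: List[str],
    lookup_extended: Callable[[str], Iterable[str]],
) -> Optional[str]:
    """Return the first or second target label, or None if neither exists."""
    # Case 1: a second category label itself matches an audio label.
    for label in second_labels:
        if any(labels_match(label, audio) for audio in audio_labels):
            return label
    # Case 2: search the extended information of each second category label
    # until one item matches an audio label.
    for label in second_labels:
        for extended in lookup_extended(label):
            if any(labels_match(extended, audio) for audio in audio_labels):
                return extended
    return None
```

A production system would likely replace `labels_match` with a synonym table or an embedding comparison, since substring containment can produce false matches.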
With the above method, the part of the second classification result that matches the information label of the audio information is input as the information label of the image to be processed. In other words, the identical or similar semantic information contained in both the image and the audio is extracted, which amounts to filtering the recognized semantic information once. The information label of the image to be processed is thus closer to the semantic information the user wants to express, which improves the accuracy of the visual input information.
After the information label of the image to be processed is obtained by the above method, that is, after the first target label or the second target label is obtained, the first target label or the second target label can be input directly into the information input box as the information label of the image to be processed.
Optionally, the first target label or the second target label can instead be displayed to the user as the top-recommended label among the information labels of the image to be processed.
In other words, the information labels of the image to be processed may include the second classification result, the information label of the audio information, and the first target label or second target label, with the first target label or second target label serving as the top-recommended label.
The top-recommended label is the label that is clearly distinguished from the other information labels of the image to be processed. For example, the information labels of the image to be processed are displayed to the user as a sequence with the top-recommended label at the start of the sequence. As another example, the font color of the top-recommended label differs from that of the other labels (for instance, red for the top-recommended label and black for the others), so that the user notices the top-recommended label quickly among the information labels of the image to be processed.
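Both presentation options, position and color, can be combined in a short sketch. The function name and the (label, color) tuple format are assumptions made for illustration; an actual implementation would drive the device's UI toolkit instead of returning tuples.

```python
from typing import List, Tuple


def order_for_display(labels: List[str], top_label: str) -> List[Tuple[str, str]]:
    """Put the top-recommended label first and pair each label with a font color."""
    # Remove duplicates and the top label itself from the remainder.
    others = [label for label in dict.fromkeys(labels) if label != top_label]
    return [(top_label, "red")] + [(label, "black") for label in others]
```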
Corresponding to the image information input method described in the above embodiments, FIG. 9 is a structural block diagram of an image information input apparatus provided by an embodiment of this application. For ease of description, only the parts related to the embodiments of this application are shown.
Referring to FIG. 9, the apparatus includes:
An image acquisition unit 91, configured to obtain an image to be processed.
A first classification unit 92, configured to classify the image to be processed to obtain a first classification result.
A second classification unit 93, configured to select a corresponding classification model according to the first classification result, input the image to be processed into the classification model, and obtain the second classification result output by the classification model.
An information input unit 94, configured to input the second classification result as the information label of the image to be processed.
By way of example, the working process of the image information input apparatus is described using the Android platform.
When a captured picture is obtained through the camera application, the image acquisition unit 91 first starts the camera application, then registers a callback function for the capture button and/or for focus in order to obtain the image data captured by the camera application, then converts the image data into a bitmap and records the bitmap as the image to be processed.
When a picture is obtained from the gallery, the image acquisition unit 91 first starts the album application, then registers a callback function for picture selection in order to obtain the data of the selected picture, then converts that data into a bitmap and records the bitmap as the image to be processed.
After obtaining the image to be processed, the image acquisition unit 91 passes the image to be processed (that is, the bitmap) as a parameter to the first classification unit 92 through the focus callback function. The first classification unit 92 registers a first classification result callback function.
The first classification unit 92 passes the first classification result to the second classification unit 93 through the first classification result callback function. The second classification unit 93 registers a second classification result callback function.
The second classification unit 93 passes the second classification result to the information input unit 94 through the second classification result callback function. The information input unit 94 inserts the information label of the image to be processed at the cursor position in the current interface.
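The callback chain between the units can be sketched in a platform-neutral way. The Python below is only an illustration of the hand-off order and does not use any Android API; the class names, method names, and the placeholder classifiers are all assumptions introduced for the example.

```python
from typing import Callable, List


class InformationInputUnit:
    """Stands in for unit 94: inserts labels at the cursor of a text buffer."""

    def __init__(self, text: str = "", cursor: int = 0) -> None:
        self.text = text
        self.cursor = cursor

    def on_second_result(self, labels: List[str]) -> None:
        inserted = " ".join(labels)
        self.text = self.text[: self.cursor] + inserted + self.text[self.cursor :]
        self.cursor += len(inserted)


class SecondClassificationUnit:
    """Stands in for unit 93: receives first results, emits second results."""

    def __init__(self, on_result: Callable[[List[str]], None]) -> None:
        self._on_result = on_result

    def on_first_result(self, bitmap: bytes, first_labels: List[str]) -> None:
        # Placeholder for the per-category classification model.
        self._on_result([f"{label}-detail" for label in first_labels])


class FirstClassificationUnit:
    """Stands in for unit 92: receives the bitmap, emits first results."""

    def __init__(self, on_result: Callable[[bytes, List[str]], None]) -> None:
        self._on_result = on_result

    def on_image(self, bitmap: bytes) -> None:
        # Placeholder for the first-stage classifier.
        self._on_result(bitmap, ["object"] if bitmap else [])


def build_pipeline(input_unit: InformationInputUnit) -> Callable[[bytes], None]:
    """Register each unit's callback with the unit upstream of it."""
    second = SecondClassificationUnit(input_unit.on_second_result)
    first = FirstClassificationUnit(second.on_first_result)
    return first.on_image
```

Calling the function returned by `build_pipeline` with a bitmap plays the role of the image acquisition unit 91 invoking its registered callback.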
Optionally, the first classification result includes at least one first category label.
Optionally, the second classification unit 93 is further configured to:
Extract from the image to be processed the sub-image corresponding to each first category label in the first classification result, and obtain the classification model corresponding to each first category label in the first classification result; input the sub-image corresponding to the i-th first category label in the first classification result into the classification model corresponding to the i-th first category label to obtain the sub-labels of the i-th first category label, where i is a positive integer less than or equal to N and N is the number of first category labels in the first classification result; and use the sub-labels of every first category label in the first classification result as the second classification result.
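The per-label dispatch performed by the second classification unit can be sketched as follows. The names `crop` and `models` are hypothetical: `crop` returns the sub-image for a first category label, and `models` maps each first category label to a callable classification model. Skipping a label that has no model is an assumption, since the text does not say what happens in that situation.

```python
from typing import Callable, Dict, List, TypeVar

Image = TypeVar("Image")


def second_classification(
    first_labels: List[str],
    crop: Callable[[str], Image],
    models: Dict[str, Callable[[Image], List[str]]],
) -> List[str]:
    """Collect the sub-labels of every first category label."""
    second_result: List[str] = []
    for label in first_labels:  # i = 1 .. N
        model = models.get(label)
        if model is None:
            continue  # Assumption: ignore labels without a dedicated model.
        sub_image = crop(label)
        second_result.extend(model(sub_image))
    return second_result
```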
Optionally, the apparatus 9 further includes:
An extended information acquisition unit, configured to, after the image to be processed is input into the classification model and the second classification result output by the classification model is obtained, obtain the extended information corresponding to the first classification result and/or the second classification result according to the first classification result and/or the second classification result, where the extended information is information related to the first classification result and/or the second classification result.
Correspondingly, the information input unit 94 is further configured to input the second classification result and/or the extended information as the information label of the image to be processed.
Optionally, the extended information acquisition unit is further configured to:
Input the first classification result and/or the second classification result into a preset instruction detection model to obtain the information query instruction output by the instruction detection model; and query the extended information corresponding to the first classification result and/or the second classification result according to the information query instruction.
Optionally, the image acquisition unit 91 includes:
An image information acquisition module, configured to obtain a video to be processed and extract image information from the video to be processed, where the image information includes at least one frame; and use at least one frame of the image information as the image to be processed.
Optionally, the image acquisition unit 91 further includes:
An audio information acquisition module, configured to, after the video to be processed is obtained, extract audio information from the video to be processed; and perform speech recognition on the audio information to obtain the information label of the audio information.
Correspondingly, the information input unit 94 is further configured to input the information label of the audio information and the second classification result as the information label of the image to be processed.
Optionally, the second classification result includes at least one second category label.
Optionally, the information input unit 94 is further configured to:
If identical second category labels exist in the second classification result, deduplicate the second classification result; and input the information label of the audio information and the deduplicated second classification result as the information label of the image to be processed.
Optionally, the information input unit 94 is further configured to:
If a first target label exists in the deduplicated second classification result, input the first target label as the information label of the image to be processed, where the first target label is a second category label in the deduplicated second classification result that matches the information label of the audio information;
If no first target label exists in the deduplicated second classification result, search the extended information of the second category labels in the deduplicated second classification result until a second target label is found, and input the second target label as the information label of the image to be processed, where the second target label is extended information of a second category label that matches the information label of the audio information.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the division into the functional units and modules above is only an example. In practice, the functions can be assigned to different functional units and modules as needed, that is, the internal structure of the apparatus can be divided into different functional units or modules to perform all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in hardware or as a software functional unit. In addition, the specific names of the functional units and modules serve only to distinguish them from one another and do not limit the protection scope of this application. For the specific working process of the units and modules in the above system, refer to the corresponding process in the foregoing method embodiments, which is not repeated here.
本申请实施例还提供了一种计算机可读存储介质,包括计算机指令,当所述计算机指令在计算机或处理器上运行时,使得所述计算机或处理器执行如上述各个图像信息输入方法实施例中的步骤。The embodiments of the present application also provide a computer-readable storage medium, including computer instructions, which when the computer instructions run on a computer or a processor, cause the computer or the processor to execute each of the above-mentioned image information input method embodiments Steps in.
本申请实施例提供了一种计算机程序产品,当计算机程序产品在计算机或处理器上运行时,使得计算机或处理器执行时实现可实现上述各个图像信息输入方法实施例中的步骤。The embodiments of the present application provide a computer program product. When the computer program product runs on a computer or a processor, the computer or the processor realizes the steps in the foregoing image information input method embodiments when executed.
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时,全部或部分地产生按照本申请实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者通过所述计算机可读存储介质进行传输。所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线)或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质,(例如,软盘、硬盘、磁带)、光介质(例如,DVD)、或者半导体介质(例如,固态硬盘(solid state disk,SSD))等。In the foregoing embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented by software, it can be implemented in the form of a computer program product in whole or in part. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on the computer, the processes or functions described in the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices. The computer instructions may be stored in a computer-readable storage medium or transmitted through the computer-readable storage medium. The computer instructions can be sent from one website site, computer, server, or data center to another website site, computer, Server or data center for transmission. The computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or a data center integrated with one or more available media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)), etc.
本申请实施例还提供了一种芯片系统,其特征在于,所述芯片系统包括处理器,所述处理器与存储器耦合,所述处理器执行存储器中存储的计算机程序,以实现上述 各个图像信息输入方法实施例中的步骤。所述芯片系统可以为单个芯片,或者多个芯片组成的芯片模组。An embodiment of the present application further provides a chip system, wherein the chip system includes a processor, the processor is coupled with a memory, and the processor executes a computer program stored in the memory to realize the above-mentioned image information. Enter the steps in the method embodiment. The chip system may be a single chip or a chip module composed of multiple chips.
In the foregoing embodiments, the description of each embodiment has its own focus. For parts that are not detailed or recorded in one embodiment, refer to the related descriptions of the other embodiments.
A person of ordinary skill in the art will appreciate that the units and method steps of the examples described in connection with the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each specific application, but such an implementation should not be considered to go beyond the scope of this application.
Finally, it should be noted that the foregoing is only a specific implementation of this application, and the protection scope of this application is not limited thereto. Any change or substitution within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (10)

  1. An image information input method, comprising:
    obtaining an image to be processed;
    performing classification processing on the image to be processed to obtain a first classification result;
    selecting a corresponding classification model according to the first classification result, inputting the image to be processed into the classification model, and obtaining a second classification result output by the classification model; and
    inputting the second classification result as an information label of the image to be processed.
  2. The image information input method according to claim 1, wherein the first classification result includes at least one first category label;
    and wherein the selecting a corresponding classification model according to the first classification result, inputting the image to be processed into the classification model, and obtaining a second classification result output by the classification model comprises:
    extracting, from the image to be processed, a sub-image corresponding to each first category label in the first classification result, and obtaining a classification model corresponding to each first category label in the first classification result;
    inputting the sub-image corresponding to the i-th first category label in the first classification result into the classification model corresponding to the i-th first category label to obtain a sub-label of the i-th first category label, where i is a positive integer less than or equal to N, and N is the number of first category labels in the first classification result; and
    using the sub-label of each first category label in the first classification result as the second classification result.
  3. The image information input method according to claim 1 or 2, wherein after the inputting the image to be processed into the classification model and obtaining a second classification result output by the classification model, the method further comprises:
    obtaining, according to the first classification result and/or the second classification result, extended information corresponding to the first classification result and/or the second classification result, wherein the extended information is information related to the first classification result and/or the second classification result;
    and correspondingly, the inputting the second classification result as an information label of the image to be processed comprises:
    inputting the second classification result and/or the extended information as the information label of the image to be processed.
  4. The image information input method according to claim 3, wherein the obtaining extended information corresponding to the first classification result and/or the second classification result comprises:
    inputting the first classification result and/or the second classification result into a preset instruction detection model to obtain an information query instruction output by the instruction detection model; and
    querying, according to the information query instruction, the extended information corresponding to the first classification result and/or the second classification result.
  5. The image information input method according to claim 1, wherein the obtaining an image to be processed comprises:
    obtaining a video to be processed, and extracting image information from the video to be processed, wherein the image information includes at least one frame of picture; and
    using at least one frame of picture in the image information as the image to be processed.
  6. The image information input method according to claim 5, wherein after the obtaining a video to be processed, the method further comprises:
    extracting audio information from the video to be processed;
    performing speech recognition processing on the audio information to obtain an information label of the audio information;
    and correspondingly, the inputting the second classification result as an information label of the image to be processed comprises:
    inputting the information label of the audio information and the second classification result as the information label of the image to be processed.
  7. The image information input method according to claim 6, wherein the second classification result includes at least one second category label;
    and wherein the inputting the information label of the audio information and the second classification result as the information label of the image to be processed comprises:
    if identical second category labels exist in the second classification result, performing deduplication processing on the second classification result; and
    inputting the information label of the audio information and the deduplicated second classification result as the information label of the image to be processed.
  8. The image information input method according to claim 7, wherein the inputting the information label of the audio information and the deduplicated second classification result as the information label of the image to be processed comprises:
    if a first target label exists in the deduplicated second classification result, inputting the first target label as the information label of the image to be processed, wherein the first target label is a second category label in the deduplicated second classification result that matches the information label of the audio information; and
    if the first target label does not exist in the deduplicated second classification result, searching the extended information of the second category labels in the deduplicated second classification result until a second target label is found, and inputting the second target label as the information label of the image to be processed, wherein the second target label is extended information of a second category label that matches the information label of the audio information.
  9. An electronic device, wherein the electronic device includes a processor, and the processor is configured to run a computer program stored in a memory to implement the method according to any one of claims 1 to 8.
  10. A computer storage medium, comprising computer instructions which, when run on a computer or a processor, cause the computer or the processor to perform the method according to any one of claims 1 to 8.
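
As an informal illustration only, and not part of the claims, the two-stage classification of claims 1 and 2 and the label selection of claims 7 and 8 might be sketched as below. Everything named here is hypothetical: the function names, the dictionary-based "image", the toy classifiers, and the reading of "matches" as exact string equality are all assumptions made for the sketch, not details taken from the application.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins: an "image" is a plain dict, and classifiers are callables.
Image = dict
CoarseClassifier = Callable[[Image], List[Tuple[str, Image]]]  # -> (first label, sub-image)
FineClassifier = Callable[[Image], str]                        # sub-image -> sub-label


def label_image(image: Image, coarse: CoarseClassifier,
                fine_models: Dict[str, FineClassifier]) -> List[str]:
    """Claims 1-2: classify coarsely, then run each category's own model on its sub-image."""
    second_result = []
    for first_label, sub_image in coarse(image):
        model = fine_models.get(first_label)
        if model is None:
            continue  # assumption: categories without a dedicated model are skipped
        second_result.append(model(sub_image))
    return second_result


def dedupe(labels: List[str]) -> List[str]:
    """Claim 7: drop repeated second category labels, keeping first-seen order."""
    return list(dict.fromkeys(labels))


def select_labels(second_result: List[str], audio_labels: List[str],
                  extended_info: Dict[str, List[str]]) -> List[str]:
    """Claim 8: prefer labels that match the audio labels, else search extended info."""
    unique = dedupe(second_result)
    audio = set(audio_labels)
    matched = [label for label in unique if label in audio]  # first target labels
    if matched:
        return matched
    for label in unique:
        for info in extended_info.get(label, []):
            if info in audio:
                return [info]  # second target label: stop at the first match
    return []


# Toy data: the coarse stage just reads pre-annotated regions from the dict.
def toy_coarse(image: Image) -> List[Tuple[str, Image]]:
    return [(name, {"pixels": crop}) for name, crop in image["regions"]]


toy_models = {
    "animal": lambda sub: "cat" if sub["pixels"] == "whiskers" else "dog",
    "plant": lambda sub: "rose",
}
image = {"regions": [("animal", "whiskers"), ("plant", "petals"), ("animal", "whiskers")]}
labels = label_image(image, toy_coarse, toy_models)  # ['cat', 'rose', 'cat']
```

With these toy inputs, `select_labels(labels, ["rose"], {})` returns `["rose"]` because a second category label matches the audio label directly, while `select_labels(labels, ["feline"], {"cat": ["feline", "pet"]})` falls back to the extended information and returns `["feline"]`. A real system would replace the toy callables with trained models and would likely use a looser notion of matching than string equality.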
PCT/CN2021/083140 2020-06-24 2021-03-26 Image information input method, electronic device, and computer readable storage medium WO2021258797A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010589458.0 2020-06-24
CN202010589458.0A CN111881315A (en) 2020-06-24 2020-06-24 Image information input method, electronic device, and computer-readable storage medium

Publications (1)

Publication Number Publication Date
WO2021258797A1 true WO2021258797A1 (en) 2021-12-30

Family

ID=73156892

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/083140 WO2021258797A1 (en) 2020-06-24 2021-03-26 Image information input method, electronic device, and computer readable storage medium

Country Status (2)

Country Link
CN (1) CN111881315A (en)
WO (1) WO2021258797A1 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111881315A (en) * 2020-06-24 2020-11-03 华为技术有限公司 Image information input method, electronic device, and computer-readable storage medium
CN113159091B (en) 2021-01-20 2023-06-20 北京百度网讯科技有限公司 Data processing method, device, electronic equipment and storage medium
CN115002333B (en) * 2021-03-02 2023-09-26 华为技术有限公司 Image processing method and related device
CN113240394B (en) * 2021-05-19 2023-04-07 国网福建省电力有限公司 Electric power business hall service method based on artificial intelligence
CN115482143B (en) * 2021-06-15 2023-12-19 荣耀终端有限公司 Image data calling method and system for application, electronic equipment and storage medium
CN114943976B (en) * 2022-07-26 2022-10-11 深圳思谋信息科技有限公司 Model generation method and device, electronic equipment and storage medium
CN116757284A (en) * 2022-09-26 2023-09-15 荣耀终端有限公司 Model reasoning method, device, storage medium and program product

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190130214A1 (en) * 2017-10-30 2019-05-02 Sap Se Computer vision architecture with machine learned image recognition models
CN110737801A (en) * 2019-10-14 2020-01-31 腾讯科技(深圳)有限公司 Content classification method and device, computer equipment and storage medium
CN111061898A (en) * 2019-12-13 2020-04-24 Oppo(重庆)智能科技有限公司 Image processing method, image processing device, computer equipment and storage medium
CN111198958A (en) * 2018-11-19 2020-05-26 Tcl集团股份有限公司 Method, device and terminal for matching background music
CN111881315A (en) * 2020-06-24 2020-11-03 华为技术有限公司 Image information input method, electronic device, and computer-readable storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108429816A (en) * 2018-03-27 2018-08-21 百度在线网络技术(北京)有限公司 Method and apparatus for generating information
CN111178349A (en) * 2019-12-17 2020-05-19 科大讯飞股份有限公司 Image identification method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN111881315A (en) 2020-11-03

Similar Documents

Publication Publication Date Title
WO2021258797A1 (en) Image information input method, electronic device, and computer readable storage medium
WO2020168929A1 (en) Method for identifying specific position on specific route and electronic device
WO2020238356A1 (en) Interface display method and apparatus, terminal, and storage medium
EP4064284A1 (en) Voice detection method, prediction model training method, apparatus, device, and medium
WO2022127787A1 (en) Image display method and electronic device
WO2020019220A1 (en) Method for displaying service information in preview interface, and electronic device
WO2020119455A1 (en) Method for repeating word or sentence during video playback, and electronic device
WO2022052776A1 (en) Human-computer interaction method, and electronic device and system
WO2020259554A1 (en) Learning-based keyword search method, and electronic device
WO2021254411A1 (en) Intent recognigion method and electronic device
WO2022100221A1 (en) Retrieval processing method and apparatus, and storage medium
WO2021258814A1 (en) Video synthesis method and apparatus, electronic device, and storage medium
WO2020042112A1 (en) Terminal and method for evaluating and testing ai task supporting capability of terminal
CN110866254A (en) Vulnerability detection method and electronic equipment
CN113066048A (en) Segmentation map confidence determination method and device
CN115115679A (en) Image registration method and related equipment
WO2021238371A1 (en) Method and apparatus for generating virtual character
WO2021208677A1 (en) Eye bag detection method and device
CN114943976B (en) Model generation method and device, electronic equipment and storage medium
CN115437601A (en) Image sorting method, electronic device, program product, and medium
US20230385345A1 (en) Content recommendation method, electronic device, and server
WO2021031862A1 (en) Data processing method and apparatus thereof
WO2022179271A1 (en) Search result feedback method and device, and storage medium
WO2022111640A1 (en) Application classification method, electronic device, and chip system
WO2022143083A1 (en) Application search method and device, and medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application. Ref document number: 21828311; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
122 Ep: pct application non-entry in european phase. Ref document number: 21828311; Country of ref document: EP; Kind code of ref document: A1