WO2017054676A1 - Image processing apparatus, terminal and method - Google Patents

Image processing apparatus, terminal and method

Info

Publication number
WO2017054676A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
scene
content information
image processing
scene content
Prior art date
Application number
PCT/CN2016/099865
Other languages
English (en)
French (fr)
Inventor
戴向东
Original Assignee
努比亚技术有限公司
Priority date
Filing date
Publication date
Application filed by 努比亚技术有限公司
Publication of WO2017054676A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50: Information retrieval of still image data; database structures therefor; file system structures therefor
    • G06F 16/58: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/583: Retrieval characterised by using metadata automatically derived from the content
    • G06F 16/5846: Retrieval characterised by using metadata automatically derived from the content, using extracted text

Definitions

  • Embodiments of the present invention relate to, but are not limited to, the field of communication technology.
  • In the era of the mobile Internet, the amount of image data stored on mobile terminals has grown explosively.
  • When a mobile terminal captures an image, shooting parameter information such as the image height, exposure time, number of data bits, and shooting location is automatically recorded and written into the file attributes of the image.
  • Users can view this basic information about how the image was taken by viewing its file properties.
  • However, the specific content of the image cannot be learned from the file attributes; the image file must be opened and viewed subjectively by the human eye. The file attributes of an image in the related art therefore contain information that is neither rich nor comprehensive, and the user cannot use them to quickly browse or filter images.
  • Embodiments of the present invention provide an image processing apparatus, a terminal, and a method, which can add a new attribute to an image, at least making the information contained in the file attribute of the image richer and more comprehensive.
  • An embodiment of the present invention provides an image processing apparatus, including:
  • the scene recognition module is configured to perform scene recognition on the image according to the preset scene recognition parameter, and generate scene content information of the image, where the scene content information is text information describing a scene feature of the image;
  • a write module is configured to write the scene content information into a file attribute of the image.
  • the apparatus further includes an image processing module, the image processing module being configured to process the image according to the scene content information.
  • the image processing module includes a classification unit, and the classification unit is configured to classify the image according to the scene content information.
  • the image processing module includes an annotation unit, the annotation unit being configured to generate annotation information according to the scene content information when the image is published.
  • the image processing module includes an optimization unit, and the optimization unit is configured to: perform optimization processing on the image according to the scene content information.
  • the scene recognition module is configured to perform scene recognition on the image immediately after taking an image or acquiring an image from the outside.
  • the image processing apparatus further includes a deep learning module, and the deep learning module is configured to: perform deep learning by using big data, and train a scene recognition parameter capable of distinguishing scene features of the image.
  • the classification unit is configured to analyze the scene features contained in the image according to the scene content information, and to classify the image by visual content according to those features.
  • the optimization unit is configured to acquire scene content information of an image, and adopt different optimization strategies according to scene content information of the image, and optimize content of different regions in the image to different degrees;
  • the optimization includes: adjusting the color of the image to enhance the visual effect of the image.
  • An embodiment of the present invention also provides a terminal, which includes the image processing apparatus described above (a structural sketch of the apparatus and terminal follows below).
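  • As an illustration of the module structure in the apparatus and terminal claims above, the following is a minimal Python sketch; the class and function names are illustrative assumptions, not taken from the patent.

```python
# Minimal structural sketch of the claimed apparatus (illustrative names, not
# from the patent): a scene recognition module, a write module, an optional
# image processing module, and a terminal that contains the apparatus.
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ImageProcessingApparatus:
    # Maps an image path to text describing its scene features,
    # using preset scene recognition parameters.
    scene_recognizer: Callable[[str], str]
    # Writes the generated text into the image's file attributes.
    attribute_writer: Callable[[str, str], None]
    # Optional further processing (classification, annotation, optimization).
    image_processor: Optional[Callable[[str, str], None]] = None

    def handle(self, image_path: str) -> str:
        scene_text = self.scene_recognizer(image_path)     # scene recognition module
        self.attribute_writer(image_path, scene_text)      # write module
        if self.image_processor is not None:
            self.image_processor(image_path, scene_text)   # image processing module
        return scene_text


@dataclass
class Terminal:
    apparatus: ImageProcessingApparatus  # the terminal simply embeds the apparatus
```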
  • An embodiment of the present invention also provides an image processing method, comprising the steps of:
  • performing scene recognition on the image according to preset scene recognition parameters, and generating scene content information of the image, where the scene content information is text information describing the scene features of the image; and
  • writing the scene content information into the file attributes of the image.
  • the method further includes:
  • the image is processed according to the scene content information.
  • the processing the image according to the scene content information comprises: classifying the image according to the scene content information.
  • the processing the image according to the scene content information includes: when the image is published, generating annotation information according to the scene content information.
  • the processing the image according to the scene content information comprises: performing optimization processing on the image according to the scene content information.
  • the method further includes: performing scene recognition on the image immediately after taking an image or acquiring an image from the outside.
  • the step of performing scene recognition on the image according to the preset scene recognition parameters further includes: performing deep learning using big data, and training scene recognition parameters capable of distinguishing the scene features of the image.
  • the classifying the image according to the scene content information includes:
  • the scene features contained in the image are analyzed according to the scene content information, and the image is classified by visual content according to those features.
  • the optimizing of the image according to the scene content information includes: acquiring the scene content information of the image, adopting different optimization strategies according to that information, and optimizing the content of different regions of the image to different degrees;
  • the optimization includes: adjusting the colors of the image to enhance its visual effect.
  • the scene content information includes a content label of the image, coordinate positions of pixel points, and content association information (an illustrative sketch of such a structure is given below).
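  • As a hedged illustration of what such scene content information might look like as a data structure, the sketch below uses field names that are assumptions for illustration only; the patent requires content labels, pixel coordinate positions, and content association information, ultimately stored as text.

```python
# Illustrative sketch of "scene content information": content labels, pixel
# coordinate positions of recognized objects, and content association
# information, serialized to text for the image's file attributes.
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SceneObject:
    label: str                                   # e.g. "strawberry"
    bbox: Tuple[int, int, int, int]              # (x0, y0, x1, y1) pixel coordinates
    attributes: List[str] = field(default_factory=list)  # e.g. ["red", "fresh"]


@dataclass
class SceneContentInfo:
    labels: List[str]        # e.g. ["food", "fruit", "close-up"]
    objects: List[SceneObject]
    associations: List[str]  # e.g. ["organic plant", "nutrition", "health"]

    def as_text(self) -> str:
        # The patent stores the information as text in the image's file attributes.
        return ", ".join(self.labels + [o.label for o in self.objects] + self.associations)
```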
  • The image processing apparatus of the embodiments recognizes the scene of an image, generates scene content information, and writes it into the file attributes of the image, so that the file attributes include not only shooting parameter information such as the image height, exposure time, number of data bits, and shooting location, but also the scene content information of the image, making the file attributes richer and more comprehensive.
  • After acquiring an image, a user can learn its specific content simply by viewing its file attributes, without opening the image, and can therefore quickly browse or filter images.
  • The image can be further processed using its scene content information.
  • Using the scene content information to automatically classify the image provides a new image classification method.
  • Using the scene content information to automatically generate annotation information when the image is published interprets the specific content of the image, eliminates the user's manual input, and provides a new image sharing experience.
  • Using the scene content information to automatically optimize the image makes the optimization more targeted and more accurate, enhancing the visual effect of the image.
  • FIG. 1 is a schematic structural diagram of hardware of a mobile terminal that implements various embodiments of the present invention
  • FIG. 2 is a schematic diagram of a wireless communication system of the mobile terminal shown in FIG. 1;
  • FIG. 3 is a flow chart of a first embodiment of an image processing method according to the present invention.
  • FIG. 4 is a schematic diagram of scene recognition for an image according to an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of scene recognition for another image according to an embodiment of the present invention.
  • FIG. 6 is a flowchart of a second embodiment of an image processing method according to the present invention.
  • FIG. 7 is a schematic diagram of classification and recognition of scene content by a convolutional neural network according to an embodiment of the present invention.
  • FIG. 8 is a flowchart of a third embodiment of an image processing method according to the present invention.
  • FIG. 9 is a flowchart of a fourth embodiment of an image processing method according to the present invention.
  • FIG. 10 is a schematic block diagram of a first embodiment of an image processing apparatus according to the present invention.
  • FIG. 11 is a schematic block diagram of a second embodiment of an image processing apparatus according to the present invention.
  • FIG. 12 is a schematic block diagram of a third embodiment of an image processing apparatus according to the present invention.
  • FIG. 13 is a block diagram of the image processing module of FIG.
  • the mobile terminal can be implemented in various forms.
  • The terminals described in the embodiments of the present invention may include mobile terminals such as mobile phones, smart phones, notebook computers, digital broadcast receivers, PDAs (Personal Digital Assistants), PADs (tablet computers), PMPs (Portable Multimedia Players), and navigation devices, as well as fixed terminals such as digital TVs and desktop computers.
  • FIG. 1 is a schematic diagram showing the hardware structure of a mobile terminal embodying various embodiments of the present invention.
  • the mobile terminal 100 may include a wireless communication unit 110, an A/V (Audio/Video) input unit 120, a user input unit 130, a sensing unit 140, an output unit 150, a memory 160, an interface unit 170, a controller 180, and a power supply unit 190. and many more.
  • Figure 1 illustrates a mobile terminal having various components, but it should be understood that not all illustrated components may be implemented and that more or fewer components may be implemented instead. The elements of the mobile terminal will be described in detail below.
  • Wireless communication unit 110 typically includes one or more components that permit radio communication between mobile terminal 100 and a wireless communication system or network.
  • the wireless communication unit may include at least one of a broadcast receiving module 111, a mobile communication module 112, a wireless internet module 113, a short-range communication module 114, and a location information module 115.
  • the broadcast receiving module 111 receives a broadcast signal and/or broadcast associated information from an external broadcast management server via a broadcast channel.
  • the broadcast channel can include a satellite channel and/or a terrestrial channel.
  • the broadcast management server may be a server that generates and transmits a broadcast signal and/or broadcast associated information or a server that receives a previously generated broadcast signal and/or broadcast associated information and transmits it to the terminal.
  • the broadcast signal may include a TV broadcast signal, a radio broadcast signal, a data broadcast signal, and the like.
  • the broadcast signal may also include a broadcast signal combined with a TV or radio broadcast signal.
  • the broadcast associated information may also be provided via a mobile communication network, and in this case, the broadcast associated information may be received by the mobile communication module 112.
  • the broadcast signal may exist in various forms, for example, it may exist in the form of Digital Multimedia Broadcasting (DMB) Electronic Program Guide (EPG), Digital Video Broadcasting Handheld (DVB-H) Electronic Service Guide (ESG), and the like.
  • The broadcast receiving module 111 can receive signal broadcasts using various types of broadcast systems.
  • In particular, the broadcast receiving module 111 can receive digital broadcasts using digital broadcasting systems such as Digital Multimedia Broadcasting-Terrestrial (DMB-T), Digital Multimedia Broadcasting-Satellite (DMB-S), Digital Video Broadcasting-Handheld (DVB-H), the MediaFLO (Media Forward Link Only) data broadcasting system, and Integrated Services Digital Broadcasting-Terrestrial (ISDB-T).
  • the broadcast receiving module 111 can be constructed as various broadcast systems suitable for providing broadcast signals as well as the above-described digital broadcast system.
  • the broadcast signal and/or broadcast associated information received via the broadcast receiving module 111 may be stored in the memory 160 (or other type of storage medium).
  • the mobile communication module 112 transmits the radio signals to and/or receives radio signals from at least one of a base station (e.g., an access point, a Node B, etc.), an external terminal, and a server.
  • Such radio signals may include voice call signals, video call signals, or various types of data transmitted and/or received in accordance with text and/or multimedia messages.
  • the wireless internet module 113 supports wireless internet access of the mobile terminal.
  • the module can be internally or externally coupled to the terminal.
  • the wireless Internet access technologies involved in the module may include WLAN (Wireless LAN) (Wi-Fi), Wibro (Wireless Broadband), Wimax (Worldwide Interoperability for Microwave Access), HSDPA (High Speed Downlink Packet Access), etc. .
  • the short range communication module 114 is a module that is configured to support short range communication.
  • Some examples of short-range communication technologies include BluetoothTM, Radio Frequency Identification (RFID), Infrared Data Association (IrDA), Ultra Wide Band (UWB), ZigbeeTM, and the like.
  • the location information module 115 is a module configured to check or acquire location information of the mobile terminal.
  • a typical example of a location information module is GPS (Global Positioning System).
  • the GPS module 115 calculates distance information and accurate time information from three or more satellites and applies triangulation to the calculated information to accurately calculate three-dimensional current position information based on longitude, latitude, and altitude.
  • the method set to calculate position and time information uses three satellites and corrects the calculated position and time information errors by using another satellite.
  • the GPS module 115 is capable of calculating speed information by continuously calculating current position information in real time.
  • the A/V input unit 120 is arranged to receive an audio or video signal.
  • The A/V input unit 120 may include a camera 121 and a microphone 122. The camera 121 processes image data of still pictures or video obtained by an image capturing device in a video capturing mode or an image capturing mode.
  • the processed image frame can be displayed on the display module 151.
  • The image frames processed by the camera 121 can be stored in the memory 160 (or other storage medium) or transmitted via the wireless communication unit 110, and two or more cameras 121 may be provided according to the configuration of the mobile terminal.
  • the microphone 122 can receive sound (audio data) via a microphone in an operation mode of a telephone call mode, a recording mode, a voice recognition mode, and the like, and can process such sound as audio data.
  • the processed audio (voice) data can be converted to a format output that can be transmitted to the mobile communication base station via the mobile communication module 112 in the case of a telephone call mode.
  • the microphone 122 can implement various types of noise cancellation (or suppression) algorithms to cancel (or suppress) noise or interference generated during the process of receiving and transmitting audio signals.
  • the user input unit 130 may generate key input data according to a command input by the user to control various operations of the mobile terminal.
  • The user input unit 130 allows the user to input various types of information, and may include a keyboard, a dome switch, a touch pad (e.g., a touch-sensitive component that detects changes in resistance, pressure, capacitance, etc. caused by contact), a scroll wheel, a rocker, and the like.
  • In particular, when the touch pad is overlaid on the display module 151 in the form of a layer, a touch screen can be formed.
  • The sensing unit 140 detects the current state of the mobile terminal 100 (e.g., its open or closed state), the location of the mobile terminal 100, the presence or absence of user contact with the mobile terminal 100 (i.e., touch input), the orientation of the mobile terminal 100, and its acceleration or deceleration and direction of movement, and generates commands or signals for controlling the operation of the mobile terminal 100.
  • For example, when the mobile terminal 100 is implemented as a slide-type phone, the sensing unit 140 can sense whether the slide phone is open or closed.
  • In addition, the sensing unit 140 can detect whether the power supply unit 190 supplies power and whether the interface unit 170 is coupled to an external device.
  • Sensing unit 140 may include proximity sensor 1410 which will be described below in connection with a touch screen.
  • the interface unit 170 serves as an interface through which at least one external device can connect with the mobile terminal 100.
  • the external device may include a wired or wireless headset port, an external power (or battery charger) port, a wired or wireless data port, a memory card port, a port configured to connect a device having an identification module, and an audio input/output. (I/O) port, video I/O port, headphone port, and more.
  • The identification module may store various information for verifying the user of the mobile terminal 100 and may include a User Identity Module (UIM), a Subscriber Identity Module (SIM), a Universal Subscriber Identity Module (USIM), and the like.
  • the device having the identification module may take the form of a smart card, and thus the identification device may be connected to the mobile terminal 100 via a port or other connection device.
  • The interface unit 170 can be configured to receive input (e.g., data, information, power, etc.) from an external device and transmit the received input to one or more components within the mobile terminal 100, or can be used to transfer data between the mobile terminal and an external device.
  • The interface unit 170 may serve as a path through which power is supplied from a base to the mobile terminal 100, or as a path through which various command signals input from the base are transmitted to the mobile terminal.
  • Various command signals or power input from the base can serve as signals for recognizing whether the mobile terminal is accurately mounted on the base.
  • Output unit 150 is configured to provide an output signal (eg, an audio signal, a video signal, an alarm signal, a vibration signal, etc.) in a visual, audio, and/or tactile manner.
  • the output unit 150 may include a display module 151, an audio output module 152, an alarm module 153, and the like.
  • the display module 151 can display information processed in the mobile terminal 100. For example, when the mobile terminal 100 is in a phone call mode, the display module 151 can display a user interface (UI) or graphical user interface (GUI) associated with a call or other communication (eg, text messaging, multimedia file download, etc.). When the mobile terminal 100 is in a video call mode or an image capture mode, the display module 151 may display a captured image and/or a received image, a UI or GUI showing a video or image and related functions, and the like.
  • UI user interface
  • GUI graphical user interface
  • the display module 151 can function as an input device and an output device.
  • the display module 151 may include at least one of a liquid crystal display (LCD), a thin film transistor LCD (TFT-LCD), an organic light emitting diode (OLED) display, a flexible display, a three-dimensional (3D) display, and the like.
  • LCD liquid crystal display
  • TFT-LCD thin film transistor LCD
  • OLED organic light emitting diode
  • a flexible display a three-dimensional (3D) display, and the like.
  • 3D three-dimensional
  • Some of these displays may be configured to be transparent to allow a user to view from the outside, which may be referred to as a transparent display, and a typical transparent display may be, for example, a TOLED (Transparent Organic Light Emitting Diode) display or the like.
  • TOLED Transparent Organic Light Emitting Diode
  • the mobile terminal 100 may include two or more display modules (or other display devices).
  • For example, the mobile terminal may include an external display module and an internal display module (neither shown in FIG. 1).
  • the touch screen can be set to detect touch input pressure as well as touch input position and touch input area.
  • The audio output module 152 may convert audio data received by the wireless communication unit 110 or stored in the memory 160 into an audio signal and output it as sound when the mobile terminal is in a call signal receiving mode, a call mode, a recording mode, a voice recognition mode, a broadcast receiving mode, or the like.
  • Moreover, the audio output module 152 can provide audio output related to a specific function performed by the mobile terminal 100 (for example, a call signal reception sound, a message reception sound, etc.).
  • the audio output module 152 can include a speaker, a buzzer, and the like.
  • The alert module 153 can provide an output to notify the mobile terminal 100 of the occurrence of an event. Typical events may include call reception, message reception, key signal input, touch input, and the like. In addition to audio or video output, the alert module 153 can provide an output in a different manner to notify of the occurrence of an event. For example, the alert module 153 can provide an output in the form of vibration: when a call, message, or some other incoming communication is received, it can provide a tactile output (i.e., vibration) to notify the user. By providing such a tactile output, the user is able to recognize the occurrence of various events even when the mobile phone is in the user's pocket. The alert module 153 can also provide an output notifying of the occurrence of an event via the display module 151 or the audio output module 152.
  • the memory 160 may store a software program or the like for processing and control operations performed by the controller 180, or may temporarily store data (for example, a phone book, a message, a still image, a video, etc.) that has been output or is to be output. Moreover, the memory 160 can store data regarding vibrations and audio signals of various manners that are output when a touch is applied to the touch screen.
  • the memory 160 may include at least one type of storage medium including a flash memory, a hard disk, a multimedia card, a card type memory (eg, SD or DX memory, etc.), a random access memory (RAM), a static random access memory ( SRAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), programmable read only memory (PROM), magnetic memory, magnetic disk, optical disk, and the like.
  • the mobile terminal 100 can cooperate with a network storage device that performs a storage function of the memory 160 through a network connection.
  • the controller 180 typically controls the overall operation of the mobile terminal. For example, the controller 180 performs the control and processing associated with voice calls, data communications, video calls, and the like. Additionally, the controller 180 can include a multimedia module 1810 for reproducing (or playing back) multimedia data, which can be constructed within the controller 180 or can be configured to be separate from the controller 180. The controller 180 may perform a pattern recognition process to recognize a handwriting input or a picture drawing input performed on the touch screen as a character or an image.
  • the power supply unit 190 receives external power or internal power under the control of the controller 180 and provides appropriate power required to operate the various components and components.
  • the various embodiments described herein can be implemented in a computer readable medium using, for example, computer software, hardware, or any combination thereof.
  • For hardware implementation, the embodiments described herein may be implemented using at least one of application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), processors, controllers, microcontrollers, microprocessors, and electronic units designed to perform the functions described herein; in some cases, such embodiments may be implemented in the controller 180.
  • implementations such as procedures or functions may be implemented with separate software modules that permit the execution of at least one function or operation.
  • The software code can be implemented by a software application (or program) written in any suitable programming language, and can be stored in the memory 160 and executed by the controller 180.
  • the mobile terminal has been described in terms of its function.
  • In the following, a slide-type mobile terminal will be described as an example among various types of mobile terminals such as folding, bar, swing, and slide types. However, the embodiments of the present invention can be applied to any type of mobile terminal and are not limited to slide-type mobile terminals.
  • the mobile terminal 100 as shown in FIG. 1 may be configured to operate using a communication system such as a wired and wireless communication system and a satellite-based communication system that transmits data via frames or packets.
  • a communication system in which a mobile terminal is operable according to an embodiment of the present invention will now be described with reference to FIG.
  • Such communication systems may use different air interfaces and/or physical layers.
  • Air interfaces used by communication systems include, for example, Frequency Division Multiple Access (FDMA), Time Division Multiple Access (TDMA), Code Division Multiple Access (CDMA), Universal Mobile Telecommunications System (UMTS) (in particular, Long Term Evolution (LTE)), Global System for Mobile Communications (GSM), and the like.
  • the following description relates to a CDMA communication system, but such teachings are equally applicable to other types of systems.
  • a CDMA wireless communication system can include a plurality of mobile terminals 100, a plurality of base stations (BS) 270, a base station controller (BSC) 275, and a mobile switching center (MSC) 280.
  • the MSC 280 is configured to interface with a public switched telephone network (PSTN) 290.
  • the MSC 280 is also configured to interface with a BSC 275 that can be coupled to the base station 270 via a backhaul line.
  • The backhaul line can be constructed in accordance with any of a number of well-known interfaces including, for example, E1/T1, ATM, IP, PPP, Frame Relay, HDSL, ADSL, or xDSL. It will be understood that the system as shown in FIG. 2 can include multiple BSCs 275.
  • Each BS 270 can serve one or more partitions (or regions), each of which is covered by a multi-directional antenna or an antenna directed to a particular direction radially away from the BS 270. Alternatively, each partition may be covered by two or more antennas for diversity reception. Each BS 270 can be configured to support multiple frequency allocations, and each frequency allocation has a particular frequency spectrum (eg, 1.25 MHz, 5 MHz, etc.).
  • BS 270 may also be referred to as a Base Transceiver Subsystem (BTS) or other equivalent terminology.
  • the term "base station” can be used to generally refer to a single BSC 275 and at least one BS 270.
  • a base station can also be referred to as a "cell station.”
  • each partition of a particular BS 270 may be referred to as a plurality of cellular stations.
  • a broadcast transmitter (BT) 295 transmits a broadcast signal to the mobile terminal 100 operating within the system.
  • a broadcast receiving module 111 as shown in FIG. 1 is provided at the mobile terminal 100 to receive a broadcast signal transmitted by the BT 295.
  • In addition, FIG. 2 shows several GPS (Global Positioning System) satellites 300, which help locate at least one of the plurality of mobile terminals 100.
  • a plurality of satellites 300 are depicted, but it is understood that useful positioning information can be obtained using any number of satellites.
  • the GPS module 115 as shown in Figure 1 is typically configured to cooperate with the satellite 300 to obtain desired positioning information. Instead of GPS tracking technology or in addition to GPS tracking technology, other techniques that can track the location of the mobile terminal can be used. Additionally, at least one GPS satellite 300 can selectively or additionally process satellite DMB transmissions.
  • BS 270 receives reverse link signals from various mobile terminals 100.
  • Mobile terminal 100 typically participates in calls, messaging, and other types of communications.
  • Each reverse link signal received by a particular base station 270 is processed within a particular BS 270.
  • the obtained data is forwarded to the relevant BSC 275.
  • the BSC provides call resource allocation and coordinated mobility management functions including a soft handoff procedure between the BSs 270.
  • the BSC 275 also routes the received data to the MSC 280, which provides additional routing services for interfacing with the PSTN 290.
  • PSTN 290 interfaces with MSC 280, which forms an interface with BSC 275, and BSC 275 controls BS 270 accordingly to transmit forward link signals to mobile terminal 100.
  • Referring to FIG. 3, a first embodiment of the image processing method of the present invention is proposed; the method includes the following steps:
  • S11: Perform scene recognition on the image according to the preset scene recognition parameters, and generate scene content information of the image.
  • After an image is taken or acquired from the outside, deep learning technology is used to perform scene recognition on the image according to the scene recognition parameters and generate the scene content information of the image; the scene content information is text information describing the scene features of the image.
  • acquiring an image from the outside includes downloading an image from a network or receiving an image transmitted by an external device.
  • The scene recognition parameters are able to distinguish the scene features of an image; they can be obtained directly from an external source and stored locally, or they can be trained by the terminal itself through deep learning on big data.
  • the manner in which the scene recognition parameters are derived by deep learning training will be described in detail in the next embodiment.
  • The scene content information includes a content label of the image, coordinate positions of pixel points, content association information, and the like; in other words, it includes at least the objects in the image, and may further include the image background, the positional layout of each object, and attribute features such as color, category, and shape, together with associated information.
  • As shown in FIG. 4, deep learning is used to recognize the scene according to the scene recognition parameters: the object in the image is detected as a strawberry, along with the strawberry's color, its food category, its nutrition and health associations, and so on, and finally the following scene content information is generated: strawberries, food, organic plant, fruit, berry, nutrition, health, fresh, red, grass green, close-up, etc.
  • As shown in FIG. 5, deep learning is used to recognize the scene according to the scene recognition parameters: a blue sky is detected in the upper right part of the image, a reddish-brown dome rock in the lower left part, and green trees in the middle, and the following scene content information is generated: the background is a reddish-brown rock dome and a blue sky, and the foreground is a dry landscape of green trees, shrubs, and light brown grass.
  • The generated scene content information is written into the file attributes of the image, so that the file attributes include not only shooting parameter information such as the image height, exposure time, number of data bits, and shooting location, but also the scene content information of the image; this adds a new attribute to the image.
  • After acquiring the image, a user can learn its specific content simply by viewing its file attributes, without opening the image, obtain richer image information from those attributes, and thus quickly browse or filter images (a minimal sketch of the write step is given at the end of this embodiment).
  • the end user or the third-party user can further process the image by using the scene content information of the image, and the specific processing procedure will be described in detail in the following embodiments.
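  • A minimal sketch of this write step follows, assuming JPEG images and the third-party piexif library, with the scene content text stored in the EXIF UserComment tag; the patent does not mandate any particular attribute field or library.

```python
# Hedged sketch: store the generated scene content text in the image's own
# file attributes (here, the EXIF UserComment tag of a JPEG via piexif).
import piexif
import piexif.helper


def write_scene_content(image_path: str, scene_text: str) -> None:
    exif_dict = piexif.load(image_path)
    # Encode the descriptive text and place it in the Exif IFD UserComment tag.
    exif_dict["Exif"][piexif.ExifIFD.UserComment] = piexif.helper.UserComment.dump(
        scene_text, encoding="unicode"
    )
    piexif.insert(piexif.dump(exif_dict), image_path)


def read_scene_content(image_path: str) -> str:
    exif_dict = piexif.load(image_path)
    raw = exif_dict["Exif"].get(piexif.ExifIFD.UserComment, b"")
    return piexif.helper.UserComment.load(raw) if raw else ""


# Example (file name is hypothetical): after scene recognition of FIG. 4,
# write_scene_content("strawberries.jpg",
#     "strawberries, food, organic plant, fruit, berry, fresh, red, close-up")
```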
  • As shown in FIG. 6, a second embodiment of the image processing method of the present invention is proposed; the method includes the following steps:
  • S21 Perform deep learning by using big data, and train a scene recognition parameter capable of distinguishing scene features of the image.
  • Deep learning is one of the most important breakthroughs in the field of artificial intelligence in the past decade. It has achieved great success in many fields such as speech recognition, natural language processing, computer vision, image and video analysis, and multimedia. In machine learning, deep learning is a method of modeling patterns (sounds, images, etc.); it is also a statistics-based probabilistic model. Once various patterns have been modeled, they can be recognized; for example, when the pattern being modeled is sound, such recognition can be understood as speech recognition.
  • Deep learning stems from the research of artificial neural networks.
  • the multi-layer perceptron with multiple hidden layers is a deep learning structure.
  • Deep learning combines low-level features to form more abstract high-level representation attribute categories or features to discover distributed feature representations of data.
  • Such feature extraction is sometimes designed or specified manually, and sometimes summarized by the computer itself when a relatively large amount of data is available.
  • Deep learning proposes a method for the computer to automatically learn the characteristics of the pattern, and integrates the feature learning into the process of building the model, thus reducing the incompleteness caused by the artificial design features.
  • The premise of this algorithm is that a sufficiently large amount of data can be provided. In other words, in an application scenario that can only provide a limited amount of data, a deep learning algorithm cannot estimate the underlying regularities of the data without bias, so its recognition performance may not be as good as that of some existing, simpler algorithms.
  • In this embodiment, a big data platform is first used to collect feature data of different scenes; these feature data are then input into a convolutional neural network, which automatically learns the various features of the different scenes and trains the nonlinear feature combination parameters that distinguish them, i.e., the scene recognition parameters. In subsequent scene recognition, these parameters can be used to recognize different scenes and to distinguish the different backgrounds, objects, and object attribute features in a scene.
  • As shown in FIG. 7, the convolutional neural network classifies and recognizes the scene content as follows: first, an image is input; then, sub-regions are extracted; next, convolutional neural network features are computed; and finally, region classification is performed (a sketch of this pipeline follows).
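  • The sketch below illustrates the shape of this pipeline only. It substitutes a pretrained ImageNet classifier from torchvision applied to a coarse grid of crops in place of the patent's trained scene recognition parameters; it is not the patent's actual network or training data.

```python
# Hedged sketch of the region-based pipeline of FIG. 7: input image ->
# sub-region extraction -> CNN features -> region classification.
from typing import List

import torch
from PIL import Image
from torchvision import models
from torchvision.models import ResNet18_Weights

weights = ResNet18_Weights.DEFAULT
model = models.resnet18(weights=weights).eval()
preprocess = weights.transforms()
categories = weights.meta["categories"]


def recognize_scene(image_path: str, grid: int = 2, top_k: int = 3) -> List[str]:
    image = Image.open(image_path).convert("RGB")
    w, h = image.size
    labels: List[str] = []
    # Extract sub-regions on a coarse grid and classify each crop.
    for i in range(grid):
        for j in range(grid):
            crop = image.crop((w * i // grid, h * j // grid,
                               w * (i + 1) // grid, h * (j + 1) // grid))
            with torch.no_grad():
                logits = model(preprocess(crop).unsqueeze(0))
            for idx in logits.softmax(dim=1).topk(top_k).indices[0]:
                labels.append(categories[int(idx)])
    # De-duplicate while keeping order; these labels play the role of the
    # scene content labels described in the patent.
    return list(dict.fromkeys(labels))
```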
  • S22 Perform scene recognition on the image according to the preset scene recognition parameter, and generate scene content information of the image.
  • Deep learning technology is used to perform scene recognition on the image according to the scene recognition parameters, and the scene content information of the image is generated; the scene content information is text information describing the scene features of the image.
  • acquiring an image from the outside includes downloading an image from a network or receiving an image transmitted by an external device.
  • The scene content information includes a content label of the image, coordinate positions of pixel points, content association information, and the like; in other words, it includes at least the objects in the image, and may further include the image background, the positional layout of each object, and attribute features such as color, category, and shape, together with associated information.
  • As shown in FIG. 4, deep learning is used to recognize the scene according to the scene recognition parameters: the object in the image is detected as a strawberry, along with the strawberry's color, its food category, its nutrition and health associations, and so on, and finally the following scene content information is generated: strawberries, food, organic plant, fruit, berry, nutrition, health, fresh, red, grass green, close-up, etc.
  • As shown in FIG. 5, deep learning is used to recognize the scene according to the scene recognition parameters: a blue sky is detected in the upper right part of the image, a reddish-brown dome rock in the lower left part, and green trees in the middle, and the following scene content information is generated: the background is a reddish-brown rock dome and a blue sky, and the foreground is a dry landscape of green trees, shrubs, and light brown grass.
  • S23: Write the scene content information into the file attributes of the image.
  • This embodiment uses the scene content information to classify the image.
  • In the related art, images are generally classified according to attributes such as time, shooting location, and image size.
  • In this embodiment, the scene features contained in the image are instead analyzed from the scene content information, and the image is classified by visual content according to those features, for example into categories such as landscape, portrait, animal, food, weather, and environment.
  • For example, according to the scene content information of FIG. 4, the image can be classified as food, fruit, strawberry, close-up, etc.; according to the scene content information of FIG. 5, the image can be classified as landscape, dry landscape, etc.
  • In this way, the terminal can automatically obtain the scene features contained in an image by parsing its scene content information and classify the image by visual content according to those features, for example into categories such as landscape, portrait, animal, food, weather, and environment.
  • This embodiment automatically classifies images using their scene content information, providing a new image classification method (a sketch of such a classification step follows).
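  • A hedged sketch of this classification step follows: the scene content text read from each image's file attributes is matched against a keyword table and the image is grouped under a visual-content category. The keyword table is an illustrative assumption, not part of the patent.

```python
# Hedged sketch: group images by visual content using scene content text.
from collections import defaultdict
from typing import Dict, List

CATEGORY_KEYWORDS = {
    "food":      {"food", "fruit", "strawberry", "berry"},
    "landscape": {"landscape", "rock", "sky", "grass", "tree"},
    "portrait":  {"person", "face", "portrait"},
    "animal":    {"dog", "cat", "bird", "animal"},
}


def classify_by_scene_content(scene_texts: Dict[str, str]) -> Dict[str, List[str]]:
    """scene_texts maps image path -> scene content text read from its file attributes."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for path, text in scene_texts.items():
        tokens = {t.strip().lower() for t in text.replace(",", " ").split()}
        category = next((name for name, keys in CATEGORY_KEYWORDS.items() if tokens & keys),
                        "other")
        groups[category].append(path)
    return dict(groups)


# classify_by_scene_content({"img1.jpg": "strawberries, food, fruit, red, close-up"})
# -> {"food": ["img1.jpg"]}
```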
  • Referring to FIG. 8, a third embodiment of the image processing method of the present invention is proposed; the method includes the following steps:
  • S31: Perform deep learning using big data, and train scene recognition parameters capable of distinguishing the scene features of the image.
  • S32: Perform scene recognition on the image according to the preset scene recognition parameters, and generate scene content information of the image.
  • S33: Write the scene content information into the file attributes of the image.
  • Steps S31-S33 are the same as steps S21-S23 in the second embodiment and are not described again here.
  • This embodiment performs annotation processing on the image using the scene content information. Specifically, when the user publishes an image, the user does not need to manually input text to explain the content of the image; the terminal automatically acquires the scene content information of the image and automatically generates annotation information from it to explain the specific content of the image.
  • Because the terminal automatically generates annotation information to explain the content of the image, other users viewing the photo can conveniently learn what is in it and how it relates to other content on the web. For example, if a scenic spot appears in the image, annotation information such as its name, category, and distinctive features can be given automatically according to the scene content information of the image.
  • the terminal acquires the scene content information of the image, and automatically generates the annotation information according to the scene content information.
  • In this embodiment, the scene content information of the image is used to automatically generate annotation information at the time the image is published, interpreting its specific content, eliminating manual input by the user, and providing a new image sharing experience (a sketch of such an annotation step follows).
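  • A hedged sketch of this annotation step follows: a short caption is assembled from the scene content text when the image is published, instead of asking the user to type one. The sentence template is an illustrative assumption.

```python
# Hedged sketch: build a publishing caption from the scene content text.
def generate_annotation(scene_text: str, max_labels: int = 4) -> str:
    labels = [t.strip() for t in scene_text.split(",") if t.strip()]
    if not labels:
        return "Shared a photo."
    subject, descriptors = labels[0], labels[1:max_labels]
    if descriptors:
        return f"A photo of {subject} ({', '.join(descriptors)})."
    return f"A photo of {subject}."


# generate_annotation("strawberries, food, fruit, fresh, red")
# -> "A photo of strawberries (food, fruit, fresh)."
```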
  • Referring to FIG. 9, a fourth embodiment of the image processing method of the present invention is proposed; the method includes the following steps:
  • S41 Perform deep learning by using big data, and train a scene recognition parameter capable of distinguishing scene features of the image.
  • S42: Perform scene recognition on the image according to the preset scene recognition parameters, and generate scene content information of the image.
  • S43: Write the scene content information into the file attributes of the image.
  • Steps S41-S43 are the same as steps S21-S23 in the second embodiment and are not described again here.
  • In this embodiment, the image is optimized using the scene content information, mainly by adjusting its colors to enhance its visual effect.
  • the terminal automatically acquires the scene content information of the image, and adopts different optimization strategies according to the scene content information of the image, and optimizes the content of different regions in the image to different degrees.
  • For example, when the scene content information indicates that the image contains sky and/or grass, the terminal automatically makes the sky bluer and the grass greener, so that the content of the image is optimized to its best effect; when the scene content information indicates that the weather in the image is gloomy, the gray weather background can be transformed into a bright, sunny background, low-light regions can be enhanced, and so on.
  • the terminal automatically acquires scene content information of the image, and automatically performs optimization processing according to the scene content information.
  • When optimizing an image, a developer can also view the image's attributes, obtain its scene content information, and use that information to optimize the image.
  • In this embodiment, the scene content information of the image is automatically used to optimize it, making the optimization more targeted and more accurate and enhancing the visual effect of the image (a sketch of such a scene-aware optimization follows).
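  • A hedged sketch of this scene-aware optimization follows: regions that the scene content information labels as sky or grass receive different colour gains (bluer sky, greener grass), plus a mild global lift standing in for the low-light enhancement. The gains and the region format are illustrative assumptions.

```python
# Hedged sketch: region-targeted colour optimization driven by scene content info.
from typing import List, Tuple

import numpy as np
from PIL import Image

CHANNEL_GAIN = {"sky": (2, 1.15), "grass": (1, 1.15)}  # label -> (RGB channel index, gain)


def optimize_image(image_path: str,
                   regions: List[Tuple[str, Tuple[int, int, int, int]]],
                   out_path: str) -> None:
    """regions: [(label, (x0, y0, x1, y1)), ...] taken from the scene content information."""
    pixels = np.asarray(Image.open(image_path).convert("RGB"), dtype=np.float32)
    for label, (x0, y0, x1, y1) in regions:
        if label in CHANNEL_GAIN:
            channel, gain = CHANNEL_GAIN[label]
            pixels[y0:y1, x0:x1, channel] *= gain  # boost blue for sky, green for grass
    # Simple global brightness lift standing in for "enhance low-light areas";
    # a fuller implementation would target only the dark regions named in the
    # scene content information.
    pixels = np.clip(pixels * 1.05, 0.0, 255.0)
    Image.fromarray(pixels.astype(np.uint8)).save(out_path)
```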
  • In other embodiments, the scene content information of the image may also be used for other aspects of image processing, which is not limited by the embodiments of the present invention.
  • the image processing method of the embodiment of the present invention can also be applied to a non-mobile terminal device such as a personal computer.
  • An embodiment of the present invention further provides an image processing apparatus, which is applied to the foregoing mobile terminal. Based on the above-described mobile terminal hardware structure and communication system, various embodiments of the image processing apparatus of the present invention are proposed.
  • Referring to FIG. 10, a first embodiment of the image processing apparatus of the present invention is proposed. The apparatus includes a scene recognition module and a write module. The scene recognition module is configured to perform scene recognition on the image according to the preset scene recognition parameters and generate scene content information of the image.
  • After an image is taken or acquired from the outside, the scene recognition module immediately uses deep learning technology to perform scene recognition on the image according to the scene recognition parameters and generates the scene content information of the image, i.e., text information describing the scene features of the image.
  • acquiring an image from the outside includes downloading an image from a network or receiving an image transmitted by an external device.
  • The scene recognition parameters can distinguish the scene features of an image; in this embodiment they are obtained directly from an external source and stored locally.
  • The scene content information includes a content label of the image, coordinate positions of pixel points, content association information, and the like; in other words, it includes at least the objects in the image, and may also include the image background, the positional layout of each object, and attribute features such as color, category, and shape, together with associated information.
  • As shown in FIG. 4, deep learning is used to recognize the scene according to the scene recognition parameters: the object in the image is detected as a strawberry, along with the strawberry's color, its food category, its nutrition and health associations, and so on, and finally the following scene content information is generated: strawberries, food, organic plant, fruit, berry, nutrition, health, fresh, red, grass green, close-up, etc.
  • As shown in FIG. 5, deep learning is used to recognize the scene according to the scene recognition parameters: a blue sky is detected in the upper right part of the image, a reddish-brown dome rock in the lower left part, and green trees in the middle, and the following scene content information is generated: the background is a reddish-brown rock dome and a blue sky, and the foreground is a dry landscape of green trees, shrubs, and light brown grass.
  • The write module is configured to write the scene content information into the file attributes of the image.
  • In this way, the file attributes of the image include not only shooting parameter information such as the image height, exposure time, number of data bits, and shooting location, but also the scene content information of the image, adding a new attribute to the image.
  • The specific content of the image can then be obtained simply by viewing its file attributes, so that the user obtains richer image information from them and can quickly browse or filter images.
  • the end user or a third party user can further process the image by using the scene content information of the image.
  • Referring to FIG. 11, a second embodiment of the image processing apparatus of the present invention is proposed.
  • The difference between this embodiment and the first embodiment is that a deep learning module is added; the deep learning module is configured to perform deep learning using big data and to train scene recognition parameters capable of distinguishing the scene features of images.
  • Deep learning is one of the most important breakthroughs in the field of artificial intelligence in the past decade. It has achieved great success in many fields such as speech recognition, natural language processing, computer vision, image and video analysis, and multimedia. In machine learning, deep learning is a method of modeling patterns (sounds, images, etc.); it is also a statistics-based probabilistic model. Once various patterns have been modeled, they can be recognized; for example, when the pattern being modeled is sound, such recognition can be understood as speech recognition.
  • Deep learning stems from the research of artificial neural networks.
  • the multi-layer perceptron with multiple hidden layers is a deep learning structure.
  • Deep learning combines low-level features to form more abstract high-level representation attribute categories or features to discover distributed feature representations of data.
  • Such feature extraction is sometimes designed or specified manually, and sometimes summarized by the computer itself when a relatively large amount of data is available.
  • Deep learning proposes a method for the computer to automatically learn the characteristics of the pattern, and integrates the feature learning into the process of building the model, thus reducing the incompleteness caused by the artificial design features.
  • Although the deep learning module can automatically learn pattern features and achieve good recognition accuracy, the premise is that a sufficiently large amount of data can be provided. In other words, in an application scenario that can only provide a limited amount of data, the deep learning module cannot estimate the underlying regularities of the data without bias, so its recognition performance may not be as good as that of some existing, simpler algorithms.
  • The deep learning module first uses a big data platform to collect feature data of different scenes; the feature data are then input into a convolutional neural network, which automatically learns the various features of the different scenes and trains the nonlinear feature combination parameters that distinguish them, i.e., the scene recognition parameters. In subsequent scene recognition, these parameters can be used to recognize different scenes and to distinguish the different backgrounds, objects, and object attribute features in a scene.
  • the convolutional neural network classifies and recognizes the scene content: first, input an image; then, extract a sub-region; then, calculate a convolutional neural network feature; and finally, perform region classification.
  • the terminal can automatically obtain the scene recognition parameter by using the big data for deep learning.
  • Referring to FIG. 12, a third embodiment of the image processing apparatus of the present invention is proposed. The difference between this embodiment and the second embodiment is that an image processing module is added; the image processing module is configured to process the image according to the scene content information.
  • the image processing module includes a classification unit, an annotation unit, and an optimization processing unit, where:
  • The classification unit is configured to classify images according to the scene content information.
  • The classification unit analyzes the scene features contained in the image from the scene content information and classifies the image by visual content according to those features, for example into categories such as landscape, portrait, animal, food, weather, and environment.
  • For example, according to the scene content information of FIG. 4, the image may be classified as food, fruit, strawberry, and the like; according to the scene content information of FIG. 5, the image may be classified as landscape, dry landscape, and the like.
  • In this way, the classification unit can automatically obtain the scene features contained in an image by parsing its scene content information and classify the image by visual content according to those features.
  • The annotation unit is configured to generate annotation information from the scene content information when an image is published.
  • the annotation unit automatically acquires the scene content information of the image, and automatically generates annotation information according to the scene content information to explain the specific content of the image.
  • Because the annotation unit automatically generates annotation information to explain the content of the image, other users viewing the photo can easily see what is in it and how it relates to other content on the web. For example, if a scenic spot appears in the image, annotation information such as its name, category, and distinctive features can be given automatically according to the scene content information of the image.
  • The optimization unit is configured to optimize the image according to the scene content information, mainly by adjusting the colors of the image to enhance its visual effect.
  • The optimization unit automatically acquires the scene content information of the image and adopts different optimization strategies according to that information, optimizing the content of different regions of the image to different degrees. For example, when the scene content information indicates that the image contains sky and/or grass, the optimization unit automatically makes the sky bluer and the grass greener, so that the content of the image is optimized to its best effect; when the scene content information indicates that the weather in the image is gloomy, the gray weather background can be transformed into a bright, sunny background, low-light regions can be enhanced, and so on.
  • In this embodiment, images are classified automatically using their scene content information, providing a new image classification method; annotation information is generated automatically from the scene content information when an image is published, interpreting its specific content, eliminating manual input by the user, and providing a new image sharing experience; and the scene content information of the image is used automatically to optimize it, making the optimization more targeted and more accurate and enhancing the visual effect of the image.
  • In some embodiments, the image processing module may include only one or two of the classification unit, the annotation unit, and the optimization unit.
  • In some embodiments, the deep learning module may also be omitted, and the scene recognition parameters may instead be acquired from the outside and stored locally, as in the first embodiment (a possible caching pattern is sketched below).
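  • For that variant, one possible way to fetch externally provided scene recognition parameters once and reuse the local copy is sketched here; the URL and file name are placeholders only, not values from the disclosure.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Hypothetical sketch: obtain the scene recognition parameters from an external
# source once, cache them locally, and reuse the local copy afterwards.

PARAM_URL = "https://example.com/scene_recognition_params.bin"   # placeholder URL
LOCAL_PATH = Path("scene_recognition_params.bin")                # placeholder file name

def load_scene_recognition_params():
    if not LOCAL_PATH.exists():
        urlretrieve(PARAM_URL, LOCAL_PATH)   # fetch from the outside and store locally
    return LOCAL_PATH.read_bytes()           # the recognizer would deserialize these bytes
```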
  • The image processing apparatus of the embodiments of the present invention can also be applied to non-mobile terminal devices such as personal computers.
  • From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the foregoing embodiments can be implemented by means of software plus the necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on such an understanding, the technical solution of the present invention, in essence or in the part that contributes to the related art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods described in the embodiments of the present invention.
  • The embodiments of the present invention disclose an image processing apparatus, a terminal, and a method. The image processing apparatus includes: a scene recognition module, configured to perform scene recognition on an image according to preset scene recognition parameters and to generate scene content information of the image, the scene content information being text information describing the scene features of the image; and a writing module, configured to write the scene content information into the file attribute of the image.
  • As a result, the file attribute of the image includes not only the shooting parameter information but also the scene content information of the image, so the file attribute is richer and more comprehensive; furthermore, after acquiring the image, the user can obtain its specific content by viewing the file attribute directly, without opening the image, so that the user can obtain richer image information from the file attribute and can browse or filter images quickly.
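  • As one concrete but hypothetical reading of "writing the scene content information into the file attribute of the image", the sketch below stores the generated text in the standard EXIF ImageDescription tag using Pillow; the disclosure does not specify which attribute field or library is used.

```python
from PIL import Image

# Hypothetical sketch of the writing module: store the generated scene content
# information as text in the image's EXIF ImageDescription tag (0x010E).

IMAGE_DESCRIPTION = 0x010E  # standard EXIF tag for a free-text image description

def write_scene_info(src_path, dst_path, scene_tags):
    img = Image.open(src_path)
    exif = img.getexif()
    exif[IMAGE_DESCRIPTION] = ", ".join(scene_tags)
    img.save(dst_path, exif=exif)

def read_scene_info(path):
    return Image.open(path).getexif().get(IMAGE_DESCRIPTION, "")

# Example: tag a JPEG with the scene content information generated for FIG. 4.
# write_scene_info("strawberries.jpg", "strawberries_tagged.jpg",
#                  ["strawberry", "food", "fruit", "red", "close-up"])
# print(read_scene_info("strawberries_tagged.jpg"))
```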

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Library & Information Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present invention disclose an image processing apparatus, a terminal, and a method. The image processing apparatus comprises: a scene recognition module, configured to perform scene recognition on an image according to preset scene recognition parameters and to generate scene content information of the image, the scene content information being text information describing scene features of the image; and a writing module, configured to write the scene content information into a file attribute of the image.

Description

图像处理装置、终端和方法 技术领域
本发明实施例涉及但不限于通信技术领域。
背景技术
移动互联网时代,移动终端存储的图像数据量呈现爆发式增长。移动终端拍摄图像时,会自动记录图像高度、曝光时间、数据位数、拍摄地理位置等拍摄参数信息,并写入到图像的文件属性中。用户可以通过查看图像的文件属性,了解图像拍摄时的基本信息。然而,对于图像的具体内容,则无法通过文件属性了解,必须打开图像文件凭人眼主观查看获取。因此,相关技术中图像的文件属性包含的信息不够丰富和全面,用户不能利用图像的文件属性对图像进行快速浏览或筛选。
发明内容
本发明实施例提出了一种图像处理装置、终端和方法,可为图像增加一种新的属性,至少使得图像的文件属性包含的信息更加丰富和全面。
以下是对本文详细描述的主题的概述。本概述并非是为了限制权利要求的保护范围。
本发明实施例提出一种图像处理装置,包括:
场景识别模块,设置为根据预置的场景识别参数对图像进行场景识别,生成所述图像的场景内容信息,所述场景内容信息为描述所述图像的场景特征的文字信息;
写入模块,设置为将所述场景内容信息写入所述图像的文件属性中。
可选地,该装置还包括图像处理模块,所述图像处理模块设置为:根据所述场景内容信息对所述图像进行处理。
可选地,所述图像处理模块包括分类单元,所述分类单元设置为:根据 所述场景内容信息对所述图像进行分类。
可选地,所述图像处理模块包括注释单元,所述注释单元设置为:当发布所述图像时,根据所述场景内容信息生成注释信息。
可选地,所述图像处理模块包括优化单元,所述优化单元设置为:根据所述场景内容信息对所述图像进行优化处理。
可选地,所述场景识别模块设置为:当拍摄一张图像或从外部获取一张图像后,立即对所述图像进行场景识别。
可选地,所述图像处理装置还包括深度学习模块,所述深度学习模块设置为:利用大数据进行深度学习,训练出能够分辨图像的场景特征的场景识别参数。
可选地,所述分类单元,设置为根据场景内容信息分析出该图像包含的场景特征,根据这些特征将图像按照视觉内容进行分类。
可选地,所述优化单元,设置为获取图像的场景内容信息,根据图像的场景内容信息采取不同的优化策略,对于图像中不同区域的内容进行不同程度的优化;
所述优化包括:调整图像的颜色,增强图像的视觉效果。
本发明实施例同时提出一种终端,所述终端包括上文所述的图像处理装置。
本发明实施例同时提出一种图像处理方法,包括步骤:
根据预置的场景识别参数对图像进行场景识别,生成所述图像的场景内容信息,所述场景内容信息为描述所述图像的场景特征的文字信息;
将所述场景内容信息写入所述图像的文件属性中。
可选地,所述将所述场景内容信息写入所述图像的文件属性中的步骤之后还包括:
根据所述场景内容信息对所述图像进行处理。
可选地,所述根据所述场景内容信息对所述图像进行处理包括:根据所述场景内容信息对所述图像进行分类。
可选地,所述根据所述场景内容信息对所述图像进行处理包括:当发布所述图像时,根据所述场景内容信息生成注释信息。
可选地,所述根据所述场景内容信息对所述图像进行处理包括:根据所述场景内容信息对所述图像进行优化处理。
可选地,所述方法还包括:当拍摄一张图像或从外部获取一张图像后,立即对所述图像进行场景识别
可选地,所述根据预设的场景识别参数对图像进行场景识别的步骤之前还包括:利用大数据进行深度学习,训练出能够分辨图像的场景特征的场景识别参数。
可选地,所述根据所述场景内容信息对所述图像进行分类,包括:
根据场景内容信息分析出该图像包含的场景特征,根据这些特征将图像按照视觉内容进行分类。
可选地,所述根据所述场景内容信息对所述图像进行优化处理,包括:
获取图像的场景内容信息,根据图像的场景内容信息采取不同的优化策略,对于图像中不同区域的内容进行不同程度的优化;
所述优化包括:调整图像的颜色,增强图像的视觉效果。
可选地,所述场景内容信息包括图像的内容标签、像素点的坐标位置、内容关联信息。
本发明实施例所提出的一种图像处理装置,通过对图像场景识别,生成场景内容信息并写入到图像的文件属性中这一系列处理,使得图像的文件属性中不仅包括图像高度、曝光时间、数据位数、拍摄地理位置等拍摄参数信息,还包括图像的场景内容信息,使得图像的文件属性更加丰富和全面。终端用户或者第三方用户获取图像后,无需打开图像,直接查看图像的文件属性就能获取图像的具体内容,使得用户可以根据图像的文件属性获取更丰富的图像信息,方便用户快速浏览或筛选图像。
同时,还可以利用图像的场景内容信息对图像做进一步处理。例如:利用图像的场景内容信息自动对图像进行分类,提供了一种新的图像分类方式;利用图像的场景内容信息,在发布图像的同时自动生成注释信息来解释图像 的具体内容,省去用户手动输入操作,为用户提供了一种新的图像分享体验;利用图像的场景内容信息自动对图像进行优化处理,使得优化处理更有针对性和更加准确,增强了图像的视觉效果。
在阅读并理解了附图和详细描述后,可以明白其他方面。
附图概述
图1为实现本发明各个实施例的移动终端的硬件结构示意图;
图2为如图1所示的移动终端的无线通信系统示意图;
图3为本发明的图像处理方法第一实施例的流程图;
图4为本发明实施例中对一图像进行场景识别的示意图;
图5为本发明实施例中对另一图像进行场景识别的示意图;
图6为本发明的图像处理方法第二实施例的流程图;
图7为本发明实施例中卷积神经网络对场景内容进行分类识别示意图;
图8为本发明的图像处理方法第三实施例的流程图;
图9为本发明的图像处理方法第四实施例的流程图;
图10为本发明的图像处理装置第一实施例的模块示意图;
图11为本发明的图像处理装置第二实施例的模块示意图;
图12为本发明的图像处理装置第三实施例的模块示意图;
图13为图12中的图像处理模块的模块示意图。
本申请目的的实现、功能特点及优点将结合实施例,参照附图做进一步说明。
本发明的实施方式
应当理解,此处所描述的具体实施例仅仅用以解释本申请,并不用于限定本申请。
现在将参考附图描述实现本申请各个实施例的移动终端。在后续的描述 中,使用用于表示元件的诸如“模块”、“部件”或“单元”的后缀仅为了有利于本申请的说明,其本身并没有特定的意义。因此,"模块"与"部件"可以混合地使用。
移动终端可以以各种形式来实施。例如,本发明实施例中描述的终端可以包括诸如移动电话、智能电话、笔记本电脑、数字广播接收器、PDA(个人数字助理)、PAD(平板电脑)、PMP(便携式多媒体播放器)、导航装置等等的移动终端以及诸如数字TV、台式计算机等等的固定终端。下面,假设终端是移动终端。然而,本领域技术人员将理解的是,除了特别用于移动目的的元件之外,根据本发明的实施方式的构造也能够应用于固定类型的终端。
图1为实现本发明各个实施例的移动终端的硬件结构示意。
移动终端100可以包括无线通信单元110、A/V(音频/视频)输入单元120、用户输入单元130、感测单元140、输出单元150、存储器160、接口单元170、控制器180和电源单元190等等。图1示出了具有各种组件的移动终端,但是应理解的是,并不要求实施所有示出的组件,可以替代地实施更多或更少的组件。将在下面详细描述移动终端的元件。
无线通信单元110通常包括一个或多个组件,其允许移动终端100与无线通信系统或网络之间的无线电通信。例如,无线通信单元可以包括广播接收模块111、移动通信模块112、无线互联网模块113、短程通信模块114和位置信息模块115中的至少一个。
广播接收模块111经由广播信道从外部广播管理服务器接收广播信号和/或广播相关信息。广播信道可以包括卫星信道和/或地面信道。广播管理服务器可以是生成并发送广播信号和/或广播相关信息的服务器或者接收之前生成的广播信号和/或广播相关信息并且将其发送给终端的服务器。广播信号可以包括TV广播信号、无线电广播信号、数据广播信号等等。而且,广播信号还可以包括与TV或无线电广播信号组合的广播信号。广播相关信息也可以经由移动通信网络提供,并且在该情况下,广播相关信息可以由移动通信模块112来接收。广播信号可以以各种形式存在,例如,其可以以数字多媒体广播(DMB)的电子节目指南(EPG)、数字视频广播手持(DVB-H)的电子服务指南(ESG)等等的形式而存在。广播接收模块111可以通过使用各种类型的广 播系统接收信号广播。特别地,广播接收模块111可以通过使用诸如多媒体广播-地面(DMB-T)、数字多媒体广播-卫星(DMB-S)、数字视频广播-手持(DVB-H),前向链路媒体(MediaFLO@)的数据广播系统、地面数字广播综合服务(ISDB-T)等等的数字广播系统接收数字广播。广播接收模块111可以被构造为适合提供广播信号的各种广播系统以及上述数字广播系统。经由广播接收模块111接收的广播信号和/或广播相关信息可以存储在存储器160(或者其它类型的存储介质)中。
移动通信模块112将无线电信号发送到基站(例如,接入点、节点B等等)、外部终端以及服务器中的至少一个和/或从其接收无线电信号。这样的无线电信号可以包括语音通话信号、视频通话信号、或者根据文本和/或多媒体消息发送和/或接收的各种类型的数据。
无线互联网模块113支持移动终端的无线互联网接入。该模块可以内部或外部地耦接到终端。该模块所涉及的无线互联网接入技术可以包括WLAN(无线LAN)(Wi-Fi)、Wibro(无线宽带)、Wimax(全球微波互联接入)、HSDPA(高速下行链路分组接入)等等。
短程通信模块114是设置为支持短程通信的模块。短程通信技术的一些示例包括蓝牙TM、射频识别(RFID)、红外数据协会(IrDA)、超宽带(UWB)、紫蜂TM等等。
位置信息模块115是设置为检查或获取移动终端的位置信息的模块。位置信息模块的典型示例是GPS(全球定位系统)。根据当前的技术,GPS模块115计算来自三个或更多卫星的距离信息和准确的时间信息并且对于计算的信息应用三角测量法,从而根据经度、纬度和高度准确地计算三维当前位置信息。当前,设置为计算位置和时间信息的方法使用三颗卫星并且通过使用另外的一颗卫星校正计算出的位置和时间信息的误差。此外,GPS模块115能够通过实时地连续计算当前位置信息来计算速度信息。
A/V输入单元120设置为接收音频或视频信号。A/V输入单元120可以包括相机121和麦克风1220,相机121对在视频捕获模式或图像捕获模式中由图像捕获装置获得的静态图片或视频的图像数据进行处理。处理后的图像帧可以显示在显示模块151上。经相机121处理后的图像帧可以存储在存储 器160(或其它存储介质)中或者经由无线通信单元110进行发送,可以根据移动终端的构造提供两个或更多相机1210。麦克风122可以在电话通话模式、记录模式、语音识别模式等等运行模式中经由麦克风接收声音(音频数据),并且能够将这样的声音处理为音频数据。处理后的音频(语音)数据可以在电话通话模式的情况下转换为可经由移动通信模块112发送到移动通信基站的格式输出。麦克风122可以实施各种类型的噪声消除(或抑制)算法以消除(或抑制)在接收和发送音频信号的过程中产生的噪声或者干扰。
用户输入单元130可以根据用户输入的命令生成键输入数据以控制移动终端的各种操作。用户输入单元130允许用户输入各种类型的信息,并且可以包括键盘、锅仔片、触摸板(例如,检测由于被接触而导致的电阻、压力、电容等等的变化的触敏组件)、滚轮、摇杆等等。特别地,当触摸板以层的形式叠加在显示模块151上时,可以形成触摸屏。
感测单元140检测移动终端100的当前状态,(例如,移动终端100的打开或关闭状态)、移动终端100的位置、用户对于移动终端100的接触(即,触摸输入)的有无、移动终端100的取向、移动终端100的加速或减速移动和方向等等,并且生成用于控制移动终端100的操作的命令或信号。例如,当移动终端100实施为滑动型移动电话时,感测单元140可以感测该滑动型电话是打开还是关闭。另外,感测单元140能够检测电源单元190是否提供电力或者接口单元170是否与外部装置耦接。感测单元140可以包括接近传感器1410将在下面结合触摸屏来对此进行描述。
接口单元170用作至少一个外部装置与移动终端100连接可以通过的接口。例如,外部装置可以包括有线或无线头戴式耳机端口、外部电源(或电池充电器)端口、有线或无线数据端口、存储卡端口、设置为连接具有识别模块的装置的端口、音频输入/输出(I/O)端口、视频I/O端口、耳机端口等等。识别模块可以是存储用于验证用户使用移动终端100的各种信息并且可以包括用户识别模块(UIM)、客户识别模块(SIM)、通用客户识别模块(USIM)等等。另外,具有识别模块的装置(下面称为"识别装置")可以采取智能卡的形式,因此,识别装置可以经由端口或其它连接装置与移动终端100连接。接口单元170可以设置为接收来自外部装置的输入(例如,数据信息、电力等等)并且将 接收到的输入传输到移动终端100内的一个或多个元件或者可以用于在移动终端和外部装置之间传输数据。
另外,当移动终端100与外部底座连接时,接口单元170可以用作允许通过其将电力从底座提供到移动终端100的路径或者可以用作允许从底座输入的各种命令信号通过其传输到移动终端的路径。从底座输入的各种命令信号或电力可以用作用于识别移动终端是否准确地安装在底座上的信号。输出单元150被构造为以视觉、音频和/或触觉方式提供输出信号(例如,音频信号、视频信号、警报信号、振动信号等等)。输出单元150可以包括显示模块151、音频输出模块152、警报模块153等等。
显示模块151可以显示在移动终端100中处理的信息。例如,当移动终端100处于电话通话模式时,显示模块151可以显示与通话或其它通信(例如,文本消息收发、多媒体文件下载等等)相关的用户界面(UI)或图形用户界面(GUI)。当移动终端100处于视频通话模式或者图像捕获模式时,显示模块151可以显示捕获的图像和/或接收的图像、示出视频或图像以及相关功能的UI或GUI等等。
同时,当显示模块151和触摸板以层的形式彼此叠加以形成触摸屏时,显示模块151可以用作输入装置和输出装置。显示模块151可以包括液晶显示器(LCD)、薄膜晶体管LCD(TFT-LCD)、有机发光二极管(OLED)显示器、柔性显示器、三维(3D)显示器等等中的至少一种。这些显示器中的一些可以被构造为透明状以允许用户从外部观看,这可以称为透明显示器,典型的透明显示器可以例如为TOLED(透明有机发光二极管)显示器等等。根据特定想要的实施方式,移动终端100可以包括两个或更多显示模块(或其它显示装置),例如,移动终端可以包括外部显示模块(图1未示出)和内部显示模块(图1未示出)。触摸屏可设置为检测触摸输入压力以及触摸输入位置和触摸输入面积。
音频输出模块152可以在移动终端处于呼叫信号接收模式、通话模式、记录模式、语音识别模式、广播接收模式等等模式下时,将无线通信单元110接收的或者在存储器160中存储的音频数据转换音频信号并且输出为声音。而且,音频输出模块152可以提供与移动终端100执行的特定功能相关的音 频输出(例如,呼叫信号接收声音、消息接收声音等等)。音频输出模块152可以包括扬声器、蜂鸣器等等。
警报模块153可以提供输出以将事件的发生通知给移动终端100。典型的事件可以包括呼叫接收、消息接收、键信号输入、触摸输入等等。除了音频或视频输出之外,警报模块153可以以不同的方式提供输出以通知事件的发生。例如,警报模块153可以以振动的形式提供输出,当接收到呼叫、消息或一些其它进入通信(incomingcommunication)时,警报模块153可以提供触觉输出(即,振动)以将其通知给用户。通过提供这样的触觉输出,即使在用户的移动电话处于用户的口袋中时,用户也能够识别出各种事件的发生。警报模块153也可以经由显示模块151或音频输出模块152提供通知事件的发生的输出。
存储器160可以存储由控制器180执行的处理和控制操作的软件程序等等,或者可以暂时地存储己经输出或将要输出的数据(例如,电话簿、消息、静态图像、视频等等)。而且,存储器160可以存储关于当触摸施加到触摸屏时输出的各种方式的振动和音频信号的数据。
存储器160可以包括至少一种类型的存储介质,所述存储介质包括闪存、硬盘、多媒体卡、卡型存储器(例如,SD或DX存储器等等)、随机访问存储器(RAM)、静态随机访问存储器(SRAM)、只读存储器(ROM)、电可擦除可编程只读存储器(EEPROM)、可编程只读存储器(PROM)、磁性存储器、磁盘、光盘等等。而且,移动终端100可以与通过网络连接执行存储器160的存储功能的网络存储装置协作。
控制器180通常控制移动终端的总体操作。例如,控制器180执行与语音通话、数据通信、视频通话等等相关的控制和处理。另外,控制器180可以包括用于再现(或回放)多媒体数据的多媒体模块1810,多媒体模块1810可以构造在控制器180内,或者可以构造为与控制器180分离。控制器180可以执行模式识别处理,以将在触摸屏上执行的手写输入或者图片绘制输入识别为字符或图像。
电源单元190在控制器180的控制下接收外部电力或内部电力并且提供操作各元件和组件所需的适当的电力。
这里描述的各种实施方式可以以使用例如计算机软件、硬件或其任何组合的计算机可读介质来实施。对于硬件实施,这里描述的实施方式可以通过使用特定用途集成电路(ASIC)、数字信号处理器(DSP)、数字信号处理装置(DSPD)、可编程逻辑装置(PLD)、现场可编程门阵列(FPGA)、处理器、控制器、微控制器、微处理器、被设计为执行这里描述的功能的电子单元中的至少一种来实施,在一些情况下,这样的实施方式可以在控制器180中实施。对于软件实施,诸如过程或功能的实施方式可以与允许执行至少一种功能或操作的单独的软件模块来实施。软件代码可以由以任何适当的编程语言编写的软件应用程序(或程序)来实施,软件代码可以存储在存储器160中并且由控制器180执行。
至此,己经按照其功能描述了移动终端。下面,为了简要起见,将描述诸如折叠型、直板型、摆动型、滑动型移动终端等等的各种类型的移动终端中的滑动型移动终端作为示例。因此,本发明实施例能够应用于任何类型的移动终端,并且不限于滑动型移动终端。
如图1中所示的移动终端100可以被构造为利用经由帧或分组发送数据的诸如有线和无线通信系统以及基于卫星的通信系统来操作。
现在将参考图2描述其中根据本发明实施例的移动终端能够操作的通信系统。
这样的通信系统可以使用不同的空中接口和/或物理层。例如,由通信系统使用的空中接口包括例如频分多址(FDMA)、时分多址(TDMA)、码分多址(CDMA)和通用移动通信系统(UMTS)(特别地,长期演进(LTE))、全球移动通信系统(GSM)等等。作为非限制性示例,下面的描述涉及CDMA通信系统,但是这样的教导同样适用于其它类型的系统。
参考图2,CDMA无线通信系统可以包括多个移动终端100、多个基站(BS)270、基站控制器(BSC)275和移动交换中心(MSC)280。MSC280被构造为与公共电话交换网络(PSTN)290形成接口。MSC280还被构造为与可以经由回程线路耦接到基站270的BSC275形成接口。回程线路可以根据若干己知的接口中的任一种来构造,所述接口包括例如E1/T1、ATM,IP、PPP、帧中继、HDSL、ADSL或xDSL。将理解的是,如图2中所示的系统可以包括多 个BSC2750。
每个BS270可以服务一个或多个分区(或区域),由多向天线或指向特定方向的天线覆盖的每个分区放射状地远离BS270。或者,每个分区可以由用于分集接收的两个或更多天线覆盖。每个BS270可以被构造为支持多个频率分配,并且每个频率分配具有特定频谱(例如,1.25MHz,5MHz等等)。
分区与频率分配的交叉可以被称为CDMA信道。BS270也可以被称为基站收发器子系统(BTS)或者其它等效术语。在这样的情况下,术语"基站"可以用于笼统地表示单个BSC275和至少一个BS270。基站也可以被称为"蜂窝站"。或者,特定BS270的各分区可以被称为多个蜂窝站。
如图2中所示,广播发射器(BT)295将广播信号发送给在系统内操作的移动终端100。如图1中所示的广播接收模块111被设置在移动终端100处以接收由BT295发送的广播信号。在图2中,示出了几个全球定位系统(GPS)卫星300。卫星300帮助定位多个移动终端100中的至少一个。
在图2中,描绘了多个卫星300,但是理解的是,可以利用任何数目的卫星获得有用的定位信息。如图1中所示的GPS模块115通常被构造为与卫星300配合以获得想要的定位信息。替代GPS跟踪技术或者在GPS跟踪技术之外,可以使用可以跟踪移动终端的位置的其它技术。另外,至少一个GPS卫星300可以选择性地或者额外地处理卫星DMB传输。
作为无线通信系统的一个典型操作,BS270接收来自各种移动终端100的反向链路信号。移动终端100通常参与通话、消息收发和其它类型的通信。特定基站270接收的每个反向链路信号被在特定BS270内进行处理。获得的数据被转发给相关的BSC275。BSC提供通话资源分配和包括BS270之间的软切换过程的协调的移动管理功能。BSC275还将接收到的数据路由到MSC280,其提供用于与PSTN290形成接口的额外的路由服务。类似地,PSTN290与MSC280形成接口,MSC与BSC275形成接口,并且BSC275相应地控制BS270以将正向链路信号发送到移动终端100。
基于上述移动终端硬件结构以及通信系统,提出本申请的图像处理方法各实施例。
如图3所示,提出本发明的图像处理方法第一实施例,所述方法包括以下步骤:
S11、根据预置的场景识别参数对图像进行场景识别,生成该图像的场景内容信息。
具体的,当终端拍摄一张图像后或从外部获取一张图像后,立即利用深度学习技术根据场景识别参数对该图像进行场景识别,生成该图像的场景内容信息,该场景内容信息即描述该图像的场景特征的文字信息。其中,从外部获取图像,包括从网络上下载图像,或者接收外部设备传送的图像。
所述场景识别参数能够分辨出图像的场景特征,场景识别参数可以直接从外部获取本存储于本地,也可以由终端利用大数据进行深度学习而训练得出。通过深度学习训练得出场景识别参数的方式将在下一实施例中详细说明。
所述场景内容信息包括图像的内容标签、像素点的坐标位置、内容关联信息等,换句话说,至少包括图像中的对象,还可以包括图像背景、各对象的位置布局、属性特征等,所述属性特征如颜色、种类、形状以及关联信息等。
如图4所示,利用深度学习技术并根据场景识别参数对图4进行场景识别,检测出图像中的对象为草莓,以及草莓的颜色、所属食物种类、营养健康等属性特征信息,最终生成以下场景内容信息:草莓、食物、有机植物、水果、浆果、营养、健康、新鲜、红色、草绿色、特写等。
如图5所示,利用深度学习技术并根据场景识别参数对图5进行场景识别,检测出图像右上部为蓝色的天空,左下部为红褐色的圆顶岩石,中间为绿色树木,整体显示为一干燥景观,从而生成以下场景内容信息:一片背景为红褐色岩石圆顶和蓝天,前景为一些绿树、灌木和浅棕色小草的干燥景观。
S12、将场景内容信息写入图像的文件属性中。
可见,通过对图像场景识别,生成场景内容信息并写入到图像的文件属性中这一系列处理,使得图像的文件属性中不仅包括图像高度、曝光时间、数据位数、拍摄地理位置等拍摄参数信息,还包括图像的场景内容信息,为图像增加了一种新的属性。终端用户或者第三方用户获取图像后,无需打开 图像,直接查看图像的文件属性就能获取图像的具体内容,使得用户可以根据图像的文件属性获取更丰富的图像信息,方便用户快速浏览或筛选图像。
此外,终端用户或者第三方用户还可以利用图像的场景内容信息对图像进行进一步处理,具体处理过程将在后面的实施例中详细说明。
如图6所示,提出本发明的图像处理方法第二实施例,所述方法包括以下步骤:
S21、利用大数据进行深度学习,训练出能够分辨图像的场景特征的场景识别参数。
深度学习是近十年来人工智能领域取得的最重要的突破之一,它在语音识别、自然语言处理、计算机视觉、图像与视频分析、多媒体等诸多领域都取得了巨大成功。深度学习是机器学习领域中对模式(声音、图像等等)进行建模的一种方法,它也是一种基于统计的概率模型,在对各种模式进行建模之后,便可以对各种模式进行识别,例如待建模的模式是声音时,这种识别便可以理解为语音识别。
深度学习的概念源于人工神经网络的研究,含多隐层的多层感知器就是一种深度学习结构。深度学习通过组合低层特征形成更加抽象的高层表示属性类别或特征,以发现数据的分布式特征表示。为了进行某种模式的识别,通常的做法首先是以某种方式,提取这个模式中的特征。这个特征的提取方式有时候是人工设计或指定的,有时候是在给定相对较多数据的前提下,由计算机自己总结出来的。深度学习提出了一种让计算机自动学习出模式特征的方法,并将特征学习融入到了建立模型的过程中,从而减少了人为设计特征造成的不完备性。
深度学习虽然能够自动的学习模式的特征,并可以达到很好的识别精度,但这种算法工作的前提是,使用者能够提供“相当大”量级的数据。也就是说在只能提供有限数据量的应用场景下,深度学习算法便不能够对数据的规律进行无偏差的估计了,因此在识别效果上可能不如一些已有的简单算法。
目前,随着大数据的兴起,终端设备特别是移动终端大量的语音和图像数据为深度学习提供了源源不断的数据来源,具体到图像场景识别中,深度 学习首先利用大数据平台收集不同场景的特征数据,然后将这这些特征数据输入到卷积神经网络中,进行自动学习不同场景的各种特征,训练出分类这些不同场景的非线性特征组合参数,即场景识别参数,之后在具体的场景识别中就可以利用这些场景识别参数去识别不同的场景,分辨出场景中的不同背景、对象以及对象的属性特征。
如图7所示,为深度学习过程中,卷积神经网络对场景内容的分类识别过程:首先,输入图像;接着,提取子区域;然后,计算卷积神经网络特征;最后,进行区域分类。
S22、根据预置的场景识别参数对图像进行场景识别,生成该图像的场景内容信息。
具体的,当终端拍摄一张图像后或从外部获取一张图像后,立即利用深度学习技术根据场景识别参数对该图像进行场景识别,生成该图像的场景内容信息,该场景内容信息即描述该图像的场景特征的文字信息。其中,从外部获取图像,包括从网络上下载图像,或者接收外部设备传送的图像。
所述场景内容信息包括图像的内容标签、像素点的坐标位置、内容关联信息等,换句话说,至少包括图像中的对象,还可以包括图像背景、各对象的位置布局、属性特征等,所述属性特征如颜色、种类、形状以及关联信息等。
如图4所示,利用深度学习技术并根据场景识别参数对图4进行场景识别,检测出图像中的对象为草莓,以及草莓的颜色、所属食物种类、营养健康等属性特征信息,最终生成以下场景内容信息:草莓、食物、有机植物、水果、浆果、营养、健康、新鲜、红色、草绿色、特写等。
如图5所示,利用深度学习技术并根据场景识别参数对图5进行场景识别,检测出图像右上部为蓝色的天空,左下部为红褐色的圆顶岩石,中间为绿色树木,整体显示为一干燥景观,从而生成以下场景内容信息:一片背景为红褐色岩石圆顶和蓝天,前景为一些绿树、灌木和浅棕色小草的干燥景观。
S23、将场景内容信息写入图像的文件属性中。
S24、根据场景内容信息对图像进行分类。
本实施例利用场景内容信息对图像进行分类处理。目前,普通的图像一般按照时间、拍摄地点、图像大小等属性进行分类。本实施例中,当终端将图像的场景内容信息写入图像的文件属性后,立即根据场景内容信息分析出该图像包含的场景特征,根据这些特征将图像按照视觉内容进行分类,例如按照风景、人像、动物、食物、天气、环境等分成不同的类别。
举例而言,根据图4的场景内容信息,可以将图像分类为食物类、水果类、草莓类、特写类等;根据图5的场景内容信息,可以将图像分类为风景类、干燥景观类等。
此外,当第三方使用者获取到包含有场景内容信息的图像时,终端通过解析该场景内容信息即可以自动获取该图像包含的场景特征,根据这些场景特征将图像按照视觉内容进行分类,例如按照风景、人像、动物、食物、天气、环境等分成不同的类别。
从而,本实施例利用图像的场景内容信息,自动对图像进行分类,提供了一种新的图像分类方式。
如图8所示,提出本发明的图像处理方法第三实施例,所述方法包括以下步骤:
S31、利用大数据进行深度学习,训练出能够分辨图像的场景特征的场景识别参数。
S32、根据预置的场景识别参数对图像进行场景识别,生成图像的场景内容信息。
S33、将场景内容信息写入图像的文件属性中。
本实施例中,步骤S31-S33分别与第二实施例中的步骤S21-S23相同,在此不再赘述。
S34、当发布图像时,根据场景内容信息生成注释信息。
本实施例利用场景内容信息对图像进行注释处理。具体的,当用户发布图像时,无需用户手动输入文字解释该图像的内容,终端自动获取该图像的场景内容信息,并根据该场景内容信息自动生成注释信息,以解释该图像的具体内容。
例如,社交应用场景中,用户拍摄了照片并在社交软件上传照片时,无须解释拍摄内容,照片上传后,终端自动生成注释信息,对图像的内容进行解释,其他用户查看该照片时可以很方便地了解到该照片里面的内容是什么,以及该内容和网络上的其他内容关联性。例如,图像中出现一处景物,可以根据图像的场景内容信息自动给出该景物的相关位置、名称、所属种类、独特性等注释信息。
此外,当第三方使用者获取包含场景内容信息的图像,并进行发布时,终端获取该图像的场景内容信息,根据场景内容信息自动生成注释信息。
本实施例利用图像的场景内容信息,在发布图像的同时自动生成注释信息来解释图像的具体内容,省去了用户手动输入,为用户提供了一种新的图像分享体验。
如图9所示,提出本发明的图像处理方法第四实施例,所述方法包括以下步骤:
S41、利用大数据进行深度学习,训练出能够分辨图像的场景特征的场景识别参数。
S42、根据预置的场景识别参数对图像进行场景识别,生成图像的场景内容信息。
S43、将场景内容信息写入图像的文件属性中。
本实施例中,步骤S41-S43分别与第二实施例中的步骤S21-S23相同,在此不再赘述。
S44、根据场景内容信息对图像进行优化处理。
本实施例利用场景内容信息对图像进行优化处理,主要是对图像的颜色进行调整,增强图像的视觉效果。具体的,当对图像进行优化处理时,终端自动获取图像的场景内容信息,根据图像的场景内容信息采取不同的优化策略,对于图像中不同区域的内容进行不同程度的优化。例如,当场景内容信息显示图像中包含天空或/和草地时,则自动将天空变得更蓝,显出蔚蓝的效果,将草地变得更绿,显出绿油油的感觉,使得图像中的内容优化到最优效果;当场景内容信息显示图像中的天气阴沉时,可以将灰蒙蒙的天气背景变 换成阳光明亮的天气背景,对低光照区域进行加强等等。
此外,当第三方使用者或开发者获取包含场景内容信息的图像,并进行优化处理时,终端自动获取该图像的场景内容信息,根据场景内容信息自动进行优化处理。
在某些实施例中,在对图像进行优化处理时,开发者也可以查看图像属性,获取图像的场景内容信息,借助图像的场景内容信息对图像进行优化处理。
本实施例利用图像的场景内容信息自动对图像进行优化处理,使得优化处理更有针对性和更加准确,增强了图像的视觉效果。
应当理解,除了前述实施例列举的利用图像的场景内容信息进行的图像处理方式外,还可以利用图像的场景内容信息对图像进行其它方面的处理,本发明实施例对此不作限制,同理均在本发明实施例的保护范围内。
本发明实施例的图像处理方法,还可以应用于个人电脑等非移动终端设备。
本发明实施例还提供一种图像处理装置,应用于前述移动终端。现基于上述移动终端硬件结构以及通信系统,提出本发明的图像处理装置各实施例。
参见图10,提出本发明的图像处理装置第一实施例,所述装置包括以下模块:
场景识别模块:设置为根据预置的场景识别参数对图像进行场景识别,生成图像的场景内容信息。
具体的,当终端拍摄一张图像后或从外部获取一张图像后,场景识别模块立即利用深度学习技术根据场景识别参数对该图像进行场景识别,生成该图像的场景内容信息,该场景内容信息即描述该图像的场景特征的文字信息。其中,从外部获取图像,包括从网络上下载图像,或者接收外部设备传送的图像。
所述场景识别参数能够分辨出图像的场景特征,场景识别参数为直接从外部获取并存储于本地的参数。
所述场景内容信息包括图像的内容标签、像素点的坐标位置、内容关联 信息等,换句话说,至少包括图像中的对象,还可以包括图像背景、各对象的位置布局、属性特征等,所述属性特征如颜色、种类、形状以及关联信息等。
如图4所示,利用深度学习技术并根据场景识别参数对图4进行场景识别,检测出图像中的对象为草莓,以及草莓的颜色、所属食物种类、营养健康等属性特征信息,最终生成以下场景内容信息:草莓、食物、有机植物、水果、浆果、营养、健康、新鲜、红色、草绿色、特写等。
如图5所示,利用深度学习技术并根据场景识别参数对图5进行场景识别,检测出图像右上部为蓝色的天空,左下部为红褐色的圆顶岩石,中间为绿色树木,整体显示为一干燥景观,从而生成以下场景内容信息:一片背景为红褐色岩石圆顶和蓝天,前景为一些绿树、灌木和浅棕色小草的干燥景观。
写入模块:设置为将场景内容信息写入图像的文件属性中。
从而,通过对图像进行场景识别,生成场景内容信息并写入到图像的文件属性中这一系列处理,使得图像的文件属性中不仅包括图像高度、曝光时间、数据位数、拍摄地理位置等拍摄参数信息,还包括图像的场景内容信息,为图像增加了一种新的属性。终端用户或者第三方用户获取图像后,无需打开图像,直接查看图像的文件属性就能获取图像的场景内容信息,使得用户可以根据图像的文件属性获取更丰富的图像信息,方便用户快速浏览或筛选图像。
此外,终端用户或者第三方用户还可以利用图像的场景内容信息对图像进行进一步处理。
参见图11,提出本发明的图像处理装置第二实施例,本实施例与第一实施例的区别是增加了一深度学习模块,所述深度学习模块设置为:利用大数据进行深度学习,训练出能够分辨图像的场景特征的场景识别参数。
深度学习是近十年来人工智能领域取得的最重要的突破之一,它在语音识别、自然语言处理、计算机视觉、图像与视频分析、多媒体等诸多领域都取得了巨大成功。深度学习是机器学习领域中对模式(声音、图像等等)进行建模的一种方法,它也是一种基于统计的概率模型,在对各种模式进行建 模之后,便可以对各种模式进行识别,例如待建模的模式是声音时,这种识别便可以理解为语音识别。
深度学习的概念源于人工神经网络的研究,含多隐层的多层感知器就是一种深度学习结构。深度学习通过组合低层特征形成更加抽象的高层表示属性类别或特征,以发现数据的分布式特征表示。为了进行某种模式的识别,通常的做法首先是以某种方式,提取这个模式中的特征。这个特征的提取方式有时候是人工设计或指定的,有时候是在给定相对较多数据的前提下,由计算机自己总结出来的。深度学习提出了一种让计算机自动学习出模式特征的方法,并将特征学习融入到了建立模型的过程中,从而减少了人为设计特征造成的不完备性。
深度学习模块虽然能够自动的学习模式的特征,并可以达到很好的识别精度,但其工作的前提是,使用者能够提供“相当大”量级的数据。也就是说在只能提供有限数据量的应用场景下,深度学习模块便不能够对数据的规律进行无偏差的估计了,因此在识别效果上可能不如一些已有的简单算法。
目前随着大数据的兴起,终端设备特别是移动终端大量的语音和图像数据为深度学习提供了源源不断的数据来源,具体到图像场景识别中,深度学习模块首先利用大数据平台收集不同场景的特征数据,然后将这这些特征数据输入到卷积神经网络中,进行自动学习不同场景的各种特征,训练出分类这些不同场景的非线性特征组合参数,即场景识别参数,之后在具体的场景识别中就可以利用这些场景识别参数去识别不同的场景,分辨出场景中的不同背景、对象以及对象的属性特征。
如图7所示,为深度学习过程中,卷积神经网络对场景内容的分类识别过程:首先,输入图像;接着,提取子区域;然后,计算卷积神经网络特征;最后,进行区域分类。
本实施例可以由终端自动利用大数据进行深度学习而获得场景识别参数。
参见图12,提出本发明的图像处理装置第三实施例,本实施例与第二实施例的区别是增加了一图像处理模块,所述图像处理模块设置为:根据场景内容信息对图像进行处理。
具体的,如图13所示,图像处理模块包括分类单元、注释单元和优化处理单元,其中:
分类单元:设置为根据场景内容信息对图像进行分类。
具体的,当写入模块将图像的场景内容信息写入图像的文件属性后,分类单元立即根据场景内容信息分析出该图像包含的场景特征,根据这些特征将图像按照视觉内容进行分类,例如按照风景、人像、动物、食物、天气、环境等分成不同的类别。
举例而言,根据图4的场景内容信息,可以将图像分类为食物类、水果类、草莓类等;根据图5的场景内容信息,可以将图像分类为风景类、干燥景观类等。
此外,当终端获取到包含有场景内容信息的图像时,分类单元通过解析该场景内容信息即可以自动获取该图像包含的场景特征,根据这些场景特征将图像按照视觉内容进行分类,例如按照风景、人像、动物、食物、天气、环境等分成不同的类别。
注释单元:设置为当发布图像时,根据场景内容信息生成注释信息。
具体的,当用户发布图像时,无需用户手动输入文字解释该图像的内容,注释单元自动获取该图像的场景内容信息,并根据该场景内容信息自动生成注释信息,以解释该图像的具体内容。
例如,社交应用场景中,用户拍摄了照片并在社交软件上传照片时,无须解释拍摄内容,照片上传后,注释单元自动生成注释信息,对图像的内容进行解释,其他用户查看该照片时可以很方便地了解到该照片里面的内容是什么,以及该内容和网络上的其他内容关联性。例如,图像中出现一处景物,可以根据图像的场景内容信息自动给出该景物的相关位置、名称、所属种类、独特性等注释信息。
优化单元:设置为根据场景内容信息对图像进行优化处理。主要是对图像的颜色进行调整,增强图像的视觉效果
具体的,当对图像进行优化处理时,优化单元自动获取图像的场景内容信息,根据图像的场景内容信息采取不同的优化策略,对于图像中不同区域 的内容进行不同程度的优化。例如,当场景内容信息显示图像中包含天空或/和草地时,优化单元则自动将天空变得更蓝,显出蔚蓝的效果,将草地变得更绿,显出绿油油的感觉,使得图像中的内容优化到最优效果;当场景内容信息显示图像中的天气阴沉时,可以将灰蒙蒙的天气背景变换成阳光明亮的天气背景,对低光照区域进行加强等等。
本实施例中,利用图像的场景内容信息自动对图像进行分类,提供了一种新的图像分类方式;利用图像的场景内容信息,在发布图像的同时自动生成注释信息来解释图像的具体内容,省去了用户手动输入,为用户提供了一种新的图像分享体验;利用图像的场景内容信息自动对图像进行优化处理,使得优化处理更有针对性和更加准确,增强了图像的视觉效果。
在某些实施例中,图像处理模块中也可以只包括分类单元、注释单元和优化单元中的其中一个或者两个。
在某些实施例中,也可以省略深度学习模块,像第一实施例那样从外部获取场景识别参数并存储于本地。
本发明实施例的图像处理装置,还可以应用于个人电脑等非移动终端设备。
需要说明的是,在本文中,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、物品或者装置不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、物品或者装置所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括该要素的过程、方法、物品或者装置中还存在另外的相同要素。
上述本发明实施例序号仅仅为了描述,不代表实施例的优劣。
通过以上的实施方式的描述,本领域的技术人员可以清楚地了解到上述实施例方法可借助软件加必需的通用硬件平台的方式来实现,当然也可以通过硬件,但很多情况下前者是更佳的实施方式。基于这样的理解,本发明的技术方案本质上或者说对相关技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质(如ROM/RAM、磁碟、光 盘)中,包括若干指令用以使得一台终端设备(可以是手机,计算机,服务器,空调器,或者网络设备等)执行本发明各个实施例所述的方法。
以上仅为本发明的优选实施例,并非因此限制本发明的保护范围,凡是利用本发明说明书及附图内容所作的等效结构或等效流程变换,或直接或间接运用在其他相关的技术领域,均同理包括在本发明的保护范围内。
Industrial Applicability
Embodiments of the present invention disclose an image processing apparatus, a terminal, and a method. The image processing apparatus comprises: a scene recognition module, configured to perform scene recognition on an image according to preset scene recognition parameters and to generate scene content information of the image, the scene content information being text information describing scene features of the image; and a writing module, configured to write the scene content information into a file attribute of the image. With the embodiments of the present invention, the file attribute of the image includes not only the shooting parameter information but also the scene content information of the image, so the file attribute of the image is richer and more comprehensive; moreover, after acquiring the image, the user can obtain its specific content by viewing the file attribute of the image directly, without opening the image, so that the user can obtain richer image information from the file attribute of the image, which makes it convenient for the user to browse or filter images quickly.

Claims (20)

  1. An image processing apparatus, comprising:
    a scene recognition module, configured to perform scene recognition on an image according to preset scene recognition parameters and to generate scene content information of the image, the scene content information being text information describing scene features of the image; and
    a writing module, configured to write the scene content information into a file attribute of the image.
  2. The image processing apparatus according to claim 1, further comprising an image processing module, wherein the image processing module is configured to: process the image according to the scene content information.
  3. The image processing apparatus according to claim 2, wherein the image processing module comprises a classification unit, and the classification unit is configured to: classify the image according to the scene content information.
  4. The image processing apparatus according to claim 2, wherein the image processing module comprises an annotation unit, and the annotation unit is configured to: generate annotation information according to the scene content information when the image is published.
  5. The image processing apparatus according to claim 2, wherein the image processing module comprises an optimization unit, and the optimization unit is configured to: optimize the image according to the scene content information.
  6. The image processing apparatus according to claim 1, wherein the scene recognition module is further configured to: perform scene recognition on an image immediately after the image is captured or acquired from the outside.
  7. The image processing apparatus according to any one of claims 1-6, wherein the image processing apparatus further comprises a deep learning module, and the deep learning module is configured to: perform deep learning by using big data, to train scene recognition parameters capable of distinguishing the scene features of images.
  8. The image processing apparatus according to claim 3, wherein the classification unit is configured to analyze, from the scene content information, the scene features contained in the image, and to classify the image by visual content according to those features.
  9. The image processing apparatus according to claim 5, wherein the optimization unit is configured to acquire the scene content information of the image, to adopt different optimization strategies according to the scene content information of the image, and to apply different degrees of optimization to the content of different regions in the image;
    the optimization comprising: adjusting the colors of the image to enhance the visual effect of the image.
  10. A terminal, comprising the image processing apparatus according to any one of claims 1-9.
  11. An image processing method, comprising the steps of:
    performing scene recognition on an image according to preset scene recognition parameters, and generating scene content information of the image, the scene content information being text information describing scene features of the image; and
    writing the scene content information into a file attribute of the image.
  12. The image processing method according to claim 11, wherein after the step of writing the scene content information into the file attribute of the image, the method further comprises:
    processing the image according to the scene content information.
  13. The image processing method according to claim 12, wherein processing the image according to the scene content information comprises:
    classifying the image according to the scene content information.
  14. The image processing method according to claim 12, wherein processing the image according to the scene content information comprises:
    generating annotation information according to the scene content information when the image is published.
  15. The image processing method according to claim 12, wherein processing the image according to the scene content information comprises:
    optimizing the image according to the scene content information.
  16. The image processing method according to claim 11, further comprising:
    performing scene recognition on an image immediately after the image is captured or acquired from the outside.
  17. The image processing method according to any one of claims 11-16, wherein before the step of performing scene recognition on the image according to the preset scene recognition parameters, the method further comprises:
    performing deep learning by using big data, to train scene recognition parameters capable of distinguishing the scene features of images.
  18. The image processing method according to claim 13, wherein classifying the image according to the scene content information comprises:
    analyzing, from the scene content information, the scene features contained in the image, and classifying the image by visual content according to those features.
  19. The image processing method according to claim 15, wherein optimizing the image according to the scene content information comprises:
    acquiring the scene content information of the image, adopting different optimization strategies according to the scene content information of the image, and applying different degrees of optimization to the content of different regions in the image;
    the optimization comprising: adjusting the colors of the image to enhance the visual effect of the image.
  20. The image processing method according to any one of claims 11-16, wherein the scene content information comprises content tags of the image, coordinate positions of pixel points, and content association information.
PCT/CN2016/099865 2015-09-30 2016-09-23 图像处理装置、终端和方法 WO2017054676A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510644164.2 2015-09-30
CN201510644164.2A CN105302872A (zh) 2015-09-30 2015-09-30 图像处理装置和方法

Publications (1)

Publication Number Publication Date
WO2017054676A1 true WO2017054676A1 (zh) 2017-04-06

Family

ID=55200142

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/099865 WO2017054676A1 (zh) 2015-09-30 2016-09-23 图像处理装置、终端和方法

Country Status (2)

Country Link
CN (1) CN105302872A (zh)
WO (1) WO2017054676A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109255826A (zh) * 2018-10-11 2019-01-22 平安科技(深圳)有限公司 中文训练图像生成方法、装置、计算机设备及存储介质
CN111027622A (zh) * 2019-12-09 2020-04-17 Oppo广东移动通信有限公司 图片标签生成方法、装置、计算机设备及存储介质
CN114677691A (zh) * 2022-04-06 2022-06-28 北京百度网讯科技有限公司 文本识别方法、装置、电子设备及存储介质

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105302872A (zh) * 2015-09-30 2016-02-03 努比亚技术有限公司 图像处理装置和方法
CN106156310A (zh) * 2016-06-30 2016-11-23 努比亚技术有限公司 一种图片处理装置和方法
CN106991427A (zh) * 2017-02-10 2017-07-28 海尔优家智能科技(北京)有限公司 果蔬新鲜度的识别方法及装置
CN109406412A (zh) * 2017-08-18 2019-03-01 广州极飞科技有限公司 一种植物健康状态监控方法及装置
CN107808125A (zh) * 2017-09-30 2018-03-16 珠海格力电器股份有限公司 图像共享方法及装置
CN109688351B (zh) 2017-10-13 2020-12-15 华为技术有限公司 一种图像信号处理方法、装置及设备
CN107820020A (zh) * 2017-12-06 2018-03-20 广东欧珀移动通信有限公司 拍摄参数的调整方法、装置、存储介质及移动终端
CN108462876B (zh) * 2018-01-19 2021-01-26 瑞芯微电子股份有限公司 一种视频解码优化调整装置及方法
CN111566639A (zh) * 2018-02-09 2020-08-21 华为技术有限公司 一种图像分类方法及设备
CN108629767B (zh) * 2018-04-28 2021-03-26 Oppo广东移动通信有限公司 一种场景检测的方法、装置及移动终端
CN108683826B (zh) * 2018-05-15 2021-12-14 腾讯科技(深圳)有限公司 视频数据处理方法、装置、计算机设备和存储介质
CN110619251B (zh) * 2018-06-19 2022-06-10 Oppo广东移动通信有限公司 图像处理方法和装置、存储介质、电子设备
CN109815462B (zh) * 2018-12-10 2023-12-01 维沃移动通信有限公司 一种文本生成方法及终端设备
CN110717475A (zh) * 2019-10-18 2020-01-21 北京汽车集团有限公司 自动驾驶场景分类方法及系统
CN112287790A (zh) * 2020-10-20 2021-01-29 北京字跳网络技术有限公司 影像处理方法、装置、存储介质及电子设备

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102422286A (zh) * 2009-03-11 2012-04-18 香港浸会大学 利用图像获取参数和元数据自动和半自动的图像分类、注释和标签
CN104572905A (zh) * 2014-12-26 2015-04-29 小米科技有限责任公司 照片索引创建方法、照片搜索方法及装置
CN105302872A (zh) * 2015-09-30 2016-02-03 努比亚技术有限公司 图像处理装置和方法

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7555165B2 (en) * 2003-11-13 2009-06-30 Eastman Kodak Company Method for semantic scene classification using camera metadata and content-based cues

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102422286A (zh) * 2009-03-11 2012-04-18 香港浸会大学 利用图像获取参数和元数据自动和半自动的图像分类、注释和标签
CN104572905A (zh) * 2014-12-26 2015-04-29 小米科技有限责任公司 照片索引创建方法、照片搜索方法及装置
CN105302872A (zh) * 2015-09-30 2016-02-03 努比亚技术有限公司 图像处理装置和方法

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109255826A (zh) * 2018-10-11 2019-01-22 平安科技(深圳)有限公司 中文训练图像生成方法、装置、计算机设备及存储介质
CN109255826B (zh) * 2018-10-11 2023-11-21 平安科技(深圳)有限公司 中文训练图像生成方法、装置、计算机设备及存储介质
CN111027622A (zh) * 2019-12-09 2020-04-17 Oppo广东移动通信有限公司 图片标签生成方法、装置、计算机设备及存储介质
CN111027622B (zh) * 2019-12-09 2023-12-08 Oppo广东移动通信有限公司 图片标签生成方法、装置、计算机设备及存储介质
CN114677691A (zh) * 2022-04-06 2022-06-28 北京百度网讯科技有限公司 文本识别方法、装置、电子设备及存储介质
CN114677691B (zh) * 2022-04-06 2023-10-03 北京百度网讯科技有限公司 文本识别方法、装置、电子设备及存储介质

Also Published As

Publication number Publication date
CN105302872A (zh) 2016-02-03

Similar Documents

Publication Publication Date Title
WO2017054676A1 (zh) 图像处理装置、终端和方法
CN105404484B (zh) 终端分屏装置及方法
CN106156310A (zh) 一种图片处理装置和方法
CN106502693A (zh) 一种图像显示方法和装置
US8326354B2 (en) Portable terminal for explaining information of wine and control method thereof
CN106155311A (zh) Ar头戴设备、ar交互系统及ar场景的交互方法
CN106453941A (zh) 双屏操作方法及移动终端
CN107705251A (zh) 图片拼接方法、移动终端及计算机可读存储介质
CN106686301A (zh) 照片拍摄方法及装置
CN106657650B (zh) 一种系统表情推荐方法、装置及终端
CN106791204A (zh) 移动终端及其拍摄方法
CN104679890B (zh) 图片推送方法及装置
CN106909274A (zh) 一种图像显示方法和装置
CN105956999A (zh) 缩略图生成装置和方法
CN106534619A (zh) 一种调整对焦区域的方法、装置和终端
CN105933529A (zh) 拍摄画面的显示方法及装置
CN107071321B (zh) 一种视频文件的处理方法、装置和终端
CN106790202A (zh) 一种多媒体文件分享处理方法、装置及终端
CN106851113A (zh) 一种基于双摄像头的拍照方法及移动终端
CN106850941A (zh) 照片拍摄方法及装置
CN106534693A (zh) 一种照片处理方法、装置及终端
CN107071263A (zh) 一种图像处理方法及终端
CN105242483B (zh) 一种实现对焦的方法和装置、实现拍照的方法和装置
CN106372607A (zh) 一种从视频中提取图片的方法及移动终端
CN105893490A (zh) 图片展示装置和方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16850300

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16850300

Country of ref document: EP

Kind code of ref document: A1