WO2023102935A1 - Image data processing method, intelligent terminal, and storage medium

Info

Publication number: WO2023102935A1
Authority: WIPO (PCT)
Application number: PCT/CN2021/137246
Other languages: French (fr), Chinese (zh)
Inventor: 应贲
Original Assignee: 深圳传音控股股份有限公司 (Shenzhen Transsion Holdings Co., Ltd.)
Application filed by 深圳传音控股股份有限公司
Priority to PCT/CN2021/137246; published as WO2023102935A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01: Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/048: Interaction techniques based on graphical user interfaces [GUI]
    • G06F 3/0484: Interaction techniques based on GUIs for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • G06F 3/04845: Interaction techniques based on GUIs for image manipulation, e.g. dragging, rotation, expansion or change of colour
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 11/00: 2D [Two Dimensional] image generation
    • G06T 11/80: Creating or modifying a manually drawn or painted image using a manual input device, e.g. mouse, light pen, direction keys on keyboard

Definitions

  • the present application relates to the technical field of image data processing, and in particular to an image data processing method, an intelligent terminal and a storage medium.
  • Computational photography refers to digital image capture and processing techniques that use digital computation rather than optical processing. Computational photography can increase the capabilities of camera equipment, or introduce more features than film-based photography, or reduce the cost or size of camera elements.
  • the present application provides an image data processing method, a smart terminal and a storage medium, so as to uniformly standardize the semantic information of computational photography in the data flow involved in computational photography.
  • the present application provides a method for processing image data, comprising the following steps: S1, determining or generating semantic information of an image based on image information; and S2, saving the semantic information in the image data stream based on a preset format.
  • the image information includes basic image information and image data.
  • the image data is the image itself.
  • basic image information can also be referred to as basic image description information, which can include image description information identification, basic description information length, image type identification, image length, image width, image color space, bit width, and storage mode.
  • the image description information identifier is used to identify the "basic description information" field of the image.
  • the length of the basic description information indicates the total length of the basic description information field, including the image description information identifier.
  • the image type identifier is used to identify whether the image data type is a single-frame image, multi-frame image or video stream.
  • the image length is the length of the image data itself.
  • the image width, that is, the width of the image data itself.
  • the image color space, that is, a description of the color space of the image data, such as RGGB, RGBW, RYYB, etc.
  • the bit width, that is, the number of bits per component of the image.
  • the storage mode refers to the arrangement mode of each pixel of each component in the image color space in a storage space (such as memory, flash memory, or hard disk, etc.).
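The basic description information above can be pictured as a small record. A minimal sketch in Python, assuming illustrative field names, values, and type codes; the application does not fix a concrete binary layout:

```python
from dataclasses import dataclass

@dataclass
class BasicImageInfo:
    """Hypothetical layout of the image's basic description information.

    Field names, sizes, and code values are illustrative assumptions.
    """
    info_id: int        # identifies the "basic description information" field
    info_length: int    # total length of the field, including info_id
    image_type: int     # 0 = single frame, 1 = multi-frame, 2 = video stream (assumed codes)
    image_length: int   # length of the image data itself
    image_width: int    # width of the image data itself
    color_space: str    # e.g. "RGGB", "RGBW", "RYYB"
    bit_width: int      # number of bits per component of the image
    storage_mode: int   # arrangement of each component's pixels in the storage space

info = BasicImageInfo(
    info_id=0x01, info_length=32, image_type=0,
    image_length=3000, image_width=4000,
    color_space="RGGB", bit_width=10, storage_mode=0,
)
print(info.color_space)  # RGGB
```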
  • semantic information is used to interpret the image.
  • the semantic information includes at least one of the following: depth information, scene classification information, instance segmentation information, and object detection information.
  • the scene classification information is used to characterize the scene represented by the image.
  • the instance segmentation information is used to characterize the segmentation information of the instance in the image.
  • the depth information includes at least one of the following:
  • Depth image: the pixel value in the depth image represents the distance between the pixel point in the image and the camera used when capturing the image;
  • Indication information, contained in the image, of an infinite part, where the infinite part is any distance beyond the farthest distance that the device can detect.
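The two kinds of depth information can be combined in one map: per-pixel distances plus an indication of the infinite part. A hedged sketch, assuming an illustrative maximum detectable range and an infinity sentinel (neither value comes from the application):

```python
# Depth map where each value is the distance (in metres, assumed unit)
# from the camera, with a sentinel marking the "infinite part" beyond
# the device's maximum detectable range.
MAX_RANGE_M = 10.0       # farthest distance the device can detect (assumed)
INFINITY = float("inf")  # indication for pixels in the infinite part

depth_map = [
    [1.2, 1.3, INFINITY],
    [0.8, 2.5, INFINITY],
]

def is_infinite(d: float) -> bool:
    """True when the pixel lies in the infinite part of the scene."""
    return d == INFINITY or d > MAX_RANGE_M

print(sum(is_infinite(d) for row in depth_map for d in row))  # 2 infinite pixels
```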
  • when the semantic information includes depth information, step S1 includes: obtaining the depth information based on the image information through a laser ranging radar and/or a depth information analysis network.
  • the depth information parsing network is used to parse image information to generate depth information.
  • Step S1 includes: extracting image scene features of the image based on the image information; determining or generating the scene classification information of the image according to the image scene features.
  • determining or generating the scene classification information of the image according to the image scene features includes: inputting the image scene features into a scene classification model to obtain the probability that the image corresponds to at least one scene; and, among the probabilities that the image corresponds to at least one scene, determining the scene with the maximum probability as the scene classification information of the image.
  • the scene classification model is used to determine the probability that the image corresponds to at least one scene.
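The scene-selection rule above (take the scene with the maximum probability among the model's outputs) can be sketched as follows; the scene names and probabilities are illustrative stand-ins for the scene classification model's output:

```python
def classify_scene(scene_probs: dict) -> str:
    """Return the scene with the maximum probability.

    `scene_probs` maps scene name -> probability and stands in for
    the output of the scene classification model.
    """
    return max(scene_probs, key=scene_probs.get)

probs = {"night": 0.05, "portrait": 0.15, "landscape": 0.80}
print(classify_scene(probs))  # landscape
```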
  • step S2 includes: based on a preset format, filling semantic information in a reserved field of the image data stream.
  • the preset format may be any combination of at least one type of semantic information.
  • the preset formats corresponding to different types of semantic information are different.
  • the preset format includes at least one of table, single-channel bitmap, matrix and key-value pair.
  • the preset format further includes an information header, which is used to indicate whether the image data stream contains semantic information, and/or, the information header is used to indicate the type of semantic information contained in the image data stream.
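Filling and reading the reserved field can be sketched as an information header followed by a payload. The header layout below (a presence byte, a type bitmask, and a payload length) and the JSON key-value payload are assumptions for illustration; the application leaves the concrete preset format open:

```python
import json
import struct

# Assumed header values: a magic byte saying the stream contains semantic
# information, and one bit per type of semantic information.
HAS_SEMANTIC = 0xA5
TYPE_DEPTH, TYPE_SCENE, TYPE_INSTANCE, TYPE_TARGET = 1, 2, 4, 8

def fill_reserved_field(semantic: dict, type_mask: int) -> bytes:
    """Pack semantic information into the reserved field: header + payload."""
    payload = json.dumps(semantic).encode("utf-8")  # key-value pair format
    header = struct.pack(">BBI", HAS_SEMANTIC, type_mask, len(payload))
    return header + payload

def read_reserved_field(field: bytes) -> dict:
    """Read semantic information back out of the reserved field."""
    magic, type_mask, length = struct.unpack(">BBI", field[:6])
    assert magic == HAS_SEMANTIC  # header says semantic info is present
    return json.loads(field[6:6 + length].decode("utf-8"))

blob = fill_reserved_field({"scene_id": 2}, TYPE_SCENE)
print(read_reserved_field(blob))  # {'scene_id': 2}
```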
  • the method further includes: determining the identification information corresponding to the semantic information according to the preset corresponding relationship.
  • the semantic information includes at least one of scene classification information, instance segmentation information, and target detection information; the identification information includes at least one of a scene ID, an instance ID, and a target ID; and the preset correspondence includes at least one of the correspondence between scene names and scene IDs, the correspondence between instance names and instance IDs, and the correspondence between target names and target IDs.
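The preset correspondences between names and IDs can be pictured as lookup tables. A sketch with hypothetical tables and key names; the actual correspondences would be fixed in advance between the writer and the reader of the data stream:

```python
# Hypothetical preset correspondences (name -> numeric ID).
SCENE_IDS = {"landscape": 1, "portrait": 2, "night": 3}
INSTANCE_IDS = {"person": 1, "cat": 2, "car": 3}
TARGET_IDS = {"face": 1, "text": 2}

def to_identification(semantic: dict) -> dict:
    """Replace names in the semantic information with their IDs,
    according to the preset correspondences above."""
    out = {}
    if "scene" in semantic:
        out["scene_id"] = SCENE_IDS[semantic["scene"]]
    if "instances" in semantic:
        out["instance_ids"] = [INSTANCE_IDS[n] for n in semantic["instances"]]
    if "targets" in semantic:
        out["target_ids"] = [TARGET_IDS[n] for n in semantic["targets"]]
    return out

print(to_identification({"scene": "portrait", "instances": ["person", "cat"]}))
```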
  • the present application also provides an image data processing method, comprising the following steps: S10, obtaining semantic information from an image data stream based on a preset format; and S20, performing preset processing according to the semantic information.
  • semantic information is used to interpret the image.
  • the semantic information includes at least one of the following:
  • Depth image: the pixel value in the depth image represents the distance between the pixel point in the image and the camera used when capturing the image;
  • An indication, contained in the image, of an infinity part, which is any distance beyond the farthest distance that the device can detect;
  • the scene classification information is used to characterize the scene represented by the image.
  • the instance segmentation information is used to characterize the segmentation information of the instances in the image.
  • step S10 includes: reading semantic information in a reserved field of the image data stream based on a preset format.
  • the preset format may be any combination of at least one type of semantic information.
  • the preset formats corresponding to different types of semantic information are different.
  • the preset format includes at least one of table, single-channel bitmap, matrix and key-value pair.
  • the preset format further includes an information header, which is used to indicate whether the image data stream contains semantic information, and/or, the information header is used to indicate the type of semantic information contained in the image data stream.
  • step S20 includes at least one of the following:
  • When the semantic information includes scene classification information, adjusting the target parameters of the camera in the corresponding scene according to the scene classification information;
  • When the semantic information includes target detection information, adjusting the target of the camera's automatic focus according to the target detection information;
  • When the semantic information includes instance segmentation information, obtaining the target instance in the image according to the instance segmentation information, and performing preset processing on the image according to the target instance.
  • the preset processing includes at least one of instance blurring, instance deformation, instance color retention, performing LUT mapping on instances, and performing LUT mapping on backgrounds.
  • the target parameters include at least one of automatic exposure parameters, display lookup tables, automatic focus parameters and white balance parameters.
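The branches of step S20 amount to a dispatch on which kinds of semantic information are present. A sketch with illustrative camera parameters and values; none of the names are from the application:

```python
def preset_process(semantic: dict, camera: dict) -> dict:
    """Dispatch mirroring the branches of step S20.

    Keys of `semantic` and the camera parameter names/values are
    illustrative assumptions.
    """
    if "scene" in semantic:
        # adjust the camera's target parameters for the corresponding scene
        camera["auto_exposure"] = "long" if semantic["scene"] == "night" else "auto"
    if "targets" in semantic:
        # adjust the target of the camera's automatic focus
        camera["af_target"] = semantic["targets"][0]
    if "instances" in semantic:
        # apply preset processing, e.g. blurring, to the target instance
        camera["instance_effect"] = ("blur", semantic["instances"][0])
    return camera

cam = preset_process({"scene": "night", "targets": ["face"]}, {})
print(cam["af_target"])  # face
```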
  • the present application also provides an image data processing device, including:
  • a processing module configured to determine or generate semantic information of the image based on the image information
  • the saving module is used for saving semantic information in the image data stream based on a preset format.
  • the image information includes basic image information and image data.
  • the image data is the image itself.
  • the basic image information can also be called the basic description information of the image, which can include image description information identification, basic description information length, image type identification, image length, image width, image color space, bit width and storage method, etc.
  • Image description information identifier: used to identify the "basic description information" field of the image.
  • Basic description information length: indicates the total length of the basic description information field, including the image description information identifier.
  • Image type identifier: identifies whether the image data type is a single-frame image, a multi-frame image or a video stream.
  • Image length: the length of the image data itself.
  • Image width: the width of the image data itself.
  • Image color space: a description of the color space of the image data, such as RGGB, RGBW, RYYB, etc.
  • Bit width: the number of bits per component of the image.
  • Storage method: the arrangement of each pixel of each component in the image color space in the storage space (such as memory, flash memory, or hard disk).
  • semantic information is used to interpret the image.
  • the semantic information includes at least one of the following: depth information, scene classification information, instance segmentation information, and object detection information.
  • the scene classification information is used to characterize the scene represented by the image.
  • the instance segmentation information is used to characterize the segmentation information of the instances in the image.
  • the depth information includes at least one of the following:
  • Depth image: the pixel value in the depth image represents the distance between the pixel point in the image and the camera used when capturing the image;
  • the semantic information includes depth information
  • the processing module is specifically configured to: obtain the depth information through a laser ranging radar and/or a depth information analysis network based on image information.
  • the depth information parsing network is used to parse image information to generate depth information.
  • the semantic information includes scene classification information
  • the processing module is further configured to: extract the image scene features of the image based on the image information; determine or generate the scene classification information of the image according to the image scene features.
  • the processing module is also used to: input the image scene features into the scene classification model to obtain the probability that the image corresponds to at least one scene; and, among the probabilities that the image corresponds to at least one scene, determine the scene with the maximum probability as the scene classification information of the image.
  • the scene classification model is used to determine the probability that the image corresponds to at least one scene.
  • the saving module is specifically configured to: fill semantic information in a reserved field of the image data stream based on a preset format.
  • the preset format may be any combination of at least one type of semantic information.
  • the preset formats corresponding to different types of semantic information are different.
  • the preset format includes at least one of table, single-channel bitmap, matrix and key-value pair.
  • the preset format further includes an information header, which is used to indicate whether the image data stream contains semantic information, and/or, the information header is used to indicate the type of semantic information contained in the image data stream.
  • the saving module is further configured to: based on a preset format, before saving the semantic information in the image data stream, determine the identification information corresponding to the semantic information according to a preset correspondence.
  • the semantic information includes at least one of scene classification information, instance segmentation information, and target detection information; the identification information includes at least one of a scene ID, an instance ID, and a target ID; and the preset correspondence includes at least one of the correspondence between scene names and scene IDs, the correspondence between instance names and instance IDs, and the correspondence between target names and target IDs.
  • the present application also provides an image data processing device, including:
  • An acquisition module configured to acquire semantic information from image data streams based on a preset format
  • the processing module is used for performing preset processing according to the semantic information.
  • semantic information is used to interpret the image.
  • the semantic information includes at least one of the following:
  • Depth image: the pixel value in the depth image represents the distance between the pixel point in the image and the camera used when capturing the image;
  • An indication, contained in the image, of an infinity part, which is any distance beyond the farthest distance that the device can detect;
  • the scene classification information is used to characterize the scene represented by the image.
  • the instance segmentation information is used to characterize the segmentation information of the instances in the image.
  • the obtaining module is specifically configured to: read semantic information in a reserved field of the image data stream based on a preset format.
  • the preset format may be any combination of at least one type of semantic information.
  • the preset formats corresponding to different types of semantic information are different.
  • the preset format includes at least one of table, single-channel bitmap, matrix and key-value pair.
  • the preset format further includes an information header, which is used to indicate whether the image data stream contains semantic information, and/or, the information header is used to indicate the type of semantic information contained in the image data stream.
  • the processing module is specifically used for at least one of the following:
  • When the semantic information includes scene classification information, adjusting the target parameters of the camera in the corresponding scene according to the scene classification information;
  • When the semantic information includes target detection information, adjusting the target of the camera's automatic focus according to the target detection information;
  • When the semantic information includes instance segmentation information, obtaining the target instance in the image according to the instance segmentation information, and performing preset processing on the image according to the target instance.
  • the preset processing includes at least one of instance blurring, instance deformation, instance color retention, performing LUT mapping on instances, and performing LUT mapping on backgrounds.
  • the target parameters include at least one of automatic exposure parameters, display lookup tables, automatic focus parameters and white balance parameters.
  • the present application also provides an intelligent terminal, including: a memory and a processor, wherein an image data processing program is stored in the memory, and when the image data processing program is executed by the processor, the steps of any one of the above image data processing methods are implemented.
  • the present application also provides a computer-readable storage medium, where a computer program is stored in the storage medium, and when the computer program is executed by a processor, the steps of any one of the above-mentioned image data processing methods are realized.
  • the present application also provides a computer program product, the computer program product includes a computer program; when the computer program is executed, the steps of any one of the image data processing methods above are realized.
  • the image data processing method of the present application determines or generates the semantic information of the image based on the image information, and the semantic information is used to interpret the image; based on the preset format, the semantic information is stored in the data stream of the image.
  • the semantic information for interpreting the image is stored in the data stream of the image, so as to uniformly standardize the semantic information of computational photography in the data stream involved in computational photography.
  • FIG. 1 is a schematic diagram of a hardware structure of an intelligent terminal implementing various embodiments of the present application
  • FIG. 2 is a system architecture diagram of a communication network provided by an embodiment of the present application.
  • Fig. 3 is a schematic flowchart of an image data processing method according to a first embodiment
  • FIG. 4 is an example diagram of a depth image shown in an embodiment of the present application.
  • FIG. 5 is an example diagram of instance segmentation information shown in an embodiment of the present application.
  • Fig. 6 is a schematic flowchart of an image data processing method according to a second embodiment
  • Fig. 7 is a schematic structural diagram of an image data processing device according to a third embodiment
  • Fig. 8 is a schematic structural diagram of an image data processing device according to a fourth embodiment
  • Fig. 9 is a schematic structural diagram of a smart terminal according to a fifth embodiment.
  • first, second, third, etc. may be used herein to describe various information, the information should not be limited to these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of this document, first information may also be called second information, and similarly, second information may also be called first information.
  • the word “if” as used herein may be interpreted as “at”, “when”, “in response to determining”, or “in response to detecting”.
  • the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context indicates otherwise.
  • “A, B, C”, “A, B or C”, or “A, B and/or C” means “any of the following: A; B; C; A and B; A and C; B and C; A and B and C”. Exceptions to this definition arise only when combinations of elements, functions, steps or operations are inherently mutually exclusive in some way.
  • the phrases “if determined” or “if detected (the stated condition or event)” could be interpreted as “when determined”, “in response to the determination”, “when detected (the stated condition or event)”, or “in response to detection of (the stated condition or event)”.
  • step codes such as S1, S2, S10, and S20 are used for the purpose of expressing the corresponding content more clearly and concisely, and do not constitute a substantive limitation on the order.
  • S2 may be executed first and then S1, or S20 may be executed first and then S10, etc., but these should be within the protection scope of the present application.
  • Smart terminals can be implemented in various forms.
  • the smart terminals described in this application may include mobile terminals such as mobile phones, tablet computers, notebook computers, palmtop computers, personal digital assistants (PDA), portable media players (PMP), navigation devices, wearable devices, smart bracelets, and pedometers, as well as fixed terminals such as digital TVs and desktop computers.
  • a smart terminal will be taken as an example, and those skilled in the art will understand that, in addition to elements specially used for mobile purposes, the configurations according to the embodiments of the present application can also be applied to fixed-type terminals.
  • FIG. 1 is a schematic diagram of the hardware structure of a smart terminal implementing various embodiments of the present application.
  • the smart terminal 100 may include: an RF (radio frequency) unit 101, a WiFi module 102, an audio output unit 103, an A/V (audio/video) input unit 104, a sensor 105, a display unit 106, a user input unit 107, an interface unit 108, a memory 109, a processor 110, a power supply 111, and other components.
  • the radio frequency unit 101 can be used to send and receive information, or to receive and send signals during a call; specifically, downlink information received from the base station is delivered to the processor 110 for processing, and uplink data is sent to the base station.
  • the radio frequency unit 101 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier, a duplexer, and the like.
  • the radio frequency unit 101 can also communicate with the network and other devices through wireless communication.
  • the above wireless communication can use any communication standard or protocol, including but not limited to GSM (Global System for Mobile Communications), GPRS (General Packet Radio Service), CDMA2000 (Code Division Multiple Access 2000), WCDMA (Wideband Code Division Multiple Access), TD-SCDMA (Time Division-Synchronous Code Division Multiple Access), FDD-LTE (Frequency Division Duplexing-Long Term Evolution), TDD-LTE (Time Division Duplexing-Long Term Evolution), 5G, etc.
  • WiFi is a short-distance wireless transmission technology.
  • the smart terminal can help users send and receive emails, browse web pages, and access streaming media, etc., and it provides users with wireless broadband Internet access.
  • although Fig. 1 shows the WiFi module 102, it can be understood that it is not an essential component of the smart terminal and can be omitted as required without changing the essence of the invention.
  • the audio output unit 103 can convert audio data received by the radio frequency unit 101 or the WiFi module 102, or stored in the memory 109, into an audio signal and output it as sound when the smart terminal 100 is in a call signal receiving mode, a call mode, a recording mode, a voice recognition mode, a broadcast receiving mode, or the like.
  • the audio output unit 103 can also provide audio output related to specific functions performed by the smart terminal 100 (optionally, call signal receiving sound, message receiving sound, etc.).
  • the audio output unit 103 may include a speaker, a buzzer, and the like.
  • the A/V input unit 104 is used to receive audio or video signals.
  • the A/V input unit 104 may include a graphics processor (Graphics Processing Unit, GPU) 1041 and a microphone 1042; the graphics processor 1041 is used to process image data of still images or video obtained by an image capture device (such as a camera).
  • the processed image can be displayed on the display unit 106 .
  • the image processed by the graphics processor 1041 may be stored in the memory 109 (or other storage medium) or sent via the radio frequency unit 101 or the WiFi module 102 .
  • the microphone 1042 can receive sound (audio data) in a phone call mode, a recording mode, a voice recognition mode, and similar operating modes, and can process such sound into audio data.
  • the processed audio (voice) data can be converted into a format transmittable to a mobile communication base station via the radio frequency unit 101 for output in case of a phone call mode.
  • the microphone 1042 may implement various types of noise cancellation (or suppression) algorithms to cancel (or suppress) noise or interference generated in the process of receiving and transmitting audio signals.
  • the smart terminal 100 also includes at least one sensor 105, such as a light sensor, a motion sensor, and other sensors.
  • the light sensor includes an ambient light sensor and a proximity sensor.
  • the ambient light sensor can adjust the brightness of the display panel 1061 according to the brightness of the ambient light, and the proximity sensor can turn off the display panel 1061 and/or the backlight when the smart terminal 100 moves to the ear.
  • as a kind of motion sensor, the accelerometer sensor can detect the magnitude of acceleration in various directions (generally three axes), can detect the magnitude and direction of gravity when stationary, and can be used for applications that recognize the posture of the mobile phone (such as horizontal/vertical screen switching, related games, magnetometer attitude calibration) and for vibration-recognition related functions (such as pedometer, tap); other sensors that can also be configured on the mobile phone, such as fingerprint sensors, pressure sensors, iris sensors, molecular sensors, gyroscopes, barometers, hygrometers, thermometers, and infrared sensors, will not be described in detail here.
  • the display unit 106 is used to display information input by the user or information provided to the user.
  • the display unit 106 may include a display panel 1061, and the display panel 1061 may be configured in the form of a liquid crystal display (Liquid Crystal Display, LCD), an organic light-emitting diode (Organic Light-Emitting Diode, OLED), or the like.
  • the user input unit 107 can be used to receive input numbers or character information, and generate key signal input related to user settings and function control of the smart terminal.
  • the user input unit 107 may include a touch panel 1071 and other input devices 1072 .
  • the touch panel 1071, also referred to as a touch screen, can collect touch operations of the user on or near it (for example, operations performed by the user on or near the touch panel 1071 using a finger, a stylus, or any other suitable object or accessory), and drive the corresponding connection device according to a preset program.
  • the touch panel 1071 may include two parts, a touch detection device and a touch controller.
  • the touch detection device detects the user's touch orientation, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, and sends them to the processor 110, and can also receive and execute commands sent by the processor 110.
  • the touch panel 1071 can be implemented in various types such as resistive, capacitive, infrared, and surface acoustic wave.
  • the user input unit 107 may also include other input devices 1072 .
  • other input devices 1072 may include, but are not limited to, one or more of physical keyboards, function keys (such as volume control buttons, switch buttons, etc.), trackballs, mice, joysticks, etc., which are not specifically limited here.
  • the touch panel 1071 may cover the display panel 1061.
  • when the touch panel 1071 detects a touch operation on or near it, it transmits the operation to the processor 110 to determine the type of the touch event, and then the processor 110 provides a corresponding visual output on the display panel 1061 according to the type of the touch event.
  • although the touch panel 1071 and the display panel 1061 are used as two independent components to realize the input and output functions of the smart terminal, in some embodiments, the touch panel 1071 and the display panel 1061 may be integrated to realize the input and output functions of the smart terminal.
  • the implementation of the input and output functions of the smart terminal is not specifically limited here.
  • the interface unit 108 is used as an interface through which at least one external device can be connected with the smart terminal 100 .
  • the external device may include a wired or wireless headset port, an external power (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device with an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and the like.
  • the interface unit 108 may be used to receive input (optionally, data information, power, etc.) from an external device and transmit the received input to one or more components within the smart terminal 100, or may be used to transfer data between the smart terminal 100 and external devices.
  • the memory 109 can be used to store software programs as well as various data.
  • the memory 109 can mainly include a storage program area and a storage data area.
  • the storage program area can store an operating system, an application program required by at least one function (such as a sound playback function, an image playback function, etc.), and the like.
  • the storage data area can store data (such as audio data, a phone book, etc.) created according to the use of the mobile phone.
  • the memory 109 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device.
  • the processor 110 is the control center of the smart terminal; it uses various interfaces and lines to connect the various parts of the whole smart terminal, and executes various functions of the smart terminal and processes data by running or executing software programs and/or modules stored in the memory 109 and calling data stored in the memory 109, so as to monitor the smart terminal as a whole.
  • the processor 110 may include one or more processing units; preferably, the processor 110 may integrate an application processor and a modem processor.
  • the application processor mainly processes operating systems, user interfaces, and application programs, etc.
  • the modem processor mainly handles wireless communication. It can be understood that the foregoing modem processor may not be integrated into the processor 110.
  • the smart terminal 100 can also include a power supply 111 (such as a battery) for supplying power to various components.
  • the power supply 111 can be logically connected to the processor 110 through a power management system, so as to manage functions such as charging, discharging, and power consumption through the power management system.
  • the smart terminal 100 may also include a Bluetooth module, etc., which will not be repeated here.
  • the following describes the communication network system on which the smart terminal of the present application is based.
  • FIG. 2 is a structure diagram of a communication network system provided by an embodiment of the present application.
  • the communication network system is an LTE system of general mobile communication technology.
  • the LTE system includes a UE (User Equipment) 201, an E-UTRAN (Evolved UMTS Terrestrial Radio Access Network) 202, an EPC (Evolved Packet Core) 203, and the operator's IP service 204.
  • the UE 201 may be the above-mentioned terminal 100, which will not be repeated here.
  • E-UTRAN 202 includes eNodeB 2021 and other eNodeB 2022 and so on.
  • the eNodeB 2021 can be connected to other eNodeB 2022 through a backhaul (for example, X2 interface), the eNodeB 2021 is connected to the EPC 203 , and the eNodeB 2021 can provide access from the UE 201 to the EPC 203 .
  • EPC 203 may include an MME (Mobility Management Entity) 2031, an HSS (Home Subscriber Server) 2032, other MMEs 2033, an SGW (Serving Gateway) 2034, a PGW (PDN Gateway, Packet Data Network Gateway) 2035, a PCRF (Policy and Charging Rules Function) 2036, and the like.
  • MME2031 is a control node that handles signaling between UE201 and EPC203, and provides bearer and connection management.
  • HSS 2032 is used to provide registers to manage functions such as the home location register (not shown in the figure), and to save user-specific information about service features and data rates.
  • PCRF 2036 is the policy and charging control policy decision point for service data flows and IP bearer resources; it selects and provides available policy and charging control decisions for the policy and charging enforcement function unit (not shown in the figure).
  • the IP service 204 may include Internet, Intranet, IMS (IP Multimedia Subsystem, IP Multimedia Subsystem) or other IP services.
  • although the LTE system is described above as an example, those skilled in the art should know that the present application is not only applicable to the LTE system, but also applicable to other wireless communication systems, such as GSM, CDMA2000, WCDMA, TD-SCDMA, and future new network systems (such as 5G), which are not limited here.
  • the present application provides an image data processing method, an intelligent terminal and a storage medium.
  • the semantic information for interpreting images is stored based on a preset format, so that the semantic information of computational photography is uniformly specified in the data stream involved in computational photography.
  • Fig. 3 is a schematic flowchart of an image data processing method according to the first embodiment.
  • An embodiment of the present application provides an image data processing method, which is optionally applied to a smart terminal such as the aforementioned smart terminal. As shown in Figure 3, the image data processing method includes the following steps:
  • S1: Based on the image information, determine or generate the semantic information of the image.
  • the image information includes basic image information and image data.
  • the image data is the image itself.
  • the basic image information can also be called the basic description information of the image, and can include an image description information identifier, a basic description information length, an image type identifier, an image length, an image width, an image color space, a bit width, a storage method, and the like.
  • Image description information identifier: used to identify the "basic description information" field of the image.
  • Basic description information length: the total length of the basic description information field, including the image description information identifier.
  • Image type identifier: used to identify whether the image data type is a single-frame image, a multi-frame image, or a video stream.
  • Image length: the length of the image data itself.
  • Image width: the width of the image data itself.
  • Image color space: a description of the image data color space, such as RGGB (Bayer filter, also known as RGBG or GRGB), RGBW (adding a white sub-pixel (W) to the original RGB three primary colors), RYYB (replacing the two green sub-pixels (G) with two yellow sub-pixels (Y)), etc.
  • Bit width: the number of bits for each component of the image.
  • Storage method: the arrangement of the pixels of each component of the image color space in the storage space (such as memory, flash memory, or hard disk).
  • semantic information is used to interpret the image.
  • common semantic information includes image depth information, scene classification information, instance segmentation information, and object detection information.
  • semantic information in this embodiment of the present application may include at least one of the following: depth information, scene classification information, instance segmentation information, and object detection information, but not limited thereto.
  • the depth information may include at least one of the following:
  • Depth image: the pixel values in the depth image represent the distance between each pixel in the image and the camera used to capture the image;
  • Indication information of whether the image contains an infinity part, where the infinity part is at a distance beyond the farthest distance that the device can detect. It can be understood that the distance range the device can detect has a shortest distance and a farthest distance; anything beyond the farthest distance can be represented by the maximum value, which corresponds to infinity, while the next largest value is used to indicate the farthest distance that can currently be detected.
  • the range between the minimum and maximum distances is equally divided into 256 parts, and the depth values of all pixels are quantized into these 256 parts.
  • in this way, a depth image with the same resolution as the original image can be generated, as shown in Figure 4, which is attached to the computational photography data stream as another channel of the image. It should be noted that, with the development of device performance, the 256 levels (2 to the 8th power) can also be expanded to 512 levels or more; in that case, the distance accuracy that can be provided is greatly improved.
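The quantization scheme above can be sketched in Python. This is a minimal illustration only: the function name is hypothetical, and the convention of reserving the top value for infinity and the next largest for the farthest detectable distance follows the description above.

```python
def quantize_depth(depth_m, max_detect_m, levels=256):
    """Quantize metric depths (a 2-D list, in meters) into `levels` integer values.

    Per the scheme described above: the maximum value (levels-1) is reserved
    for "infinity" (beyond the farthest detectable distance), and the next
    largest value (levels-2) marks the farthest detectable distance itself.
    Assumes at least two distinct finite depths so the span is nonzero.
    """
    finite = [d for row in depth_m for d in row if d <= max_detect_m]
    d_min = min(finite)
    span = max_detect_m - d_min
    out = []
    for row in depth_m:
        q_row = []
        for d in row:
            if d > max_detect_m:
                q_row.append(levels - 1)  # beyond detection range: infinity (e.g. sky)
            else:
                # map [d_min, max_detect_m] onto the 0 .. levels-2 range
                q = int((d - d_min) / span * (levels - 2))
                q_row.append(min(q, levels - 2))
        out.append(q_row)
    return out
```

With `levels=512` the same function models the higher-precision variant mentioned above.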
  • Table 1 shows a method of expressing the depth information of an image containing sky:
  • the scene classification information is used to characterize the scene represented by the image. It can be understood that an image can be classified into multiple scenes in most cases; for example, an image of a cake at a birthday party can be expressed as a party scene or as a food scene. Therefore, when expressing the scene, the probabilities of the five scenes to which the image most likely belongs are listed, for example:
  • {1:0.5} means that the probability that the image belongs to the scene whose scene ID is "1" is 0.5; {22:0.2} means that the probability that the image belongs to the scene whose scene ID is "22" is 0.2; {25:0.15} means that the probability that the image belongs to the scene whose scene ID is "25" is 0.15; {45:0.1} means that the probability that the image belongs to the scene whose scene ID is "45" is 0.1; {55:0.05} means that the probability that the image belongs to the scene whose scene ID is "55" is 0.05.
  • the scene classification information requires a dictionary, which is used to resolve which scene a number expressed in digital form belongs to.
  • examples of different scenes are shown in Table 2.
  • the first and third columns in Table 2 are scene IDs, which express in digital form the scene to which an image belongs; the second and fourth columns are instances belonging to the corresponding scene.
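The {scene ID: probability} representation and its dictionary lookup can be sketched as follows. The scene names are illustrative placeholders, not the actual entries of Table 2; the probabilities reuse the example above.

```python
# Hypothetical scene dictionary; the real ID-to-name mapping comes from Table 2.
SCENE_DICT = {1: "party", 22: "food", 25: "indoor", 45: "portrait", 55: "night"}

def resolve_scenes(scene_probs):
    """Resolve {scene ID: probability} pairs into named scenes, most likely first."""
    ranked = sorted(scene_probs.items(), key=lambda kv: kv[1], reverse=True)
    return [(SCENE_DICT.get(sid, "unknown"), p) for sid, p in ranked]

# The five most likely scenes for the example image above:
top = resolve_scenes({1: 0.5, 22: 0.2, 25: 0.15, 45: 0.1, 55: 0.05})
# The scene with the maximum probability is taken as the classification result.
best_name, best_prob = top[0]
```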
  • the instance segmentation information is used to characterize the segmentation information of the instance in the image.
  • optionally, a matrix whose resolution is consistent with that of the image (such as 800*600), which can be expressed as a channel of the image, can be used to express the instance segmentation information in the image.
  • an example is shown in Figure 5, where 0 is the background and 2/15/35 are instance IDs, that is, the IDs of the name information corresponding to the instances.
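Working with such a segmentation matrix can be sketched as below. The matrix is a toy stand-in for Figure 5 (0 for background, 2/15/35 as instance IDs); the helper names are illustrative.

```python
# Toy instance-segmentation matrix in the format of Figure 5:
# 0 is the background, nonzero values are instance IDs.
seg = [
    [0,  0,  2, 2],
    [0, 15, 15, 0],
    [35, 35, 0, 0],
]

def instance_ids(seg):
    """List the distinct instance IDs present (background 0 excluded)."""
    return sorted({v for row in seg for v in row if v != 0})

def instance_mask(seg, instance_id):
    """Return a binary mask selecting one instance from the segmentation matrix."""
    return [[1 if v == instance_id else 0 for v in row] for row in seg]
```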
  • As for the target detection information of the image, it can be understood that, for the targets that can be detected in the image, the form {target ID: target coordinates} can be used to save the target list.
  • Similarly, a dictionary is required for resolution.
  • optionally, this dictionary may be the same dictionary used in the above instance segmentation.
  • Combination plan        Depth information   Scene classification information   Instance segmentation information   Target detection information
  • Combination example 1   yes                 yes                                yes                                 no
  • Combination example 2   yes                 yes                                no                                  no
  • Combination example 3   yes                 no                                 yes                                 yes
  • Combination example 4   no                  yes                                yes                                 yes
  • the semantic information includes depth information, scene classification information and instance segmentation information.
  • the semantic information includes depth information and scene classification information.
  • the semantic information includes depth information, instance segmentation information and object detection information.
  • the semantic information includes scene classification information, instance segmentation information and object detection information.
  • the semantic information includes depth information, scene classification information, instance segmentation information and object detection information.
  • the semantic information in the data stream may include instance segmentation information, so that the instance segmentation information can be used to obtain a specific instance in the image, and a single camera can achieve an instance blurring effect; or, the semantic information in the data stream may include target detection information, so that the camera can adjust the auto-focus object in a targeted manner, focus on the target to be photographed, and improve the imaging quality; or, the semantic information in the data stream may include scene classification information, so as to adjust the 3A parameters for the scene.
  • the 3A parameters are auto focus (AF), auto exposure (AE) and auto white balance (AWB).
  • 3A digital imaging technology uses an auto-focus algorithm, an auto-exposure algorithm, and an auto-white-balance algorithm to maximize image contrast, correct over-exposure or under-exposure of the main subject, and compensate for the chromatic aberration of the picture under different lighting, so as to present high-quality image information.
  • the camera adopting 3A digital imaging technology can well guarantee the accurate color reproduction of the image, presenting a perfect day and night monitoring effect.
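Choosing 3A settings from the scene classification information can be sketched as below. The scene IDs follow the earlier example, but the preset names and values are purely hypothetical, not a real camera API.

```python
# Hypothetical mapping from scene ID to 3A-related presets (illustrative only).
SCENE_3A_PRESETS = {
    1:  {"ae_ev_bias": 0.3, "awb_mode": "warm",     "af_mode": "face"},    # e.g. party
    55: {"ae_ev_bias": 1.0, "awb_mode": "tungsten", "af_mode": "center"},  # e.g. night
}
DEFAULT_3A = {"ae_ev_bias": 0.0, "awb_mode": "auto", "af_mode": "auto"}

def pick_3a_params(scene_probs):
    """Choose 3A presets based on the most probable scene in the classification info."""
    best = max(scene_probs, key=scene_probs.get)
    return SCENE_3A_PRESETS.get(best, DEFAULT_3A)
```

In this sketch, an unrecognized scene simply falls back to the default 3A behavior.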
  • the semantic information is not limited to the above four items; other semantic information can also be included, so the information header needs to reserve a sufficient length.
  • step S1 may include: based on the image information, obtaining the depth information through a laser ranging radar and/or a depth information analysis network.
  • the depth information parsing network is used to parse image information to generate depth information.
  • step S1 may include: extracting image scene features of the image based on the image information; determining or generating scene classification information of the image according to the image scene features.
  • determining or generating the scene classification information of the image according to the image scene features includes: inputting the image scene features into a scene classification model to obtain the probability, output by the scene classification model, that the image corresponds to at least one scene; and, among the probabilities that the image corresponds to at least one scene, determining the scene corresponding to the maximum probability as the scene classification information of the image.
  • the scene classification model is used to determine the probability that the image corresponds to at least one scene.
  • S2: Save the semantic information in the image data stream based on a preset format.
  • this step includes: based on a preset format, filling semantic information in a reserved field of the image data stream.
  • the preset format is any combination of at least one semantic information.
  • the preset formats corresponding to different types of semantic information are different.
  • the preset format includes at least one of table, single-channel bitmap, matrix and key-value pair.
  • the table is shown in Table 1, and the matrix is shown in FIG. 5 .
  • the preset format further includes an information header, which is used to indicate whether the image data stream contains semantic information, and/or, the information header is used to indicate the type of semantic information contained in the image data stream.
  • the information header can be expressed as: Frame semantic info include:0 0 0 1, that is, only the fourth item is included; the information header can also be expressed as: Frame semantic info include:0 0 0 0, that is, no semantic information is included.
  • the information header specifically corresponds to a lookup table containing semantic information, such as:
  • the field of the information header is a variable-length field.
  • when the information header indicates that certain semantic information exists, that is, when it is expressed in a form such as 0 0 0 1, this field reserves a sufficient length for the corresponding semantic information, so as to express that semantic information.
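Parsing the "Frame semantic info include:" header can be sketched as below. The flag order is assumed to follow the order in which the four semantic-information types are introduced above (depth, scene classification, instance segmentation, target detection), consistent with "0 0 0 1" denoting the fourth item; the variable-length field allows extra positions for future types.

```python
# Assumed flag order, matching the four types named above; trailing flag
# positions are reserved for future semantic-information types.
SEMANTIC_TYPES = ["depth", "scene_classification",
                  "instance_segmentation", "target_detection"]

def parse_semantic_header(header_line):
    """Return the set of semantic-information types flagged as present."""
    flags = header_line.split(":", 1)[1].split()  # e.g. ["0", "0", "0", "1"]
    present = set()
    for i, bit in enumerate(flags):
        if bit == "1" and i < len(SEMANTIC_TYPES):
            present.add(SEMANTIC_TYPES[i])
    return present
```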
  • the image data processing method may further include: determining identification information corresponding to the semantic information according to a preset correspondence relationship.
  • the semantic information includes at least one of scene classification information, instance segmentation information, and target detection information;
  • the identification information includes at least one of scene ID, instance ID, and target ID, and the preset correspondence includes scene name and At least one of the corresponding relationship between scene IDs, the corresponding relationship between instance names and instance IDs, and the corresponding relationship between target names and target IDs.
  • the preset correspondence relationship may be specifically presented in the form of a dictionary, but this embodiment of the present application is not limited thereto, and may be set accordingly according to actual needs.
  • the image data processing method of the embodiment of the present application determines or generates semantic information of the image based on image information, and the semantic information is used to interpret the image; based on a preset format, the semantic information is stored in the data stream of the image.
  • the semantic information for interpreting the image is stored in the data stream of the image, so as to uniformly standardize the semantic information of computational photography in the data stream involved in computational photography.
  • the image data processing method may further include: acquiring semantic information from the image data stream based on a preset format; and performing preset processing according to the semantic information.
  • Fig. 6 is a schematic flowchart of an image data processing method according to a second embodiment.
  • An embodiment of the present application provides an image data processing method, which is applied to computational photography of a smart terminal such as the aforementioned smart terminal. As shown in Figure 6, the image data processing method includes the following steps:
  • S10: Obtain semantic information from the image data stream based on a preset format.
  • semantic information is used to interpret the image.
  • common semantic information includes image depth information, scene classification information, instance segmentation information, and object detection information.
  • semantic information in this embodiment of the present application may include at least one of the following: depth information, scene classification information, instance segmentation information, and object detection information, but not limited thereto.
  • the depth information may include at least one of the following:
  • Depth image: the pixel values in the depth image represent the distance between each pixel in the image and the camera used to capture the image;
  • Indication information of whether the image contains an infinity part, where the infinity part is at a distance beyond the farthest distance that the device can detect. It can be understood that the distance range the device can detect has a shortest distance and a farthest distance; anything beyond the farthest distance can be represented by the maximum value, which corresponds to infinity, while the next largest value is used to indicate the farthest distance that can currently be detected.
  • the range between the minimum and maximum distances is equally divided into 256 parts, and the depth values of all pixels are quantized into these 256 parts.
  • in this way, a depth image with the same resolution as the original image can be generated, as shown in Figure 4, which is attached to the computational photography data stream as another channel of the image. It should be noted that, with the development of device performance, the 256 levels (2 to the 8th power) can also be expanded to 512 levels or more; in that case, the distance accuracy that can be provided is greatly improved.
  • Table 1 shows a method for expressing depth information of an image with sky.
  • the scene classification information is used to characterize the scene represented by the image. It can be understood that an image can be classified into multiple scenes in most cases; for example, an image of a cake at a birthday party can be expressed as a party scene or as a food scene. Therefore, when expressing the scene, the probabilities of the five scenes to which the image most likely belongs are listed, for example:
  • {1:0.5} means that the probability that the image belongs to the scene whose scene ID is "1" is 0.5; {22:0.2} means that the probability that the image belongs to the scene whose scene ID is "22" is 0.2; {25:0.15} means that the probability that the image belongs to the scene whose scene ID is "25" is 0.15; {45:0.1} means that the probability that the image belongs to the scene whose scene ID is "45" is 0.1; {55:0.05} means that the probability that the image belongs to the scene whose scene ID is "55" is 0.05.
  • the scene classification information requires a dictionary, which is used to resolve which scene a number expressed in digital form belongs to, as shown in Table 2 above.
  • the first and third columns in Table 2 are scene IDs, which express in digital form the scene to which an image belongs; the second and fourth columns are instances belonging to the corresponding scene.
  • the instance segmentation information is used to characterize the segmentation information of the instance in the image.
  • optionally, a matrix whose resolution is consistent with that of the image (such as 800*600), which can be expressed as a channel of the image, can be used to express the instance segmentation information in the image.
  • an example is shown in Figure 5, where 0 is the background and 2/15/35 are instance IDs, that is, the IDs of the name information corresponding to the instances.
  • As for the target detection information of the image, it can be understood that, for the targets that can be detected in the image, the form {target ID: target coordinates} can be used to save the target list.
  • Similarly, a dictionary is required for resolution.
  • optionally, this dictionary may be the same dictionary used in the above instance segmentation.
  • the semantic information includes depth information, scene classification information and instance segmentation information.
  • the semantic information includes depth information and scene classification information.
  • the semantic information includes depth information, instance segmentation information and object detection information.
  • the semantic information includes scene classification information, instance segmentation information and object detection information.
  • the semantic information includes depth information, scene classification information, instance segmentation information and object detection information.
  • the semantic information is not limited to the above four items; other semantic information can also be included, so the information header needs to reserve a sufficient length.
  • the preset format is any combination of at least one semantic information.
  • the preset formats corresponding to different types of semantic information are different.
  • the preset format includes at least one of table, single-channel bitmap, matrix and key-value pair.
  • the table is shown in Table 1, and the matrix is shown in FIG. 5 .
  • the preset format further includes an information header, which is used to indicate whether the image data stream contains semantic information, and/or, the information header is used to indicate the type of semantic information contained in the image data stream.
  • the information header can be expressed as: Frame semantic info include:0 0 0 1, that is, only the fourth item is included; the information header can also be expressed as: Frame semantic info include:0 0 0 0, that is, no semantic information is included.
  • the information header specifically corresponds to a lookup table containing semantic information, such as:
  • the field of the information header is a variable-length field.
  • when the information header indicates that certain semantic information exists, that is, when it is expressed in a form such as 0 0 0 1, this field reserves a sufficient length for the corresponding semantic information, so as to express that semantic information.
  • the step S10 includes: reading semantic information in a reserved field of the image data stream based on a preset format.
  • S20: Perform preset processing according to the semantic information.
  • step S20 may include: adjusting the target parameters of the camera in the corresponding scene according to the scene classification information.
  • the target parameters include at least one of 3A parameters, a display lookup table, or other parameters related to imaging quality.
  • the 3A parameters are auto focus (AF) parameters, auto exposure (AE) parameters and auto white balance (AWB) parameters.
  • 3A digital imaging technology uses an auto-focus algorithm, an auto-exposure algorithm, and an auto-white-balance algorithm to maximize image contrast, correct over-exposure or under-exposure of the main subject, and compensate for the chromatic aberration of the picture under different lighting, so as to present high-quality image information.
  • the camera adopting 3A digital imaging technology can well guarantee the accurate color reproduction of the image, presenting a perfect day and night monitoring effect.
  • the semantic information includes instance segmentation information.
  • step S20 may include: obtaining the target instance in the image according to the instance segmentation information; and performing preset processing on the image according to the target instance.
  • the preset processing may include at least one of processing such as instance blurring, instance deformation, instance color retention, mapping processing for the instance, and mapping processing for the background.
  • the semantic information includes instance segmentation information
  • the instance segmentation information can be used to obtain a specific instance in the image, so as to achieve effects such as instance blurring, instance deformation, instance color retention, mapping processing for the instance, and mapping processing for the background.
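One of the listed effects, instance color retention, can be sketched with a segmentation mask as below. The function name and the toy RGB/mask inputs are illustrative; pixels belonging to the target instance keep their color while the background is desaturated.

```python
def retain_instance_color(rgb, mask, instance_id):
    """Keep the color of one instance and turn the rest of the image gray.

    `rgb` is a 2-D list of (r, g, b) tuples; `mask` is an instance-segmentation
    matrix of the same resolution (Figure 5 format: 0 = background).
    """
    out = []
    for row_px, row_m in zip(rgb, mask):
        new_row = []
        for (r, g, b), m in zip(row_px, row_m):
            if m == instance_id:
                new_row.append((r, g, b))        # keep instance pixels as-is
            else:
                gray = (r + g + b) // 3          # simple average desaturation
                new_row.append((gray, gray, gray))
        out.append(new_row)
    return out
```

The same mask-based split also supports the other listed effects (e.g. blurring only the background pixels instead of graying them).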
  • step S20 may include: adjusting the camera's auto-focus object according to the target detection information, so that the camera can adjust the auto-focus object in a targeted manner, focus on the target to be photographed, and improve the imaging quality.
  • semantic information is obtained from an image data stream based on a preset format, and the semantic information is used to interpret the image; preset processing is performed according to the semantic information.
  • the semantic information used to interpret the image is stored in the image data stream based on the preset format.
  • in this way, the semantic information of computational photography can be uniformly standardized in the data stream involved in computational photography; when computational photography performs related processing, for different applications of an image, only the corresponding semantic information needs to be obtained from the data stream, without repeating the same processing on the image, thereby avoiding a waste of computing resources.
  • the image data processing method may further include: determining or generating the semantic information of the image based on the image information; storing the semantic information in the data stream of the image based on a preset format .
  • Fig. 7 is a schematic structural diagram of an image data processing device according to a third embodiment.
  • An embodiment of the present application provides an image data processing device.
  • the image data processing device 70 includes:
  • a processing module 71 configured to determine or generate semantic information of the image based on the image information
  • the saving module 72 is configured to save semantic information in the image data stream based on a preset format.
  • the image information includes basic image information and image data.
  • the image data is the image itself.
  • the basic image information can also be called the basic description information of the image, and can include an image description information identifier, a basic description information length, an image type identifier, an image length, an image width, an image color space, a bit width, a storage method, and the like.
  • Image description information identifier: used to identify the "basic description information" field of the image.
  • Basic description information length: the total length of the basic description information field, including the image description information identifier.
  • Image type identifier: used to identify whether the image data type is a single-frame image, a multi-frame image, or a video stream.
  • Image length: the length of the image data itself.
  • Image width: the width of the image data itself.
  • Image color space: a description of the image data color space, such as RGGB (also called RGBG or GRGB), RGBW, RYYB, etc.
  • Bit width: the number of bits per component of the image.
  • Storage method: the arrangement of the pixels of each component of the image color space in the storage space (such as memory, flash memory, or hard disk).
  • semantic information is used to interpret the image.
  • the semantic information includes at least one of the following: depth information, scene classification information, instance segmentation information, and object detection information.
  • the scene classification information is used to characterize the scene represented by the image.
  • the instance segmentation information is used to characterize the segmentation information of instances in the image.
  • the depth information includes at least one of the following:
  • Depth image: the pixel values in the depth image are used to characterize the distance between pixel points in the image and the camera used to capture the image;
  • The maximum of said distances;
  • The minimum of said distances;
  • The quantization range of the distances between said maximum and said minimum;
  • An indication of whether the image contains an infinity part, which is a distance beyond the farthest the device can detect. It can be understood that the range of distances the device can detect has a shortest distance and a farthest distance. A distance beyond the farthest detectable distance can be represented by the maximum value, which corresponds to infinity, while the next-largest value is used to indicate the farthest distance that can currently be detected.
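The reserved-code scheme just described can be sketched as follows, assuming 8-bit quantization (the bit depth and function names are illustrative, not specified by the application): the maximum code 255 is reserved for the infinity part, and the next-largest code 254 marks the farthest distance the device can currently detect.

```python
import numpy as np

INF_CODE = 255  # reserved: beyond the device's detection range ("infinity")
FAR_CODE = 254  # next-largest value: farthest currently detectable distance

def quantize_depth(depth_m: np.ndarray, d_min: float, d_max: float) -> np.ndarray:
    """Quantize metric depth into codes 0..254 over [d_min, d_max];
    non-finite depth (the infinity part) maps to the reserved code 255."""
    codes = np.round((depth_m - d_min) / (d_max - d_min) * FAR_CODE)
    codes = np.clip(codes, 0, FAR_CODE).astype(np.uint8)
    codes[~np.isfinite(depth_m)] = INF_CODE
    return codes
```

With `d_min = 0.5` and `d_max = 10.0`, a pixel at the minimum distance quantizes to 0, one at the maximum to 254, and an infinity pixel to 255.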
  • the semantic information includes depth information
  • the processing module 71 is specifically configured to: obtain the depth information based on the image information through a laser ranging radar and/or a depth information analysis network.
  • the depth information parsing network is used to parse image information to generate depth information.
  • the semantic information includes scene classification information
  • the processing module 71 is further configured to: extract the image scene features of the image based on the image information; determine or generate the scene classification information of the image according to the image scene features.
  • the processing module 71 is further configured to: input the image scene features into the scene classification model to obtain the probability, output by the scene classification model, that the image corresponds to at least one scene; and, among the probabilities that the image corresponds to at least one scene, determine the scene corresponding to the maximum probability as the scene classification information of the image.
  • the scene classification model is used to determine the probability that the image corresponds to at least one scene.
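The selection step is a plain argmax over the model's per-scene probabilities. A minimal sketch (the scene names and probability values are illustrative, not taken from the application):

```python
def classify_scene(scene_probs: dict) -> str:
    """Return the scene with the maximum probability as the image's
    scene classification information."""
    return max(scene_probs, key=scene_probs.get)
```

For example, given probabilities for "portrait", "night", and "landscape", the scene with the largest probability becomes the scene classification information of the image.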
  • the saving module 72 is specifically configured to: fill semantic information in a reserved field of the image data stream based on a preset format.
  • the preset format is any combination of at least one type of semantic information.
  • the preset formats corresponding to different types of semantic information are different.
  • the preset format includes at least one of table, single-channel bitmap, matrix and key-value pair.
  • the preset format further includes an information header, which is used to indicate whether the image data stream contains semantic information, and/or, the information header is used to indicate the type of semantic information contained in the image data stream.
  • the saving module is further configured to: based on a preset format, before saving the semantic information in the image data stream, determine the identification information corresponding to the semantic information according to a preset correspondence.
  • the semantic information includes at least one of scene classification information, instance segmentation information, and target detection information;
  • the identification information correspondingly includes at least one of a scene ID, an instance ID, and a target ID, and the preset correspondence includes at least one of the correspondence between scene names and scene IDs, the correspondence between instance names and instance IDs, and the correspondence between target names and target IDs.
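Before the semantic information is saved, names are replaced with their IDs via the preset correspondence. The tables and key names below are assumptions for illustration; the actual correspondences are implementation-defined.

```python
# Hypothetical preset correspondences (name -> ID).
SCENE_IDS = {"portrait": 1, "night": 2, "landscape": 3}
INSTANCE_IDS = {"person": 1, "cat": 2, "car": 3}

def to_identifiers(semantic: dict) -> dict:
    """Replace scene and instance names in the semantic information with
    their preset IDs, prior to saving into the image data stream."""
    out = dict(semantic)
    if "scene" in out:
        out["scene_id"] = SCENE_IDS[out.pop("scene")]
    if "instances" in out:
        out["instance_ids"] = [INSTANCE_IDS[n] for n in out.pop("instances")]
    return out
```

Storing compact IDs rather than strings keeps the reserved field small and makes the key-value preset format unambiguous across producers and consumers of the data stream.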
  • the processing module 71 may also be configured to: acquire semantic information from the image data stream based on a preset format; perform preset processing according to the semantic information.
  • Fig. 8 is a schematic structural diagram of an image data processing device according to a fourth embodiment.
  • An embodiment of the present application provides an image data processing device.
  • the image data processing device 80 includes:
  • An acquisition module 81 configured to acquire semantic information from the image data stream based on a preset format
  • the processing module 82 is configured to perform preset processing according to the semantic information.
  • semantic information is used to interpret the image.
  • the semantic information includes at least one of the following:
  • Depth image: the pixel values in the depth image are used to characterize the distance between pixel points in the image and the camera used to capture the image;
  • The maximum of said distances;
  • The minimum of said distances;
  • The quantization range of the distances between said maximum and said minimum;
  • An indication of whether the image contains an infinity part, which is a distance beyond the farthest distance that the device can detect;
  • Scene classification information;
  • Instance segmentation information;
  • Target detection information.
  • the scene classification information is used to characterize the scene represented by the image.
  • the instance segmentation information is used to characterize the segmentation information of instances in the image.
  • the obtaining module 81 is specifically configured to: read semantic information in a reserved field of the image data stream based on a preset format.
  • the preset format is any combination of at least one type of semantic information.
  • the preset formats corresponding to different types of semantic information are different.
  • the preset format includes at least one of table, single-channel bitmap, matrix and key-value pair.
  • the preset format further includes an information header, which is used to indicate whether the image data stream contains semantic information, and/or, the information header is used to indicate the type of semantic information contained in the image data stream.
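On the reading side, the information header tells the consumer whether, and which, semantic information is present. The application does not specify the header's bit layout, so the flag assignments and field width below are assumptions.

```python
import struct

# Assumed one-bit-per-type flags in a 32-bit little-endian header word.
DEPTH, SCENE, INSTANCE, DETECTION = 0x1, 0x2, 0x4, 0x8

def parse_info_header(reserved: bytes) -> list:
    """Read the information header from the reserved field and report
    which types of semantic information the data stream contains."""
    (flags,) = struct.unpack_from("<I", reserved, 0)
    names = {DEPTH: "depth", SCENE: "scene_classification",
             INSTANCE: "instance_segmentation", DETECTION: "object_detection"}
    return [name for bit, name in names.items() if flags & bit]
```

An all-zero header word would mean the data stream carries no semantic information, so a consumer can skip the reserved field entirely.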
  • processing module 82 is specifically used for at least one of the following:
  • If the semantic information includes scene classification information, the target parameters of the camera in the corresponding scene are adjusted according to the scene classification information;
  • If the semantic information includes target detection information, the target of the camera's automatic focus is adjusted according to the target detection information;
  • If the semantic information includes instance segmentation information, the target instance in the image is obtained according to the instance segmentation information, and preset processing is performed on the image according to the target instance.
  • the preset processing includes at least one of instance blurring, instance deformation, instance color retention, performing LUT mapping on instances, and performing LUT mapping on the background.
  • the target parameters include at least one of automatic exposure parameters, display lookup tables, automatic focus parameters and white balance parameters.
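The consumer-side behavior above amounts to a dispatch on which semantic information is present. The sketch below is illustrative only; the camera interface, key names, and parameter choices are hypothetical, not drawn from the application.

```python
def apply_semantic_info(camera: dict, semantic: dict) -> dict:
    """Adjust hypothetical camera settings according to the semantic
    information acquired from the image data stream."""
    if "scene_classification" in semantic:
        # Select per-scene target parameters (e.g. auto-exposure,
        # white balance) keyed by the classified scene.
        camera["scene_preset"] = semantic["scene_classification"]
    if "object_detection" in semantic:
        # Refocus on the center of the first detected target's box.
        x0, y0, x1, y1 = semantic["object_detection"][0]
        camera["af_point"] = ((x0 + x1) / 2, (y0 + y1) / 2)
    return camera
```

A real implementation would map each scene ID to a concrete parameter set and weigh multiple detections, but the branching structure follows the steps listed above.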
  • the processing module 82 is further configured to: determine or generate semantic information of the image based on the image information; and save the semantic information in the data stream of the image based on a preset format.
  • FIG. 9 is a schematic structural diagram of a smart terminal according to a fifth embodiment.
  • An embodiment of the present application provides an intelligent terminal.
  • an intelligent terminal 90 includes a memory 91 and a processor 92.
  • An image data processing program is stored in the memory 91.
  • When the image data processing program is executed by the processor 92, the steps of the image data processing method in any of the above embodiments are implemented; the implementation principles and beneficial effects are similar and are not repeated here.
  • the above-mentioned smart terminal 90 further includes a communication interface 93 , and the communication interface 93 may be connected to the processor 92 through the bus 94 .
  • the processor 92 can control the communication interface 93 to implement the receiving and sending functions of the smart terminal 90 .
  • the above-mentioned integrated modules implemented in the form of software function modules can be stored in a computer-readable storage medium.
  • the above-mentioned software function modules are stored in a storage medium and include several instructions to enable a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor to execute some of the steps of the methods of the various embodiments of the present application.
  • An embodiment of the present application further provides a computer-readable storage medium, on which an image data processing program is stored, and when the image data processing program is executed by a processor, the steps of the image data processing method in any of the foregoing embodiments are implemented.
  • An embodiment of the present application further provides a computer program product, the computer program product includes computer program code, and when the computer program code is run on the computer, the computer is made to execute the methods in the above various possible implementation manners.
  • the embodiment of the present application also provides a chip, including a memory and a processor.
  • the memory is used to store a computer program
  • the processor is used to call and run the computer program from the memory, so that the device in which the chip is installed executes the methods in the above various possible implementations.
  • Units in the device in the embodiment of the present application may be combined, divided and deleted according to actual needs.
  • the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation.
  • the technical solution of the present application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product; the computer software product is stored in one of the above storage media (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions to make a terminal device (which may be a mobile phone, computer, server, controlled terminal, or network device, etc.) execute the method of each embodiment of the present application.
  • all or part of them may be implemented by software, hardware, firmware or any combination thereof.
  • When implemented using software, it may be implemented in whole or in part in the form of a computer program product.
  • a computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present application are produced in whole or in part.
  • the computer can be a general purpose computer, special purpose computer, a computer network, or other programmable apparatus.
  • Computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, computer instructions may be transmitted from a website, computer, server or data center to another website, computer, server or data center by wired (such as coaxial cable, optical fiber, or digital subscriber line) or wireless (such as infrared, radio, or microwave) means.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server, a data center, etc. integrated with one or more available media.
  • Usable media may be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., DVDs), or semiconductor media (e.g., solid state disks (SSDs)), etc.

Abstract

The present application provides an image data processing method, an intelligent terminal, and a storage medium. The image data processing method comprises: determining or generating semantic information of an image on the basis of image information; and storing the semantic information in a data stream of the image on the basis of a preset format. The semantic information of computational photography is thereby specified in a unified and standard manner in the data streams involved in computational photography.

Description

Image data processing method, intelligent terminal and storage medium

Technical Field

The present application relates to the technical field of image data processing, and in particular to an image data processing method, an intelligent terminal and a storage medium.

Background

Computational photography refers to digital image capture and processing techniques that use digital computation rather than optical processing. Computational photography can increase the capabilities of camera equipment, introduce features beyond those of film-based photography, or reduce the cost or size of camera elements.

In the process of conceiving and implementing the present application, the inventors found at least the following problem: there is no unified standard regulating the data flow involved in computational photography.

The foregoing description is provided as general background information and does not necessarily constitute prior art.
Technical Solution

In view of the above technical problems, the present application provides an image data processing method, an intelligent terminal and a storage medium, so as to uniformly standardize the semantic information of computational photography in the data streams involved in computational photography.

To solve the above technical problems, the present application provides an image data processing method, including the following steps:

S1: based on image information, determining or generating semantic information of an image;

S2: based on a preset format, including the semantic information in the data stream of the image.
Optionally, the image information includes basic image information and image data.

Optionally, the image data is the image itself.

Optionally, the basic image information may also be called the basic description information of the image, and may include an image description information identifier, a basic description information length, an image type identifier, an image length, an image width, an image color space, a bit width, a storage mode, and the like.

Optionally, the image description information identifier is used to identify the "basic description information" field of the image.

Optionally, the basic description information length indicates the total length of the basic description information field, including the image description information identifier.

Optionally, the image type identifier is used to identify whether the image data type is a single-frame image, a multi-frame image or a video stream.

Optionally, the image length is the length of the image data itself.

Optionally, the image width is the width of the image data itself.

Optionally, the image color space is a description of the color space of the image data, such as RGGB, RGBW, RYYB, etc.

Optionally, the bit width is the number of bits per component of the image.

Optionally, the storage mode is the arrangement of each pixel of each component of the image color space in the storage space (such as memory, flash memory, or hard disk).
Optionally, the semantic information is used to interpret the image.

Optionally, the semantic information includes at least one of the following: depth information, scene classification information, instance segmentation information, and target detection information.

Optionally, the scene classification information is used to characterize the scene represented by the image.

Optionally, the instance segmentation information is used to characterize the segmentation information of instances in the image.

Optionally, the depth information includes at least one of the following:

a depth image, in which pixel values are used to characterize the distance between pixel points in the image and the camera used to capture the image;

the maximum of said distances;

the minimum of said distances;

the quantization range of the distances between said maximum and said minimum;

indication information of whether the image contains an infinity part, where the infinity part is a distance beyond the farthest distance that the device can detect.

Optionally, the semantic information includes depth information, and step S1 includes: based on the image information, obtaining the depth information through a laser ranging radar and/or a depth information parsing network. Optionally, the depth information parsing network is used to parse the image information to generate the depth information.

Optionally, the semantic information includes scene classification information, and step S1 includes: extracting image scene features of the image based on the image information; and determining or generating the scene classification information of the image according to the image scene features.

Optionally, determining or generating the scene classification information of the image according to the image scene features includes: inputting the image scene features into a scene classification model to obtain the probability, output by the scene classification model, that the image corresponds to at least one scene; and, among the probabilities that the image corresponds to at least one scene, determining the scene corresponding to the maximum probability as the scene classification information of the image. Optionally, the scene classification model is used to determine the probability that the image corresponds to at least one scene.
Optionally, step S2 includes: based on the preset format, filling the semantic information in a reserved field of the data stream of the image. Optionally, the preset format is any combination of at least one type of semantic information.

Optionally, the preset formats corresponding to different types of semantic information are different.

Optionally, the preset format includes at least one of a table, a single-channel bitmap, a matrix and key-value pairs.

Optionally, the preset format further includes an information header, which is used to indicate whether the data stream of the image contains semantic information, and/or the type of semantic information contained in the data stream of the image.

Optionally, before step S2, the method further includes: determining identification information corresponding to the semantic information according to a preset correspondence.

Optionally, the semantic information includes at least one of scene classification information, instance segmentation information, and target detection information; the identification information correspondingly includes at least one of a scene ID, an instance ID, and a target ID; and the preset correspondence includes at least one of the correspondence between scene names and scene IDs, the correspondence between instance names and instance IDs, and the correspondence between target names and target IDs.
The present application also provides an image data processing method, including the following steps:

S10: based on a preset format, acquiring semantic information from the data stream of an image;

S20: performing preset processing according to the semantic information.
Optionally, the semantic information is used to interpret the image.

Optionally, the semantic information includes at least one of the following:

a depth image, in which pixel values are used to characterize the distance between pixel points in the image and the camera used to capture the image;

the maximum of said distances;

the minimum of said distances;

the quantization range of the distances between said maximum and said minimum;

indication information of whether the image contains an infinity part, where the infinity part is a distance beyond the farthest distance that the device can detect;

scene classification information;

instance segmentation information;

target detection information.

Optionally, the scene classification information is used to characterize the scene represented by the image.

Optionally, the instance segmentation information is used to characterize the segmentation information of instances in the image.

Optionally, step S10 includes: based on the preset format, reading the semantic information from a reserved field of the data stream of the image. Optionally, the preset format is any combination of at least one type of semantic information.

Optionally, the preset formats corresponding to different types of semantic information are different.

Optionally, the preset format includes at least one of a table, a single-channel bitmap, a matrix and key-value pairs.

Optionally, the preset format further includes an information header, which is used to indicate whether the data stream of the image contains semantic information, and/or the type of semantic information contained in the data stream of the image.

Optionally, step S20 includes at least one of the following:

if the semantic information includes scene classification information, adjusting target parameters of the camera in the corresponding scene according to the scene classification information;

if the semantic information includes target detection information, adjusting the target of the camera's automatic focus according to the target detection information;

if the semantic information includes instance segmentation information, obtaining a target instance in the image according to the instance segmentation information, and performing preset processing on the image according to the target instance.

Optionally, the preset processing includes at least one of instance blurring, instance deformation, instance color retention, performing LUT mapping on instances, and performing LUT mapping on the background.

Optionally, the target parameters include at least one of automatic exposure parameters, a display lookup table, automatic focus parameters and white balance parameters.
The present application also provides an image data processing device, including:

a processing module, configured to determine or generate semantic information of an image based on image information;

a saving module, configured to save the semantic information in the data stream of the image based on a preset format.

Optionally, the image information includes basic image information and image data. Optionally, the image data is the image itself. The basic image information may also be called the basic description information of the image, and may include an image description information identifier, a basic description information length, an image type identifier, an image length, an image width, an image color space, a bit width, a storage mode, and the like. Optionally:

the image description information identifier is used to identify the "basic description information" field of the image;

the basic description information length indicates the total length of the basic description information field, including the image description information identifier;

the image type identifier is used to identify whether the image data type is a single-frame image, a multi-frame image or a video stream;

the image length is the length of the image data itself;

the image width is the width of the image data itself;

the image color space is a description of the color space of the image data, such as RGGB, RGBW, RYYB, etc.;

the bit width is the number of bits per component of the image;

the storage mode is the arrangement of each pixel of each component of the image color space in the storage space (such as memory, flash memory, or hard disk).

Optionally, the semantic information is used to interpret the image.

Optionally, the semantic information includes at least one of the following: depth information, scene classification information, instance segmentation information, and target detection information.

Optionally, the scene classification information is used to characterize the scene represented by the image.

Optionally, the instance segmentation information is used to characterize the segmentation information of instances in the image.

Optionally, the depth information includes at least one of the following:

a depth image, in which pixel values are used to characterize the distance between pixel points in the image and the camera used to capture the image;

the maximum of said distances;

the minimum of said distances;

the quantization range of the distances between said maximum and said minimum;

indication information of whether the image contains an infinity part, where the infinity part is a distance beyond the farthest distance that the device can detect.

Optionally, the semantic information includes depth information, and the processing module is specifically configured to: obtain the depth information through a laser ranging radar and/or a depth information parsing network based on the image information.

Optionally, the depth information parsing network is used to parse the image information to generate the depth information.

Optionally, the semantic information includes scene classification information, and the processing module is further configured to: extract image scene features of the image based on the image information; and determine or generate the scene classification information of the image according to the image scene features.

Optionally, the processing module is further configured to: input the image scene features into a scene classification model to obtain the probability, output by the scene classification model, that the image corresponds to at least one scene; and, among the probabilities that the image corresponds to at least one scene, determine the scene corresponding to the maximum probability as the scene classification information of the image. Optionally, the scene classification model is used to determine the probability that the image corresponds to at least one scene.

Optionally, the saving module is specifically configured to: fill the semantic information in a reserved field of the data stream of the image based on the preset format. Optionally, the preset format is any combination of at least one type of semantic information.

Optionally, the preset formats corresponding to different types of semantic information are different.

Optionally, the preset format includes at least one of a table, a single-channel bitmap, a matrix and key-value pairs.

Optionally, the preset format further includes an information header, which is used to indicate whether the data stream of the image contains semantic information, and/or the type of semantic information contained in the data stream of the image.

Optionally, the saving module is further configured to: before saving the semantic information in the data stream of the image based on the preset format, determine identification information corresponding to the semantic information according to a preset correspondence. Optionally, the semantic information includes at least one of scene classification information, instance segmentation information, and target detection information; the identification information correspondingly includes at least one of a scene ID, an instance ID, and a target ID; and the preset correspondence includes at least one of the correspondence between scene names and scene IDs, the correspondence between instance names and instance IDs, and the correspondence between target names and target IDs.
本申请还提供一种图像数据处理装置,包括:The present application also provides an image data processing device, including:
获取模块,用于基于预设格式,从图像的数据流中获取语义信息;An acquisition module, configured to acquire semantic information from image data streams based on a preset format;
处理模块,用于根据语义信息进行预设处理。The processing module is used for performing preset processing according to the semantic information.
Optionally, the semantic information is used to interpret the image.
Optionally, the semantic information includes at least one of the following:
a depth image, in which each pixel value represents the distance between the corresponding pixel in the image and the camera used to capture the image;
the maximum of said distances;
the minimum of said distances;
the quantization range of the distances between said maximum and said minimum;
an indication of whether the image contains an infinity portion, the infinity portion being any distance beyond the farthest distance the device can detect;
scene classification information;
instance segmentation information;
target detection information.
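The depth-related fields listed above (per-pixel distance, minimum, maximum, quantization range, and infinity indication) could be derived along these lines. The field names and the 8-bit quantization are illustrative assumptions, not the format defined by this application:

```python
import math

def quantize_depth(depth_m, levels=255):
    """Quantize finite depth values (metres) to [0, levels];
    record min/max/quantization range and an infinity flag as metadata."""
    finite = [d for row in depth_m for d in row if math.isfinite(d)]
    d_min, d_max = min(finite), max(finite)
    span = (d_max - d_min) or 1.0  # avoid division by zero for flat scenes
    # Infinite pixels (beyond the sensor's farthest detectable distance)
    # are stored as 0 here; the metadata flag records their presence.
    bitmap = [[0 if not math.isfinite(d) else round((d - d_min) / span * levels)
               for d in row] for row in depth_m]
    meta = {
        "min_distance_m": d_min,
        "max_distance_m": d_max,
        "quantization_levels": levels,
        "contains_infinity": any(math.isinf(d) for row in depth_m for d in row),
    }
    return bitmap, meta

depth = [[0.5, 2.0], [4.0, float("inf")]]  # metres; inf = beyond sensor range
bitmap, meta = quantize_depth(depth)
print(meta["contains_infinity"])  # True
```

The single-channel bitmap plus the recorded minimum, maximum, and quantization range are sufficient to recover approximate metric depth at read time.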
Optionally, the scene classification information characterizes the scene represented by the image.
Optionally, the instance segmentation information characterizes the segmentation of the instances contained in the image.
Optionally, the acquisition module is specifically configured to: read the semantic information from a reserved field of the image data stream based on the preset format. Optionally, the preset format is any combination of at least one type of semantic information.
Optionally, different types of semantic information correspond to different preset formats.
Optionally, the preset format includes at least one of a table, a single-channel bitmap, a matrix, and a key-value pair.
Optionally, the preset format further includes an information header, which indicates whether the image data stream contains semantic information and/or the type of semantic information contained in the image data stream.
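One sketch of such an information header is a small bit field read from the reserved field, with one bit flagging the presence of semantic information and further bits flagging its types. The bit layout below is an assumption for illustration, not the format defined by this application:

```python
# Hypothetical one-byte information header in a reserved stream field:
# bit 0 flags presence, bits 1-3 flag the semantic-information types.
FLAG_PRESENT   = 0b0001
FLAG_SCENE     = 0b0010
FLAG_INSTANCE  = 0b0100
FLAG_DETECTION = 0b1000

def parse_header(header: int) -> dict:
    """Decode the header into presence and type flags."""
    present = bool(header & FLAG_PRESENT)
    return {
        "has_semantic_info": present,
        "scene_classification": present and bool(header & FLAG_SCENE),
        "instance_segmentation": present and bool(header & FLAG_INSTANCE),
        "target_detection": present and bool(header & FLAG_DETECTION),
    }

info = parse_header(0b0111)  # present + scene + instance
print(info["scene_classification"], info["target_detection"])  # True False
```

A reader can thus skip the semantic payload entirely when the presence bit is clear, without parsing the rest of the reserved field.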
Optionally, the processing module is specifically configured for at least one of the following:
when the semantic information contains scene classification information, adjusting the camera's target parameters for the corresponding scene according to the scene classification information;
when the semantic information contains target detection information, adjusting the camera's autofocus target according to the target detection information;
when the semantic information contains instance segmentation information, obtaining the target instance in the image according to the instance segmentation information, and performing preset processing on the image according to the target instance.
Optionally, the preset processing includes at least one of instance blurring, instance deformation, instance color retention, LUT mapping applied to the instance, and LUT mapping applied to the background.
Optionally, the target parameters include at least one of automatic exposure parameters, a display lookup table, autofocus parameters, and white balance parameters.
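As an illustration of the instance-based preset processing mentioned above, instance color retention could use the instance segmentation mask to keep the target instance in color while greying out the background. The pixel layout and the simple average-grey conversion are illustrative assumptions:

```python
# Minimal sketch: use an instance-segmentation mask (1 = target instance,
# 0 = background) to keep the instance in colour while desaturating the
# background ("instance colour retention").
def retain_instance_color(pixels, mask):
    """pixels: 2-D list of (r, g, b) tuples; mask: 2-D list of 0/1."""
    out = []
    for row_px, row_m in zip(pixels, mask):
        out_row = []
        for (r, g, b), m in zip(row_px, row_m):
            if m:                       # target instance: keep colour
                out_row.append((r, g, b))
            else:                       # background: convert to grey
                grey = (r + g + b) // 3
                out_row.append((grey, grey, grey))
        out.append(out_row)
    return out

img = [[(200, 0, 0), (0, 200, 0)]]
mask = [[1, 0]]
print(retain_instance_color(img, mask))  # [[(200, 0, 0), (66, 66, 66)]]
```

Instance blurring or per-instance LUT mapping would follow the same pattern, applying a different per-pixel operation to the masked and unmasked regions.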
The present application further provides an intelligent terminal, including a memory and a processor, where an image data processing program is stored in the memory, and the image data processing program, when executed by the processor, implements the steps of any one of the above image data processing methods.
The present application further provides a computer-readable storage medium storing a computer program, where the computer program, when executed by a processor, implements the steps of any one of the above image data processing methods.
The present application further provides a computer program product including a computer program, where the computer program, when executed, implements the steps of any one of the above image data processing methods.
As described above, the image data processing method of the present application determines or generates semantic information of an image based on image information, the semantic information being used to interpret the image, and saves the semantic information in the image data stream based on a preset format. In this way, the semantic information used to interpret the image is stored in the image data stream in a preset format, so that the semantic information of computational photography is uniformly standardized across the data streams involved in computational photography.
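As a rough illustration of saving semantic information in an image data stream in a preset format, a reserved field might carry a small header followed by serialized key-value pairs. The byte layout and the JSON serialization below are assumptions for demonstration only, not the format defined by this application:

```python
import json
import struct

# Illustrative reserved-field layout:
# 1-byte presence flag + 4-byte big-endian payload length + JSON payload.
def pack_semantic_field(semantics: dict) -> bytes:
    """Serialize semantic key-value pairs into a reserved-field byte string."""
    payload = json.dumps(semantics).encode("utf-8")
    return struct.pack(">BI", 1, len(payload)) + payload

def unpack_semantic_field(field: bytes) -> dict:
    """Recover the semantic key-value pairs; empty dict if none present."""
    present, length = struct.unpack(">BI", field[:5])
    if not present:
        return {}
    return json.loads(field[5:5 + length].decode("utf-8"))

field = pack_semantic_field({"scene_id": 3, "target_ids": [1, 2]})
print(unpack_semantic_field(field))  # {'scene_id': 3, 'target_ids': [1, 2]}
```

Because the writer and reader agree on the preset format, any downstream computational-photography stage can locate and interpret the semantic information without re-running the analysis that produced it.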
Brief Description of the Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and, together with the description, serve to explain its principles. To describe the technical solutions of the embodiments of the present application more clearly, the drawings required in the description of the embodiments are briefly introduced below. It will be apparent that a person of ordinary skill in the art can derive other drawings from these drawings without creative effort.
Fig. 1 is a schematic diagram of the hardware structure of an intelligent terminal implementing various embodiments of the present application;
Fig. 2 is an architecture diagram of a communication network system provided by an embodiment of the present application;
Fig. 3 is a schematic flowchart of an image data processing method according to a first embodiment;
Fig. 4 is an example diagram of a depth image according to an embodiment of the present application;
Fig. 5 is an example diagram of instance segmentation information according to an embodiment of the present application;
Fig. 6 is a schematic flowchart of an image data processing method according to a second embodiment;
Fig. 7 is a schematic structural diagram of an image data processing device according to a third embodiment;
Fig. 8 is a schematic structural diagram of an image data processing device according to a fourth embodiment;
Fig. 9 is a schematic structural diagram of an intelligent terminal according to a fifth embodiment.
The realization of the objectives, functional features, and advantages of the present application will be further described with reference to the embodiments and the accompanying drawings. The drawings above show specific embodiments of the present application, which are described in more detail below. These drawings and the accompanying text are not intended to limit the scope of the application's concept in any way, but rather to illustrate that concept for those skilled in the art by reference to specific embodiments.
Detailed Description of the Embodiments
Exemplary embodiments are described in detail here, examples of which are illustrated in the accompanying drawings. When the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present application; rather, they are merely examples of devices and methods consistent with some aspects of the present application as detailed in the appended claims.
It should be noted that, in this document, the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element qualified by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes the element. In addition, components, features, and elements with the same name in different embodiments of the present application may have the same or different meanings; the specific meaning is determined by their explanation in the specific embodiment, or further by the context of that embodiment.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, this information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of this document, first information may also be called second information, and similarly, second information may also be called first information. Depending on the context, the word "if" as used herein may be interpreted as "at the time of", "when", or "in response to determining". Furthermore, as used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context indicates otherwise. It should be further understood that the terms "comprise" and "include" indicate the presence of the stated features, steps, operations, elements, components, items, categories, and/or groups, but do not exclude the presence, occurrence, or addition of one or more other features, steps, operations, elements, components, items, categories, and/or groups. The terms "or", "and/or", and "including at least one of the following" as used in this application may be interpreted as inclusive, meaning any one or any combination. For example, "including at least one of the following: A, B, C" means "any of the following: A; B; C; A and B; A and C; B and C; A and B and C"; likewise, "A, B, or C" or "A, B, and/or C" means "any of the following: A; B; C; A and B; A and C; B and C; A and B and C". An exception to this definition arises only when a combination of elements, functions, steps, or operations is inherently mutually exclusive in some way.
It should be understood that although the steps in the flowcharts of the embodiments of the present application are shown sequentially as indicated by the arrows, these steps are not necessarily executed in the order indicated. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they may be executed in other orders. Moreover, at least some of the steps in the figures may include multiple sub-steps or stages, which are not necessarily completed at the same time but may be executed at different times, and their execution order is not necessarily sequential; they may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
Depending on the context, the words "if" and "in case" as used herein may be interpreted as "at the time of", "when", "in response to determining", or "in response to detecting". Similarly, depending on the context, the phrases "if it is determined" or "if (a stated condition or event) is detected" may be interpreted as "when it is determined", "in response to determining", "when (the stated condition or event) is detected", or "in response to detecting (the stated condition or event)".
It should be noted that step designations such as S1, S2, S10, and S20 are used herein to express the corresponding content more clearly and concisely and do not constitute a substantive limitation on the order. In specific implementations, a person skilled in the art may execute S2 before S1, or S20 before S10, etc., all of which fall within the protection scope of the present application.
It should be understood that the specific embodiments described here are only used to explain the present application and are not intended to limit it.
In the following description, suffixes such as "module", "component", or "unit" used to denote elements serve only to facilitate the description of the present application and have no specific meaning in themselves. Therefore, "module", "component", and "unit" may be used interchangeably.
Intelligent terminals may be implemented in various forms. For example, the intelligent terminals described in this application may include mobile terminals such as mobile phones, tablet computers, notebook computers, palmtop computers, personal digital assistants (PDAs), portable media players (PMPs), navigation devices, wearable devices, smart bracelets, and pedometers, as well as fixed terminals such as digital TVs and desktop computers.
In the subsequent description, an intelligent terminal is taken as an example. Those skilled in the art will understand that, apart from elements specifically intended for mobile purposes, the configurations according to the embodiments of the present application can also be applied to fixed-type terminals.
Referring to Fig. 1, which is a schematic diagram of the hardware structure of an intelligent terminal implementing various embodiments of the present application, the intelligent terminal 100 may include components such as an RF (Radio Frequency) unit 101, a WiFi module 102, an audio output unit 103, an A/V (audio/video) input unit 104, a sensor 105, a display unit 106, a user input unit 107, an interface unit 108, a memory 109, a processor 110, and a power supply 111. Those skilled in the art will understand that the intelligent terminal structure shown in Fig. 1 does not constitute a limitation on the intelligent terminal; the intelligent terminal may include more or fewer components than shown, combine certain components, or arrange the components differently.
The components of the intelligent terminal are described in detail below with reference to Fig. 1:
The radio frequency unit 101 can be used to receive and send signals in the course of sending and receiving information or during a call. Specifically, it receives downlink information from the base station and passes it to the processor 110 for processing, and it sends uplink data to the base station. Typically, the radio frequency unit 101 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low-noise amplifier, a duplexer, and the like. In addition, the radio frequency unit 101 can also communicate with the network and other devices through wireless communication. The wireless communication can use any communication standard or protocol, including but not limited to GSM (Global System for Mobile communication), GPRS (General Packet Radio Service), CDMA2000 (Code Division Multiple Access 2000), WCDMA (Wideband Code Division Multiple Access), TD-SCDMA (Time Division-Synchronous Code Division Multiple Access), FDD-LTE (Frequency Division Duplexing-Long Term Evolution), TDD-LTE (Time Division Duplexing-Long Term Evolution), 5G, and so on.
WiFi is a short-distance wireless transmission technology. Through the WiFi module 102, the intelligent terminal can help users send and receive e-mail, browse web pages, access streaming media, and so on, providing users with wireless broadband Internet access. Although Fig. 1 shows the WiFi module 102, it can be understood that it is not an essential component of the intelligent terminal and may be omitted as needed without changing the essence of the invention.
The audio output unit 103 can convert audio data received by the radio frequency unit 101 or the WiFi module 102, or stored in the memory 109, into an audio signal and output it as sound when the intelligent terminal 100 is in a call signal receiving mode, a call mode, a recording mode, a voice recognition mode, a broadcast receiving mode, or the like. Moreover, the audio output unit 103 can also provide audio output related to specific functions performed by the intelligent terminal 100 (optionally, a call signal receiving sound, a message receiving sound, etc.). The audio output unit 103 may include a speaker, a buzzer, and the like.
The A/V input unit 104 is used to receive audio or video signals. The A/V input unit 104 may include a graphics processing unit (GPU) 1041 and a microphone 1042. The graphics processor 1041 processes the image data of still images or video obtained by an image capture device (such as a camera) in a video capture mode or an image capture mode. The processed image can be displayed on the display unit 106. The image processed by the graphics processor 1041 may be stored in the memory 109 (or another storage medium) or sent via the radio frequency unit 101 or the WiFi module 102. The microphone 1042 can receive sound (audio data) in operating modes such as a phone call mode, a recording mode, and a voice recognition mode, and can process such sound into audio data. In the case of the phone call mode, the processed audio (voice) data can be converted into a format that can be sent to a mobile communication base station via the radio frequency unit 101 for output. The microphone 1042 may implement various types of noise cancellation (or suppression) algorithms to cancel (or suppress) noise or interference generated in the process of receiving and sending audio signals.
The intelligent terminal 100 also includes at least one sensor 105, such as a light sensor, a motion sensor, and other sensors. Optionally, the light sensor includes an ambient light sensor and a proximity sensor; the ambient light sensor can adjust the brightness of the display panel 1061 according to the brightness of the ambient light, and the proximity sensor can turn off the display panel 1061 and/or the backlight when the intelligent terminal 100 is moved to the ear. As a kind of motion sensor, an accelerometer sensor can detect the magnitude of acceleration in various directions (generally on three axes) and can detect the magnitude and direction of gravity when stationary. It can be used for applications that recognize the posture of the mobile phone (such as switching between horizontal and vertical screens, related games, and magnetometer posture calibration) and for vibration-recognition-related functions (such as a pedometer and tapping). As for other sensors that can also be configured on the mobile phone, such as a fingerprint sensor, a pressure sensor, an iris sensor, a molecular sensor, a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, they are not described in detail here.
The display unit 106 is used to display information input by the user or information provided to the user. The display unit 106 may include a display panel 1061, which may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like.
The user input unit 107 can be used to receive input numeric or character information and to generate key signal inputs related to the user settings and function control of the intelligent terminal. Optionally, the user input unit 107 may include a touch panel 1071 and other input devices 1072. The touch panel 1071, also called a touch screen, can collect the user's touch operations on or near it (such as operations performed by the user on or near the touch panel 1071 using a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connection device according to a preset program. The touch panel 1071 may include two parts: a touch detection device and a touch controller. Optionally, the touch detection device detects the user's touch position, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, and sends them to the processor 110, and it can receive and execute commands sent by the processor 110. In addition, the touch panel 1071 can be implemented in various types such as resistive, capacitive, infrared, and surface acoustic wave. In addition to the touch panel 1071, the user input unit 107 may also include other input devices 1072. Optionally, the other input devices 1072 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse, and a joystick, which are not specifically limited here.
Optionally, the touch panel 1071 may cover the display panel 1061. When the touch panel 1071 detects a touch operation on or near it, it transmits the operation to the processor 110 to determine the type of touch event, and the processor 110 then provides the corresponding visual output on the display panel 1061 according to the type of touch event. Although in Fig. 1 the touch panel 1071 and the display panel 1061 are implemented as two independent components to realize the input and output functions of the intelligent terminal, in some embodiments the touch panel 1071 and the display panel 1061 may be integrated to realize these input and output functions, which is not specifically limited here.
The interface unit 108 serves as an interface through which at least one external device can be connected to the intelligent terminal 100. Optionally, the external device may include a wired or wireless headset port, an external power (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device with an identification module, an audio input/output (I/O) port, a video I/O port, a headphone port, and the like. The interface unit 108 can be used to receive input (optionally, data information, power, etc.) from an external device and transmit the received input to one or more elements within the intelligent terminal 100, or it can be used to transfer data between the intelligent terminal 100 and an external device.
The memory 109 can be used to store software programs as well as various data. The memory 109 may mainly include a program storage area and a data storage area. Optionally, the program storage area may store an operating system, application programs required by at least one function (such as a sound playback function and an image playback function), and the like; the data storage area may store data created according to the use of the mobile phone (such as audio data and a phone book). In addition, the memory 109 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
The processor 110 is the control center of the intelligent terminal. It uses various interfaces and lines to connect all parts of the intelligent terminal, and it performs the various functions of the intelligent terminal and processes data by running or executing software programs and/or modules stored in the memory 109 and calling data stored in the memory 109, thereby monitoring the intelligent terminal as a whole. The processor 110 may include one or more processing units; preferably, the processor 110 may integrate an application processor and a modem processor. Optionally, the application processor mainly handles the operating system, the user interface, application programs, and the like, while the modem processor mainly handles wireless communication. It can be understood that the above modem processor may also not be integrated into the processor 110.
The intelligent terminal 100 may also include a power supply 111 (such as a battery) that supplies power to the various components. Preferably, the power supply 111 can be logically connected to the processor 110 through a power management system, so that functions such as charging, discharging, and power consumption management are implemented through the power management system.
Although not shown in Fig. 1, the intelligent terminal 100 may also include a Bluetooth module and the like, which will not be described in detail here.
To facilitate understanding of the embodiments of the present application, the communication network system on which the intelligent terminal of the present application is based is described below.
Referring to Fig. 2, which is an architecture diagram of a communication network system provided by an embodiment of the present application, the communication network system is an LTE system of universal mobile communication technology. The LTE system includes a UE (User Equipment) 201, an E-UTRAN (Evolved UMTS Terrestrial Radio Access Network) 202, an EPC (Evolved Packet Core) 203, and an operator's IP services 204, which are communicatively connected in sequence.
Optionally, the UE 201 may be the above-mentioned terminal 100, which will not be described again here.
E-UTRAN202包括eNodeB2021和其它eNodeB2022等。可选地,eNodeB2021可以通过回程(backhaul)(例如X2接口)与其它eNodeB2022连接,eNodeB2021连接到EPC203,eNodeB2021可以提供UE201到EPC203的接入。 E-UTRAN 202 includes eNodeB 2021 and other eNodeB 2022 and so on. Optionally, the eNodeB 2021 can be connected to other eNodeB 2022 through a backhaul (for example, X2 interface), the eNodeB 2021 is connected to the EPC 203 , and the eNodeB 2021 can provide access from the UE 201 to the EPC 203 .
EPC203可以包括MME(Mobility Management Entity,移动性管理实体)2031,HSS(Home Subscriber Server,归属用户服务器)2032,其它MME2033,SGW(Serving Gate Way,服务网关)2034,PGW(PDN Gate Way,分组数据网络网关)2035和PCRF(Policy and Charging Rules Function,政策和资费功能实体)2036等。可选地,MME2031是处理UE201和EPC203之间信令的控制节点,提供承载和连接管理。HSS2032用于提供一些寄存器来管理诸如归属位置寄存器(图中未示)之类的功能,并且保存有一些有关服务特征、数据速率等用户专用的信息。所有用户数据都可以通过SGW2034进行发送,PGW2035可以提供UE 201的IP地址分配以及其它功能,PCRF2036是业务数据流和IP承载资源的策略与计费控制策略决策点,它为策略与计费执行功能单元(图中未示)选择及提供可用的策略和计费控制决策。The EPC 203 may include an MME (Mobility Management Entity) 2031, an HSS (Home Subscriber Server) 2032, other MMEs 2033, an SGW (Serving Gateway) 2034, a PGW (PDN Gateway) 2035, a PCRF (Policy and Charging Rules Function) 2036, and so on. Optionally, the MME 2031 is a control node that handles signaling between the UE 201 and the EPC 203, and provides bearer and connection management. The HSS 2032 provides registers to manage functions such as the home location register (not shown), and stores user-specific information about service features, data rates, and the like. All user data may be sent through the SGW 2034; the PGW 2035 may provide IP address allocation and other functions for the UE 201; the PCRF 2036 is the policy and charging control decision point for service data flows and IP bearer resources, and it selects and provides available policy and charging control decisions for the policy and charging enforcement function unit (not shown).
IP业务204可以包括因特网、内联网、IMS(IP Multimedia Subsystem,IP多媒体子系统)或其它IP业务等。The IP service 204 may include Internet, Intranet, IMS (IP Multimedia Subsystem, IP Multimedia Subsystem) or other IP services.
虽然上述以LTE系统为例进行了介绍,但本领域技术人员应当知晓,本申请不仅仅适用于LTE系统,也可以适用于其他无线通信系统,例如GSM、CDMA2000、WCDMA、TD-SCDMA以及未来新的网络系统(如5G)等,此处不做限定。Although the above description takes the LTE system as an example, those skilled in the art should know that this application is applicable not only to the LTE system but also to other wireless communication systems, such as GSM, CDMA2000, WCDMA, TD-SCDMA, and future new network systems (such as 5G), which are not limited here.
基于上述智能终端硬件结构以及通信网络系统,提出本申请各个实施例。Based on the above hardware structure of the smart terminal and the communication network system, various embodiments of the present application are proposed.
可选地,本申请提供一种图像数据处理方法、智能终端及存储介质,在计算摄影的数据流中,基于预设格式保存用于解读图像的语义信息,以在计算摄影涉及的数据流中统一规范计算摄影的语义信息。Optionally, the present application provides an image data processing method, a smart terminal, and a storage medium. In the data stream of computational photography, semantic information for interpreting images is stored based on a preset format, so as to uniformly standardize the semantic information of computational photography in the data streams involved in computational photography.
随着技术的发展,智能终端围绕影像功能所开发的功能越来越多。同时,为了适配这些功能,以及引入新的功能,围绕影像功能所需,搭载的设备也越来越多,如激光测距雷达、高性能NPU(Neural-network Processing Unit,神经网络处理器)、云台等。通过这些设备,可以为计算摄影提供多种高层语义(本文将其称为"语义信息")。本申请试图针对这些语义信息,提供一种通用的描述及解读方法。With the development of technology, smart terminals have developed more and more functions around imaging. At the same time, in order to adapt to these functions and introduce new ones, more and more devices are deployed around the imaging function, such as laser ranging radars, high-performance NPUs (Neural-network Processing Units), gimbals, and so on. Through these devices, various kinds of high-level semantics (referred to herein as "semantic information") can be provided for computational photography. This application attempts to provide a general method for describing and interpreting such semantic information.
第一实施例first embodiment
图3为根据第一实施例示出的图像数据处理方法的流程示意图。本申请实施例提供一种图像数据处理方法,可选地,应用于如前所述的智能终端等智能终端。如图3所示,图像数据处理方法包括以下步骤:Fig. 3 is a schematic flowchart of an image data processing method according to the first embodiment. An embodiment of the present application provides an image data processing method, which is optionally applied to a smart terminal such as the aforementioned smart terminal. As shown in Figure 3, the image data processing method includes the following steps:
S1:基于图像信息,确定或生成图像的语义信息。S1: Based on the image information, determine or generate the semantic information of the image.
可选地,图像信息包括图像基本信息和图像数据。可选地,图像数据即图像本身。图像基本信息也可以称为图像的基本描述信息,可以包括图像描述信息标识、基本描述信息长度、图像类型标识、图像长度、图像宽度、图像色彩空间、位宽和存储方式等。可选地:Optionally, the image information includes basic image information and image data. Optionally, the image data is the image itself. The basic image information can also be called the basic description information of the image, which can include image description information identification, basic description information length, image type identification, image length, image width, image color space, bit width and storage method, etc. Optionally:
图像描述信息标识,用于标识图像“基本描述信息”字段;Image description information identifier, used to identify the "basic description information" field of the image;
基本描述信息长度,表示基本描述信息字段的总长度,包含图像描述信息标识;Basic description information length, indicating the total length of the basic description information field, including the image description information identifier;
图像类型标识,用于标识影像数据类型是单帧图像、多帧图像或视频流;Image type identification, which is used to identify whether the image data type is a single-frame image, multi-frame image or video stream;
图像长度,即图像数据本身的长度;Image length, that is, the length of the image data itself;
图像宽度,即图像数据本身的宽度;Image width, that is, the width of the image data itself;
图像色彩空间,图像数据色彩空间描述,如RGGB(Bayer filter,拜耳滤色镜,也称做RGBG或GRGB),RGBW(在原有的RGB三原色上增加了白色子像素(W)),RYYB(以两个黄色子像素(Y)代替两个绿色子像素(G))等;Image color space, image data color space description, such as RGGB (Bayer filter, Bayer filter, also known as RGBG or GRGB), RGBW (adding white sub-pixels (W) to the original RGB three primary colors), RYYB (with two A yellow sub-pixel (Y) replaces two green sub-pixels (G)), etc.;
位宽,图像每个分量的比特(bit)数;Bit width, the number of bits (bits) for each component of the image;
存储方式,图像色彩空间中每个分量的每个像素在存储空间(如内存,再如闪存,还如硬盘等)中的排列方式。Storage method, the arrangement method of each pixel of each component in the image color space in the storage space (such as memory, flash memory, or hard disk, etc.).
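As an illustrative sketch only (not the patent's actual binary layout), the basic description information fields listed above can be packed into a fixed header. The magic value, field widths, and byte order below are assumptions for demonstration:

```python
import struct

IMG_DESC_MAGIC = 0x494D4744  # hypothetical "image description info" identifier

def pack_basic_info(img_type, length, width, color_space, bit_width, storage):
    """Pack the basic description fields into a header: identifier,
    total length (including identifier and length field), then the body."""
    body = struct.pack(">IHHBBB", img_type, length, width,
                       color_space, bit_width, storage)
    total = 4 + 4 + len(body)  # identifier + length field + body
    return struct.pack(">II", IMG_DESC_MAGIC, total) + body

def unpack_basic_info(buf):
    """Parse a header produced by pack_basic_info back into a dict."""
    magic, total = struct.unpack_from(">II", buf, 0)
    assert magic == IMG_DESC_MAGIC, "not a basic description info field"
    img_type, length, width, cs, bits, storage = struct.unpack_from(">IHHBBB", buf, 8)
    return {"type": img_type, "length": length, "width": width,
            "color_space": cs, "bit_width": bits, "storage": storage}
```

A round trip (e.g. a 4000x3000 single-frame image with 10-bit components) recovers every field from the byte stream.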
可选地,语义信息用于解读图像。Optionally, semantic information is used to interpret the image.
可选地,常见的语义信息,包括图像的深度信息(depth information)、场景分类信息(scene classification information)、实例分割信息(Instance segmentation information)以及目标检测信息(Object detection information)等。可选地,本申请实施例的语义信息可以包括以下至少一种:深度信息、场景分类信息、实例分割信息以及目标检测信息,但不以此为限制。Optionally, common semantic information includes image depth information, scene classification information, instance segmentation information, and object detection information. Optionally, the semantic information in this embodiment of the present application may include at least one of the following: depth information, scene classification information, instance segmentation information, and object detection information, but not limited thereto.
可选地,深度信息可以包括以下至少一种:Optionally, the depth information may include at least one of the following:
深度图像,深度图像中像素值用于表征图像中像素点与成像图像时使用的相机的距离;Depth image, the pixel value in the depth image is used to represent the distance between the pixel point in the image and the camera used to image the image;
所述距离中的最大值;the maximum of said distances;
所述距离中的最小值;the minimum of said distances;
所述最大值和所述最小值之间距离的量化范围;a quantified range of the distance between said maximum value and said minimum value;
所述图像是否包含无限远部分的指示信息,无限远部分为超过设备所能侦测的最远距离的距离。可以理解,设备能侦测的一定距离范围内的信息,该距离范围包含最近距离和最远距离,超过这个最远距离可以都用无限远对应的最大值来表示;而使用次大的值,来表示当前能检测的最远距离。Whether the image contains indication information of an infinite part, where the infinite part is a distance beyond the farthest distance that the device can detect. It can be understood that the information within a certain distance range that the device can detect includes the shortest distance and the furthest distance. If the furthest distance exceeds this distance, it can be represented by the maximum value corresponding to infinity; while using the next largest value, To indicate the farthest distance that can be detected currently.
可选地,所述最大值和所述最小值之间的距离,会被等分为256份,所有像素将被量化至256份中。随后,可以生成一张与原图像分辨率相同的深度图像,如图4所示,将作为图像的另一个通道,附着在计算摄影数据流中。需要注意的是,随着设备性能发展,2的8次方的256也可以扩展为512或者更高的范围。此时能提供的距离精度将大幅度上升。Optionally, the distance between the maximum value and the minimum value is equally divided into 256 parts, and all pixels are quantized into these 256 levels. A depth image with the same resolution as the original image can then be generated, as shown in FIG. 4, and attached to the computational photography data stream as another channel of the image. It should be noted that, as device performance develops, the 256 levels (2 to the 8th power) can also be expanded to 512 or more, in which case the distance accuracy that can be provided will increase substantially.
作为另一种示例,在成像时,还有可能针对无限远距离进行拍照,如天空等,此时将以雷达能侦测的最远距离为极限。无法侦测距离的将被标注以最大值,能侦测到距离的部分,将被等分为255份用以表示。As another example, when imaging, it is also possible to photograph an infinitely distant subject, such as the sky. In this case, the farthest distance that the radar can detect is taken as the limit: pixels whose distance cannot be detected are marked with the maximum value, and the detectable range is equally divided into 255 parts for representation.
可选地,表1为一个带有天空的图像的深度信息表达方法:Optionally, Table 1 is an expression method of depth information of an image with sky:
表1Table 1
最大值(最远距离)Maximum (farthest distance) 30米30 meters
最小值(最近距离)Minimum value (closest distance) 3米3 meters
是否有无限远部分Is there an infinite part yes
量化范围quantization range 8比特(bit)8 bits (bit)
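The quantization scheme described above, with the infinite part flagged by the maximum code, can be sketched as follows. This is a minimal illustration assuming 8-bit codes: values beyond the farthest detectable distance (including sky marked as infinity) map to 255, and the detectable range [min, max] is split into 255 equal levels (codes 0 to 254):

```python
import numpy as np

def quantize_depth(depth_m, d_min, d_max):
    """depth_m: array of per-pixel distances in meters; np.inf marks
    parts beyond the farthest detectable distance (e.g. sky).
    Returns an 8-bit depth image per Table 1's scheme."""
    depth = np.asarray(depth_m, dtype=np.float64)
    codes = np.empty(depth.shape, dtype=np.uint8)
    # pixels that are non-finite or beyond the detectable limit -> max code
    infinite = ~np.isfinite(depth) | (depth > d_max)
    # map [d_min, d_max] linearly onto codes 0..254
    scaled = (np.clip(depth, d_min, d_max) - d_min) / (d_max - d_min)
    codes[:] = np.round(scaled * 254).astype(np.uint8)
    codes[infinite] = 255
    return codes
```

For the example of Table 1 (min 3 m, max 30 m, with an infinite part), a 3 m pixel maps to code 0, a 30 m pixel to 254, and sky to 255.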
可选地,场景分类信息用于表征图像表达的场景。可以理解,一个图像在绝大部分情况下可以被分为多种场景。如生日聚会场景中的蛋糕图像,可以表达为聚会场景,也可以表达为食物场景。所以在表达场景的时候,将列出图像最有可能属于的5个场景的概率,如表达为:Optionally, the scene classification information is used to characterize the scene represented by the image. It can be understood that an image can be divided into multiple scenes in most cases. For example, a cake image in a birthday party scene can be expressed as a party scene or a food scene. Therefore, when expressing the scene, the probability of the five scenes that the image most likely belongs to will be listed, such as:
[{1:0.5},{22:0.2},{25:0.15},{45:0.1},{55:0.05}]。[{1:0.5},{22:0.2},{25:0.15},{45:0.1},{55:0.05}].
上述示例中,{1:0.5}表示图像属于场景ID为"1"的场景的概率为0.5;{22:0.2}表示图像属于场景ID为"22"的场景的概率为0.2;{25:0.15}表示图像属于场景ID为"25"的场景的概率为0.15;{45:0.1}表示图像属于场景ID为"45"的场景的概率为0.1;{55:0.05}表示图像属于场景ID为"55"的场景的概率为0.05。In the above example, {1:0.5} indicates that the probability that the image belongs to the scene whose scene ID is "1" is 0.5; {22:0.2} indicates that the probability that the image belongs to the scene whose scene ID is "22" is 0.2; {25:0.15} indicates that the probability that the image belongs to the scene whose scene ID is "25" is 0.15; {45:0.1} indicates that the probability that the image belongs to the scene whose scene ID is "45" is 0.1; and {55:0.05} indicates that the probability that the image belongs to the scene whose scene ID is "55" is 0.05.
可选地,场景分类信息需要一个词典,用于解析表达为数字形式的数字,到底属于哪个场景。可选地,通过表2凸显不同场景的示例。Optionally, the scene classification information needs a dictionary, which is used to resolve the number expressed in the form of numbers to which scene it belongs to. Optionally, examples of different scenarios are highlighted by Table 2.
表2Table 2
00 草地 grassland 88 日出日落sunrise and sunset
11 宠物 pet 99 食物 food
22 海滩beach 1010 商场 shopping mall
33 街景street view 1111 文本 text
44 聚会reunion 1212 鲜花 flowers
55 蓝天blue sky 1313 天空 Sky
66 绿植green plants 1414 ……...
77 人物figure 1515 ……...
可选地,表2中第一列和第三列为场景ID,表达为数字形式的数字,到底属于哪个场景;第二列和第四列为属于该场景中的实例。Optionally, the first and third columns in Table 2 are scene IDs, i.e., numbers indicating which scene an image belongs to; the second and fourth columns are the instances belonging to those scenes.
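As an illustrative sketch, resolving the top-5 scene probability list against a scene-ID dictionary like Table 2 can be done as follows. The dictionary entries here are a small subset taken from the table above:

```python
# Subset of the Table 2 scene dictionary, for illustration only.
SCENE_DICT = {0: "grassland", 1: "pet", 4: "party", 9: "food", 13: "sky"}

def resolve_scenes(scene_probs):
    """scene_probs: list of single-entry {scene_id: probability} dicts,
    ordered from most to least likely, e.g. [{4: 0.5}, {9: 0.2}, ...].
    Returns (scene name, probability) pairs in the same order."""
    resolved = []
    for entry in scene_probs:
        (scene_id, prob), = entry.items()  # each dict holds exactly one pair
        resolved.append((SCENE_DICT.get(scene_id, "unknown"), prob))
    return resolved
```

For the birthday-cake example, a list such as [{4: 0.5}, {9: 0.2}] resolves to the party scene with probability 0.5 and the food scene with probability 0.2.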
可选地,实例分割信息用于表征图像中实例的分割信息。一般的,图像中都会存在大量可被分割的实例,所以,可以使用与该图像的分辨率一致的矩阵(如800*600,可以表达为图像的一个通道)来表达图像中的实例分割信息,如图5所示例,其中的0为背景,而2/15/35均为实例ID,即实例对应的名称信息的ID。Optionally, the instance segmentation information is used to characterize the segmentation of instances in the image. Generally, an image contains a large number of instances that can be segmented, so a matrix with the same resolution as the image (e.g., 800*600, which can be expressed as a channel of the image) can be used to express the instance segmentation information, as illustrated in FIG. 5, where 0 is the background and 2/15/35 are instance IDs, i.e., the IDs of the name information corresponding to the instances.
为了解析表达实例的数字ID所对应的真实名称,需要一个词典来表达,如表3所示。In order to resolve the real name corresponding to the digital ID of the expression instance, a dictionary is needed to express it, as shown in Table 3.
表3table 3
11 人(person)person 4141 电视(tv)television (tv)
22 消防栓(fire hydrant)fire hydrant 4242 电冰箱(refrigerator) Refrigerator
33 大象(elephant)elephant 4343 公交车(bus) bus
44 滑雪板(skis)skis 4444 猫(cat) cat
55 网球拍(tennis racket)tennis racket 4545 雨伞(umbrella) umbrella
66 三明治(sandwich)sandwich 4646 棒球手套(baseball glove) baseball glove
77 盆栽(potted plant)potted plant 4747 刀(knife) knife
88 微波炉(microwave)microwave 4848 披萨(pizza) pizza
99 吹风机(hair drier)hair dryer 4949 笔记本电脑(laptop)laptop
1010 自行车(bicycle)bicycle 5050 书(book)book
1111 停车标志(stop sign)stop sign 5151 火车(train)train
1212 熊(bear)bear 5252 狗(dog)dog
1313 滑雪板(snowboard)snowboard 5353 手提包(handbag)handbag
1414 瓶子(bottle)bottle 5454 滑板(skateboard) skateboard
1515 橙子(orange)orange 5555 勺子(spoon)spoon
1616 床(bed)bed 5656 甜甜圈(donut)donut
1717 烤箱(oven)oven 5757 老鼠(mouse)mouse
1818 牙刷(toothbrush)toothbrush 5858 钟表(clock)clock
1919 汽车(car)car 5959 卡车(truck)truck
2020 停车收费器(parking meter)parking meter 6060 马(horse)horse
21twenty one 斑马(zebra)zebra 6161 领带(tie)tie
22twenty two 运动球(sports ball)sports ball 6262 冲浪板(surfboard)surfboard
23twenty three 酒杯(wine glass)wine glass 6363 碗(bowl)bowl
24twenty four 西蓝花(broccoli)Broccoli (broccoli) 6464 蛋糕(cake)cake
2525 餐桌(dining table)dining table 6565 远程(remote)remote
2626 烤面包机(toaster)toaster 6666 花瓶(vase)vase
2727 摩托车(motorcycle)motorcycle 6767 船(boat)boat
2828 长凳(bench)bench 6868 绵羊(sheep)sheep
2929 长颈鹿(giraffe)giraffe 6969 手提箱(suitcase)Suitcase (suitcase)
3030 风筝(kite)kite 7070 香蕉(banana)banana
3131 杯子(cup)cup 7171 椅子(chair)chair
3232 胡萝卜(carrot)carrot 7272 键盘(keyboard)keyboard
3333 马桶(toilet)toilet 7373 剪刀(scissors)scissors
3434 洗碗槽(sink)sink 7474 交通灯(traffic light)traffic light
3535 飞机(airplane)airplane 7575 牛(cow)cow (cow)
3636 鸟(bird)bird 7676 飞盘(frisbee)frisbee
3737 背包(backpack)backpack 7777 苹果(apple)apple
3838 棒球棒(baseball bat)baseball bat 7878 沙发(couch)sofa
3939 餐叉(fork)fork 7979 手机(cell phone)cell phone
4040 热狗(hot dog)hot dog 8080 泰迪熊(teddy bear)teddy bear
对于图像的目标检测信息,可以理解:对于图像中可检出的目标,可以使用{目标ID:目标坐标}的形式来保存目标列表。为了解析目标ID所对应的真实名称,需要一个字典来解析。可选地,该字典可以使用以上实例分割所使用的字典。For the target detection information of an image, it can be understood that, for targets detectable in the image, the target list can be saved in the form of {target ID: target coordinates}. To resolve the real name corresponding to a target ID, a dictionary is required; optionally, this dictionary may be the same one used for instance segmentation above.
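A minimal sketch combining the two representations above: an instance segmentation mask (a matrix at image resolution, 0 = background, other values = instance IDs from Table 3) and a {target ID: target coordinates} detection list, both resolved against the same dictionary. The coordinate format (x, y, w, h) is an assumption for illustration:

```python
import numpy as np

# Subset of the Table 3 instance dictionary, for illustration only.
INSTANCE_DICT = {1: "person", 15: "orange", 35: "airplane"}

def summarize(mask, detections):
    """mask: 2-D int array of instance IDs (0 = background);
    detections: {target_id: (x, y, w, h)} as described above.
    Returns the instance names present and named bounding boxes."""
    ids = np.unique(mask)
    names = [INSTANCE_DICT.get(int(i), "unknown") for i in ids if i != 0]
    boxes = {INSTANCE_DICT.get(tid, "unknown"): box
             for tid, box in detections.items()}
    return names, boxes
```

Both the segmentation channel and the detection list carry only numeric IDs in the data stream; the dictionary is what makes them human-readable.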
可选地,在实际实现中,还可根据实际情况进行组合判断,如下表4所示。Optionally, in actual implementation, combination judgments may also be made according to actual conditions, as shown in Table 4 below.
表4Table 4
组合方案Combination plan 深度信息depth information 场景分类信息scene classification information 实例分割信息Instance segmentation information 目标检测信息target detection information
组合示例1Combination Example 1 yes yes yes no
组合示例2Combination example 2 yes yes no no
组合示例3Combination Example 3 yes no yes yes
组合示例4Combination Example 4 no yes yes yes
组合示例5Combination Example 5 yes yes yes yes
……... ……... ……... ……... ……...
可选地,对于组合示例1,语义信息包括深度信息、场景分类信息和实例分割信息。Optionally, for combination example 1, the semantic information includes depth information, scene classification information and instance segmentation information.
再如,对于组合示例2,语义信息包括深度信息和场景分类信息。As another example, for combination example 2, the semantic information includes depth information and scene classification information.
还如,对于组合示例3,语义信息包括深度信息、实例分割信息和目标检测信息。Also, for combination example 3, the semantic information includes depth information, instance segmentation information and object detection information.
再如,对于组合示例4,语义信息包括场景分类信息、实例分割信息和目标检测信息。As another example, for combination example 4, the semantic information includes scene classification information, instance segmentation information and object detection information.
还如,对于组合示例5,语义信息包括深度信息、场景分类信息、实例分割信息和目标检测信息。Also, for combination example 5, the semantic information includes depth information, scene classification information, instance segmentation information and object detection information.
通过组合方案,可以根据不同需求在计算摄影的数据流中携带不同的语义信息,进而在统一规范计算摄影的语义信息的同时,针对不同应用场景提供区别的数据流,提升用户体验。Through the combination scheme, different semantic information can be carried in the computational photography data stream according to different needs, and then while standardizing the semantic information of computational photography, different data streams can be provided for different application scenarios to improve user experience.
以上所列举的仅为参考示例,为了避免冗余,这里不再一一列举,实际开发或运用中,可以根据实际需要灵活组合,但任一组合均属于本申请的技术方案,也就覆盖在本申请的保护范围之内。The above are only reference examples; to avoid redundancy, they are not listed one by one here. In actual development or application, they can be combined flexibly according to actual needs, but any combination belongs to the technical solutions of this application and thus falls within the protection scope of this application.
示例性地,数据流中的语义信息可以包括实例分割信息,从而使用实例分割信息来获得图像中的具体实例,从而使用单摄做到实例虚化的效果;又或者,数据流中的语义信息可以包括目标检测信息,以使相机针对性的调整自动对焦的对象,从而使焦点聚焦在想要拍摄的目标上,提升成像质量;又或者,数据流中的语义信息可以包括场景分类信息,以针对该场景进行3A参数的调整。Exemplarily, the semantic information in the data stream may include instance segmentation information, so that specific instances in the image can be obtained and an instance-bokeh effect can be achieved with a single camera; alternatively, the semantic information in the data stream may include target detection information, so that the camera can adjust the autofocus target in a targeted manner, focusing on the target to be photographed and improving imaging quality; alternatively, the semantic information in the data stream may include scene classification information, so as to adjust the 3A parameters for that scene.
可选地,3A参数即自动对焦(AF)、自动曝光(AE)和自动白平衡(AWB)。3A数字成像技术利用了自动对焦算法、自动曝光算法及自动白平衡算法来实现图像对比度最大、改善主体拍摄物过曝光或曝光不足情况、使画面在不同光线照射下的色差得到补偿,从而呈现较高画质的图像信息。采用了3A数字成像技术的摄像机能够很好的保障图像精准的色彩还原度,呈现完美的日夜监控效果。Optionally, the 3A parameters are auto focus (AF), auto exposure (AE), and auto white balance (AWB). 3A digital imaging technology uses auto-focus, auto-exposure, and auto-white-balance algorithms to maximize image contrast, mitigate over- or under-exposure of the main subject, and compensate for color casts under different lighting, so as to present high-quality image information. A camera adopting 3A digital imaging technology can well guarantee accurate color reproduction and present a good day-and-night monitoring effect.
随着技术发展,语义信息可以不仅仅是上述四项,也可以有更多的其他语义信息,所以该信息头需要留下足够长度。With the development of technology, the semantic information can be not only the above four items, but also more other semantic information, so the information header needs to leave a sufficient length.
可选地,语义信息包括深度信息。此时,S1步骤可以包括:基于图像信息,通过激光测距雷达和/或深度信息解析网络获得所述深度信息。可选地,深度信息解析网络用于解析图像信息以生成深度信息。Optionally, the semantic information includes depth information. At this time, step S1 may include: based on the image information, obtaining the depth information through a laser ranging radar and/or a depth information analysis network. Optionally, the depth information parsing network is used to parse image information to generate depth information.
可选地,语义信息包括场景分类信息。此时,S1步骤可以包括:基于图像信息,提取图像的图像场景特征;根据图像场景特征,确定或生成图像的场景分类信息。可选地,上述根据图像场景特征,确定或生成图像的场景分类信息,包括:将图像场景特征输入场景分类模型,得到场景分类模型输出的所述图像对应至少一种场景的概率;在图像对应至少一种场景的概率中,确定最大概率对应的场景为图像的场景分类信息。可选地,场景分类模型用于确定图像对应至少一种场景的概率。Optionally, the semantic information includes scene classification information. In this case, step S1 may include: extracting image scene features of the image based on the image information; and determining or generating the scene classification information of the image according to the image scene features. Optionally, determining or generating the scene classification information according to the image scene features includes: inputting the image scene features into a scene classification model to obtain the probability, output by the model, that the image corresponds to at least one scene; and, among these probabilities, determining the scene corresponding to the maximum probability as the scene classification information of the image. Optionally, the scene classification model is used to determine the probability that the image corresponds to at least one scene.
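The scene-classification branch of step S1 can be sketched as follows. This is an illustration only: the raw per-scene scores stand in for the output of a hypothetical scene classification model, and the softmax normalization is an assumed detail, not specified in the application:

```python
import math

def classify_scene(scores):
    """scores: {scene_id: raw model score} from a (hypothetical) scene
    classification model. Returns the scene ID with the maximum
    probability, plus the full probability distribution (softmax)."""
    m = max(scores.values())  # subtract the max for numerical stability
    exp = {sid: math.exp(s - m) for sid, s in scores.items()}
    total = sum(exp.values())
    probs = {sid: v / total for sid, v in exp.items()}
    best_id = max(probs, key=probs.get)  # maximum-probability scene
    return best_id, probs
```

Per the method above, the highest-probability scene (e.g. the party scene for a birthday-cake image) becomes the image's scene classification information.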
S2:基于预设格式,在图像的数据流中保存语义信息。S2: Preserve semantic information in the image data stream based on a preset format.
可选地,该步骤包括:基于预设格式,在图像的数据流的预留字段中填充语义信息。可选地,预设格式为至少一种语义信息的任意组合。Optionally, this step includes: based on a preset format, filling semantic information in a reserved field of the image data stream. Optionally, the preset format is any combination of at least one semantic information.
可选地,不同类型的语义信息对应的预设格式是不同的。Optionally, the preset formats corresponding to different types of semantic information are different.
可选地,预设格式包括表格、单通道位图、矩阵和键值对中的至少一种。可选地,表格如表1所示,矩阵如图5所示。Optionally, the preset format includes at least one of table, single-channel bitmap, matrix and key-value pair. Optionally, the table is shown in Table 1, and the matrix is shown in FIG. 5 .
可选地,预设格式还包括信息头,信息头用于表征图像的数据流中是否包含语义信息,和/或,信息头用于表征图像的数据流中所包含的语义信息的类型。可选地,语义信息均为可选,即在信息流中可以只包含信息头,但是没有信息本体。可选地,信息头可以表达为:Frame semantic info include:0 0 0 1,即只包含第四项;信息头也可以表达为:Frame semantic info include:0 0 0 0。信息头具体对应的包含语义信息的可以查阅表,如:Optionally, the preset format further includes an information header, which is used to indicate whether the image data stream contains semantic information, and/or to indicate the types of semantic information contained in the image data stream. Optionally, all semantic information is optional, that is, the information stream may contain only the information header without the information body. Optionally, the information header can be expressed as: Frame semantic info include:0 0 0 1, meaning only the fourth item is included; the information header can also be expressed as: Frame semantic info include:0 0 0 0. The semantic information corresponding to each position of the information header can be looked up in a table, such as:
深度信息;depth information;
场景分类信息;Scene classification information;
实例分割信息;Instance segmentation information;
目标检测信息;target detection information;
……。....
语义信息应当均为可选,具体信息头表达形式参见上文。所以信息头这一字段为一个变长字段,当信息头表述某个语义信息为存在时,即表达为0 0 0 1这样的表达形式时,该字段为对应的语义信息留出足够长度用于表述该语义信息。Semantic information should be optional; see above for the specific expression form of the information header. Therefore, the information header is a variable-length field: when the header indicates that a certain piece of semantic information is present, i.e., when it is expressed in a form such as 0 0 0 1, the field reserves sufficient length for expressing the corresponding semantic information.
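The variable-length header scheme above can be sketched as follows: one presence flag per semantic-information type, followed only by the payloads whose flags are set. The payload framing (a 4-byte big-endian length prefix) is an assumption for illustration:

```python
# Order of flags, matching the lookup table above:
# depth, scene classification, instance segmentation, target detection.
SEMANTIC_TYPES = ["depth", "scene", "instance", "detection"]

def build_stream(semantics):
    """semantics: dict mapping a type name to its bytes payload; absent
    types get a 0 flag and no payload, so the field is variable-length."""
    flags = ["1" if t in semantics else "0" for t in SEMANTIC_TYPES]
    header = "Frame semantic info include:" + " ".join(flags)
    chunks = [header.encode()]
    for t in SEMANTIC_TYPES:
        if t in semantics:
            payload = semantics[t]
            # hypothetical framing: 4-byte length prefix, then payload
            chunks.append(len(payload).to_bytes(4, "big") + payload)
    return b"".join(chunks)
```

A stream carrying only target detection information thus starts with the header "Frame semantic info include:0 0 0 1", matching the example above, followed by a single payload.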
可选地,S2步骤之前,图像数据处理方法还可以包括:根据预设对应关系,确定语义信息对应的标识信息。可选地,语义信息包括场景分类信息、实例分割信息以及目标检测信息中的至少一种;标识信息对应包括场景ID、实例ID和目标ID中的至少一种,预设对应关系包含场景名称和场景ID的对应关系、实例名称和实例ID的对应关系以及目标名称和目标ID的对应关系中的至少一种。可选地,预设对应关系可以具体呈现为字典形式, 但本申请实施例不以此为限制,具体可根据实际需求进行相应设置。Optionally, before step S2, the image data processing method may further include: determining identification information corresponding to the semantic information according to a preset correspondence relationship. Optionally, the semantic information includes at least one of scene classification information, instance segmentation information, and target detection information; the identification information includes at least one of scene ID, instance ID, and target ID, and the preset correspondence includes scene name and At least one of the corresponding relationship between scene IDs, the corresponding relationship between instance names and instance IDs, and the corresponding relationship between target names and target IDs. Optionally, the preset correspondence relationship may be specifically presented in the form of a dictionary, but this embodiment of the present application is not limited thereto, and may be set accordingly according to actual needs.
本申请实施例的图像数据处理方法,基于图像信息,确定或生成图像的语义信息,该语义信息用于解读图像;基于预设格式,在图像的数据流中保存语义信息。通过上述方式,基于预设格式在图像的数据流中保存用于解读图像的语义信息,以在计算摄影涉及的数据流中统一规范计算摄影的语义信息。The image data processing method of the embodiment of the present application determines or generates semantic information of the image based on image information, and the semantic information is used to interpret the image; based on a preset format, the semantic information is stored in the data stream of the image. Through the above method, based on the preset format, the semantic information for interpreting the image is stored in the data stream of the image, so as to uniformly standardize the semantic information of computational photography in the data stream involved in computational photography.
可选地,在上述基础上,图像数据处理方法可以进一步包括:基于预设格式,从图像的数据流中获取语义信息;根据语义信息进行预设处理。对于这两个步骤的详细说明,可参考第二实施例,此处不再赘述。Optionally, based on the above, the image data processing method may further include: acquiring semantic information from the image data stream based on a preset format; and performing preset processing according to the semantic information. For the detailed description of these two steps, reference may be made to the second embodiment, which will not be repeated here.
第二实施例second embodiment
图6为根据第二实施例示出的图像数据处理方法的流程示意图。本申请实施例提供一种图像数据处理方法,应用于如前所述的智能终端等智能终端的计算摄影。如图6所示,图像数据处理方法包括以下步骤:Fig. 6 is a schematic flowchart of an image data processing method according to a second embodiment. An embodiment of the present application provides an image data processing method, which is applied to computational photography of a smart terminal such as the aforementioned smart terminal. As shown in Figure 6, the image data processing method includes the following steps:
S10:基于预设格式,从图像的数据流中获取语义信息。S10: Obtain semantic information from image data streams based on a preset format.
可选地,语义信息用于解读图像。Optionally, semantic information is used to interpret the image.
可选地,常见的语义信息,包括图像的深度信息(depth information)、场景分类信息(scene classification information)、实例分割信息(Instance segmentation information)以及目标检测信息(Object detection information)等。可选地,本申请实施例的语义信息可以包括以下至少一种:深度信息、场景分类信息、实例分割信息以及目标检测信息,但不以此为限制。Optionally, common semantic information includes image depth information, scene classification information, instance segmentation information, and object detection information. Optionally, the semantic information in this embodiment of the present application may include at least one of the following: depth information, scene classification information, instance segmentation information, and object detection information, but not limited thereto.
可选地,深度信息可以包括以下至少一种:Optionally, the depth information may include at least one of the following:
深度图像,深度图像中像素值用于表征图像中像素点与成像图像时使用的相机的距离;Depth image, the pixel value in the depth image is used to represent the distance between the pixel point in the image and the camera used to image the image;
所述距离中的最大值;the maximum of said distances;
所述距离中的最小值;the minimum of said distances;
所述最大值和所述最小值之间距离的量化范围;a quantified range of the distance between said maximum value and said minimum value;
所述图像是否包含无限远部分的指示信息,无限远部分为超过设备所能侦测的最远距离的距离。可以理解,设备能侦测的一定距离范围内的信息,该距离范围包含最近距离和最远距离,超过这个最远距离可以都用无限远对应的最大值来表示;而使用次大的值,来表示当前能检测的最远距离。Whether the image contains indication information of an infinite part, where the infinite part is a distance beyond the farthest distance that the device can detect. It can be understood that the information within a certain distance range that the device can detect includes the shortest distance and the furthest distance. If the furthest distance exceeds this distance, it can be represented by the maximum value corresponding to infinity; while using the next largest value, To indicate the farthest distance that can be detected currently.
可选地,所述最大值和所述最小值之间的距离,会被等分为256份,所有像素将被量化至256份中。随后,可以生成一张与原图像分辨率相同的深度图像,如图4所示,将作为图像的另一个通道,附着在计算摄影数据流中。需要注意的是,随着设备性能发展,2的8次方的256也可以扩展为512或者更高的范围。此时能提供的距离精度将大幅度上升。Optionally, the distance between the maximum value and the minimum value is equally divided into 256 parts, and all pixels are quantized into these 256 levels. A depth image with the same resolution as the original image can then be generated, as shown in FIG. 4, and attached to the computational photography data stream as another channel of the image. It should be noted that, as device performance develops, the 256 levels (2 to the 8th power) can also be expanded to 512 or more, in which case the distance accuracy that can be provided will increase substantially.
作为另一种示例,在成像时,还有可能针对无限远距离进行拍照,如天空等,此时将以雷达能侦测的最远距离为极限。无法侦测距离的将被标注以最大值,能侦测到距离的部分,将被等分为255份用以表示。As another example, when imaging, it is also possible to photograph an infinitely distant subject, such as the sky. In this case, the farthest distance that the radar can detect is taken as the limit: pixels whose distance cannot be detected are marked with the maximum value, and the detectable range is equally divided into 255 parts for representation.
可选地,表1为一个带有天空的图像的深度信息表达方法。Optionally, Table 1 shows a method for expressing depth information of an image with sky.
可选地,场景分类信息用于表征图像表达的场景。可以理解,一个图像在绝大部分情况下可以被分为多种场景。如生日聚会场景中的蛋糕图像,可以表达为聚会场景,也可以表达为食物场景。所以在表达场景的时候,将列出图像最有可能属于的5个场景的概率,如表达为:Optionally, the scene classification information is used to characterize the scene represented by the image. It can be understood that an image can be divided into multiple scenes in most cases. For example, a cake image in a birthday party scene can be expressed as a party scene or a food scene. Therefore, when expressing the scene, the probability of the five scenes that the image most likely belongs to will be listed, such as:
[{1:0.5},{22:0.2},{25:0.15},{45:0.1},{55:0.05}]。[{1:0.5},{22:0.2},{25:0.15},{45:0.1},{55:0.05}].
上述示例中,{1:0.5}表示图像属于场景ID为"1"的场景的概率为0.5;{22:0.2}表示图像属于场景ID为"22"的场景的概率为0.2;{25:0.15}表示图像属于场景ID为"25"的场景的概率为0.15;{45:0.1}表示图像属于场景ID为"45"的场景的概率为0.1;{55:0.05}表示图像属于场景ID为"55"的场景的概率为0.05。In the above example, {1:0.5} indicates that the probability that the image belongs to the scene whose scene ID is "1" is 0.5; {22:0.2} indicates that the probability that the image belongs to the scene whose scene ID is "22" is 0.2; {25:0.15} indicates that the probability that the image belongs to the scene whose scene ID is "25" is 0.15; {45:0.1} indicates that the probability that the image belongs to the scene whose scene ID is "45" is 0.1; and {55:0.05} indicates that the probability that the image belongs to the scene whose scene ID is "55" is 0.05.
可选地,场景分类信息需要一个词典,用于解析表达为数字形式的数字,到底属于哪个场景,如表达为以上所示的表2。Optionally, the scene classification information requires a dictionary for parsing the number expressed in digital form to which scene it belongs to, as shown in Table 2 above.
可选地,表2中第一列和第三列为场景ID,表达为数字形式的数字,到底属于哪个场景;第二列和第四列为属于该场景中的实例。Optionally, the first column and the third column in Table 2 are scene IDs, expressed as numbers in digital form, which scene they belong to; the second column and fourth column are instances belonging to the scene.
可选地,实例分割信息用于表征图像中实例的分割信息。一般的,图像中都会存在大量可被分割的实例,所以,可以使用与该图像的分辨率一致的矩阵(如800*600,可以表达为图像的一个通道)来表达图像中的实例分割信息,如图5所示例,其中的0为背景,而2/15/35均为实例ID,即实例对应的名称信息的ID。Optionally, the instance segmentation information is used to characterize the segmentation of instances in the image. Generally, an image contains a large number of instances that can be segmented, so a matrix with the same resolution as the image (e.g., 800*600, which can be expressed as a channel of the image) can be used to express the instance segmentation information, as illustrated in FIG. 5, where 0 is the background and 2/15/35 are instance IDs, i.e., the IDs of the name information corresponding to the instances.
In order to resolve the real name corresponding to the numeric ID of an instance, a dictionary is needed, as shown in Table 3.
For the target detection information of the image, it can be understood that the targets detected in the image can be saved as a target list in the form {target ID: target coordinates}. In order to resolve the real name corresponding to a target ID, a dictionary is required. Optionally, this dictionary may be the same one used for instance segmentation above.
Optionally, in an actual implementation, combinations may also be determined according to actual conditions, as shown in Table 4 above.
Optionally, for combination example 1, the semantic information includes depth information, scene classification information, and instance segmentation information.
Optionally, for combination example 2, the semantic information includes depth information and scene classification information.
Optionally, for combination example 3, the semantic information includes depth information, instance segmentation information, and target detection information.
Optionally, for combination example 4, the semantic information includes scene classification information, instance segmentation information, and target detection information.
Optionally, for combination example 5, the semantic information includes depth information, scene classification information, instance segmentation information, and target detection information.
Through the combination scheme, different semantic information can be carried in the computational-photography data stream according to different needs, so that the semantic information of computational photography is standardized in a unified manner while differentiated data streams are provided for different application scenarios, improving the user experience.
The above are only reference examples; to avoid redundancy, they are not enumerated one by one here. In actual development or application, they can be combined flexibly according to actual needs, and any such combination belongs to the technical solution of the present application and is therefore covered by the protection scope of the present application.
As technology develops, the semantic information may include not only the above four items but also other semantic information, so the information header needs to reserve sufficient length.
Optionally, the preset format is any combination of at least one type of semantic information.
Optionally, the preset formats corresponding to different types of semantic information are different.
Optionally, the preset format includes at least one of a table, a single-channel bitmap, a matrix, and key-value pairs. Optionally, the table is as shown in Table 1, and the matrix is as shown in Fig. 5.
Optionally, the preset format further includes an information header. The information header is used to indicate whether the image data stream contains semantic information, and/or the type of semantic information contained in the image data stream. Optionally, all semantic information is optional; that is, the data stream may contain only the information header without the information body. Optionally, the information header can be expressed as: Frame semantic info include: 0 0 0 1, i.e., only the fourth item is included; the information header can also be expressed as: Frame semantic info include: 0 0 0 0. The semantic information to which the header corresponds can be found in a lookup table, such as:
depth information;
scene classification information;
instance segmentation information;
target detection information;
……
All semantic information should be optional; see above for the specific expression of the information header. The information header is therefore a variable-length field: when the header indicates that a certain type of semantic information is present, i.e., when it is expressed in a form such as 0 0 0 1, the field reserves sufficient length for expressing the corresponding semantic information.
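Reading such a header amounts to checking the four flag positions against the lookup table above (depth, scene classification, instance segmentation, target detection). The following is a minimal sketch, assuming the textual "Frame semantic info include:" form shown in the examples; the type names are assumptions:

```python
# Hypothetical parser for the variable-length information header.
# Flag order follows the lookup table above; a longer header with
# additional flags would simply extend SEMANTIC_TYPES.

SEMANTIC_TYPES = ["depth", "scene_classification",
                  "instance_segmentation", "target_detection"]

def parse_header(header_line):
    """Return the semantic-information types present in the data stream."""
    flags = header_line.split(":", 1)[1].split()
    return [name for name, flag in zip(SEMANTIC_TYPES, flags) if flag == "1"]

print(parse_header("Frame semantic info include: 0 0 0 1"))
# → ['target_detection']
print(parse_header("Frame semantic info include: 0 0 0 0"))
# → []
```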
Optionally, in a specific implementation, step S10 includes: reading the semantic information from a reserved field of the image data stream based on the preset format.
S20: Perform preset processing according to the semantic information.
Optionally, the semantic information includes scene classification information. In this case, step S20 may include: adjusting the target parameters of the camera in the corresponding scene according to the scene classification information. Optionally, the target parameters include at least one of 3A parameters, a display lookup table, or other parameters related to imaging quality.
Optionally, the 3A parameters are the auto focus (AF), auto exposure (AE), and auto white balance (AWB) parameters. 3A digital imaging technology uses auto-focus, auto-exposure, and auto-white-balance algorithms to maximize image contrast, mitigate over-exposure or under-exposure of the main subject, and compensate for color deviations under different lighting, thereby presenting high-quality image information. A camera adopting 3A digital imaging technology can well guarantee accurate color reproduction and deliver an excellent day-and-night monitoring effect.
Optionally, the semantic information includes instance segmentation information. In this case, step S20 may include: obtaining a target instance in the image according to the instance segmentation information, and performing preset processing on the image according to the target instance. Optionally, the preset processing may include at least one of instance blurring, instance deformation, instance color retention, mapping processing for the instance, and mapping processing for the background. When the semantic information includes instance segmentation information, the instance segmentation information can be used to obtain specific instances in the image, so that effects such as instance blurring, instance deformation, instance color retention, mapping processing for the instance, and mapping processing for the background can be achieved.
Optionally, the semantic information includes target detection information. In this case, step S20 may include: adjusting the target of the camera's auto focus according to the target detection information, so that the camera adjusts the auto-focus object in a targeted manner, the focus falls on the target to be photographed, and the imaging quality is improved.
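The per-type processing of step S20 can be viewed as a dispatch over whichever semantic information is present. The following is a minimal illustrative sketch, assuming a dict-based layout for the semantic information and purely descriptive handler actions; none of these names appear in the original:

```python
# Hypothetical S20 dispatch: each semantic-information type present in the
# data stream triggers its own preset processing.

def preset_processing(semantic_info):
    """Return a description of the processing steps to perform."""
    actions = []
    if "scene_classification" in semantic_info:
        actions.append("adjust 3A / target parameters for scene "
                       + str(semantic_info["scene_classification"]))
    if "target_detection" in semantic_info:
        actions.append("refocus on detected target "
                       + str(semantic_info["target_detection"]))
    if "instance_segmentation" in semantic_info:
        actions.append("apply per-instance effect (blur/recolor/LUT)")
    return actions

info = {"scene_classification": 1, "target_detection": {7: (120, 80)}}
for action in preset_processing(info):
    print(action)
```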
In the image data processing method of the embodiments of the present application, semantic information is obtained from the image data stream based on a preset format, the semantic information being used to interpret the image, and preset processing is performed according to the semantic information. In this way, the semantic information used to interpret the image is stored in the image data stream based on the preset format. On the one hand, the semantic information of computational photography can be uniformly standardized in the data streams involved in computational photography; on the other hand, when related processing is performed through computational photography, different applications of the image only need to obtain the corresponding semantic information from the data stream, without repeating the same processing on the image, thereby avoiding a waste of computing resources.
Optionally, on the above basis, before step S10, the image data processing method may further include: determining or generating the semantic information of the image based on the image information; and saving the semantic information in the data stream of the image based on a preset format. For a detailed description of these two steps, reference may be made to the aforementioned first embodiment, which is not repeated here.
Third Embodiment
Fig. 7 is a schematic structural diagram of an image data processing device according to the third embodiment. An embodiment of the present application provides an image data processing device. As shown in Fig. 7, the image data processing device 70 includes:
a processing module 71, configured to determine or generate semantic information of the image based on the image information; and
a saving module 72, configured to save the semantic information in the data stream of the image based on a preset format.
Optionally, the image information includes basic image information and image data. Optionally, the image data is the image itself. The basic image information may also be called the basic description information of the image, and may include an image description information identifier, a basic description information length, an image type identifier, an image length, an image width, an image color space, a bit width, a storage method, and the like. Optionally:
the image description information identifier is used to identify the "basic description information" field of the image;
the basic description information length indicates the total length of the basic description information field, including the image description information identifier;
the image type identifier is used to identify whether the image data type is a single-frame image, a multi-frame image, or a video stream;
the image length is the length of the image data itself;
the image width is the width of the image data itself;
the image color space describes the color space of the image data, such as RGGB (also called RGBG or GRGB), RGBW, RYYB, etc.;
the bit width is the number of bits per component of the image;
the storage method is the arrangement of each pixel of each component of the image color space in the storage space (such as memory, flash memory, or a hard disk).
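The basic description information fields enumerated above can be grouped into a single record. The following Python sketch is illustrative only; the field names, types, and example values are assumptions rather than a layout prescribed by the application:

```python
# Hypothetical grouping of the "basic description information" fields.
from dataclasses import dataclass

@dataclass
class BasicImageInfo:
    description_id: int      # identifies the "basic description information" field
    description_length: int  # total length of the field, including the identifier
    image_type: str          # single-frame image, multi-frame image, or video stream
    length: int              # length of the image data itself
    width: int               # width of the image data itself
    color_space: str         # e.g. RGGB, RGBW, RYYB
    bit_width: int           # number of bits per component
    storage_layout: str      # arrangement of components in the storage space

info = BasicImageInfo(0x01, 32, "single_frame", 800, 600, "RGGB", 10, "planar")
print(info.color_space, info.bit_width)  # → RGGB 10
```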
Optionally, the semantic information is used to interpret the image.
Optionally, the semantic information includes at least one of the following: depth information, scene classification information, instance segmentation information, and target detection information.
Optionally, the scene classification information is used to characterize the scene represented by the image.
Optionally, the instance segmentation information is used to characterize the segmentation of instances in the image.
Optionally, the depth information includes at least one of the following:
a depth image, in which pixel values are used to characterize the distance between pixel points in the image and the camera used when imaging the image;
the maximum of said distances;
the minimum of said distances;
a quantization range of the distance between said maximum value and said minimum value;
indication information of whether the image contains an infinitely far part, the infinitely far part being a distance beyond the farthest distance that the device can detect. It can be understood that the device can detect information within a certain distance range, which includes a nearest distance and a farthest distance; any distance beyond this farthest distance can be represented by the maximum value corresponding to infinity, while the second-largest value is used to represent the farthest distance that can currently be detected.
Optionally, the semantic information includes depth information, and the processing module 71 is specifically configured to: obtain the depth information based on the image information through a laser ranging radar and/or a depth information parsing network.
Optionally, the depth information parsing network is used to parse image information to generate depth information.
Optionally, the semantic information includes scene classification information, and the processing module 71 is further configured to: extract image scene features of the image based on the image information; and determine or generate the scene classification information of the image according to the image scene features.
Optionally, the processing module 71 is further configured to: input the image scene features into a scene classification model to obtain the probability, output by the scene classification model, that the image corresponds to at least one scene; and, among the probabilities that the image corresponds to at least one scene, determine the scene corresponding to the maximum probability as the scene classification information of the image. Optionally, the scene classification model is used to determine the probability that the image corresponds to at least one scene.
Optionally, the saving module 72 is specifically configured to: fill the semantic information into a reserved field of the image data stream based on the preset format. Optionally, the preset format is any combination of at least one type of semantic information.
Optionally, the preset formats corresponding to different types of semantic information are different.
Optionally, the preset format includes at least one of a table, a single-channel bitmap, a matrix, and key-value pairs.
Optionally, the preset format further includes an information header. The information header is used to indicate whether the image data stream contains semantic information, and/or the type of semantic information contained in the image data stream.
Optionally, the saving module is further configured to: before saving the semantic information in the image data stream based on the preset format, determine identification information corresponding to the semantic information according to a preset correspondence. Optionally, the semantic information includes at least one of scene classification information, instance segmentation information, and target detection information; the identification information correspondingly includes at least one of a scene ID, an instance ID, and a target ID; and the preset correspondence includes at least one of a correspondence between scene names and scene IDs, a correspondence between instance names and instance IDs, and a correspondence between target names and target IDs.
Optionally, on the above basis, the processing module 71 may be further configured to: obtain semantic information from the image data stream based on the preset format; and perform preset processing according to the semantic information. For a detailed description of these two steps, reference may be made to the second embodiment, which is not repeated here.
Fourth Embodiment
Fig. 8 is a schematic structural diagram of an image data processing device according to the fourth embodiment. An embodiment of the present application provides an image data processing device. As shown in Fig. 8, the image data processing device 80 includes:
an acquisition module 81, configured to acquire semantic information from the image data stream based on a preset format; and
a processing module 82, configured to perform preset processing according to the semantic information.
Optionally, the semantic information is used to interpret the image.
Optionally, the semantic information includes at least one of the following:
a depth image, in which pixel values are used to characterize the distance between pixel points in the image and the camera used when imaging the image;
the maximum of said distances;
the minimum of said distances;
a quantization range of the distance between said maximum value and said minimum value;
indication information of whether the image contains an infinitely far part, the infinitely far part being a distance beyond the farthest distance that the device can detect;
scene classification information;
instance segmentation information;
target detection information.
Optionally, the scene classification information is used to characterize the scene represented by the image.
Optionally, the instance segmentation information is used to characterize the segmentation of instances in the image.
Optionally, the acquisition module 81 is specifically configured to: read the semantic information from a reserved field of the image data stream based on the preset format. Optionally, the preset format is any combination of at least one type of semantic information.
Optionally, the preset formats corresponding to different types of semantic information are different.
Optionally, the preset format includes at least one of a table, a single-channel bitmap, a matrix, and key-value pairs.
Optionally, the preset format further includes an information header. The information header is used to indicate whether the image data stream contains semantic information, and/or the type of semantic information contained in the image data stream.
Optionally, the processing module 82 is specifically configured for at least one of the following:
the semantic information includes scene classification information, and the target parameters of the camera in the corresponding scene are adjusted according to the scene classification information;
the semantic information includes target detection information, and the target of the camera's auto focus is adjusted according to the target detection information;
the semantic information includes instance segmentation information, a target instance in the image is obtained according to the instance segmentation information, and preset processing is performed on the image according to the target instance.
Optionally, the preset processing includes at least one of instance blurring, instance deformation, instance color retention, LUT mapping for the instance, and LUT mapping for the background.
Optionally, the target parameters include at least one of auto exposure parameters, a display lookup table, auto focus parameters, and white balance parameters.
Optionally, on the above basis, the processing module 82 is further configured to: determine or generate the semantic information of the image based on the image information; and save the semantic information in the data stream of the image based on a preset format. For a detailed description of these two steps, reference may be made to the aforementioned first embodiment, which is not repeated here.
Fifth Embodiment
Fig. 9 is a schematic structural diagram of a smart terminal according to the fifth embodiment. An embodiment of the present application provides a smart terminal. As shown in Fig. 9, the smart terminal 90 includes a memory 91 and a processor 92. An image data processing program is stored in the memory 91, and when the image data processing program is executed by the processor 92, the steps of the image data processing method in any of the above embodiments are implemented. The implementation principles and beneficial effects are similar and are not repeated here.
Optionally, the smart terminal 90 further includes a communication interface 93, and the communication interface 93 may be connected to the processor 92 through a bus 94. The processor 92 can control the communication interface 93 to implement the receiving and sending functions of the smart terminal 90.
The above integrated modules implemented in the form of software function modules may be stored in a computer-readable storage medium. The software function modules are stored in a storage medium and include several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to execute some of the steps of the methods of the various embodiments of the present application.
An embodiment of the present application further provides a computer-readable storage medium on which an image data processing program is stored. When the image data processing program is executed by a processor, the steps of the image data processing method in any of the above embodiments are implemented.
The embodiments of the smart terminal and the computer-readable storage medium provided in the present application may include all the technical features of any of the above image data processing method embodiments. The expansions and explanations of the description are basically the same as those of the above method embodiments and are not repeated here.
An embodiment of the present application further provides a computer program product. The computer program product includes computer program code, and when the computer program code is run on a computer, the computer is caused to execute the methods in the above various possible implementations.
An embodiment of the present application further provides a chip, including a memory and a processor. The memory is used to store a computer program, and the processor is used to call and run the computer program from the memory, so that a device equipped with the chip executes the methods in the above various possible implementations.
It can be understood that the above scenarios are only examples and do not limit the application scenarios of the technical solutions provided in the embodiments of the present application; the technical solutions of the present application can also be applied to other scenarios. Optionally, those of ordinary skill in the art will know that, with the evolution of system architectures and the emergence of new business scenarios, the technical solutions provided in the embodiments of the present application are equally applicable to similar technical problems.
The serial numbers of the above embodiments of the present application are for description only and do not represent the merits of the embodiments.
The steps in the methods of the embodiments of the present application may be reordered, combined, and deleted according to actual needs.
The units in the devices of the embodiments of the present application may be combined, divided, and deleted according to actual needs.
In the present application, the same or similar term concepts, technical solutions, and/or application scenario descriptions are generally described in detail only at their first occurrence; when they appear again later, they are generally not repeated for the sake of brevity. When understanding the technical solutions and other content of the present application, for the same or similar term concepts, technical solutions, and/or application scenario descriptions that are not described in detail later, reference may be made to the preceding related detailed descriptions.
In the present application, the description of each embodiment has its own emphasis. For parts that are not detailed or recorded in a certain embodiment, reference may be made to the related descriptions of other embodiments.
The technical features of the technical solutions of the present application may be combined arbitrarily. For the sake of concise description, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in a combination of these technical features, the combination should be regarded as within the scope recorded in the present application.
Through the description of the above implementations, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the part of the technical solution of the present application that is essential or that contributes to the prior art can be embodied in the form of a software product. The computer software product is stored in a storage medium as above (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a controlled terminal, a network device, etc.) to execute the method of each embodiment of the present application.
In the above embodiments, implementation may be in whole or in part by software, hardware, firmware, or any combination thereof. When software is used, implementation may be in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; optionally, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired (e.g., coaxial cable, optical fiber, digital subscriber line) or wireless (e.g., infrared, radio, microwave) manner. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a storage disk, a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid state disk (SSD)), etc.
The above are only preferred embodiments of the present application and are not intended to limit the patent scope of the present application. Any equivalent structure or equivalent process transformation made using the content of the description and drawings of the present application, applied directly or indirectly in other related technical fields, is likewise included within the patent protection scope of the present application.

Claims (15)

  1. An image data processing method, comprising the following steps:
    S1: determining or generating semantic information of an image based on image information;
    S2: saving the semantic information in a data stream of the image based on a preset format.
  2. The method according to claim 1, wherein the semantic information comprises at least one of the following:
    depth information, scene classification information, instance segmentation information, and object detection information.
  3. The method according to claim 2, wherein the depth information comprises at least one of the following:
    a depth image, in which a pixel value represents the distance between a pixel point in the image and the camera used to capture the image;
    the maximum of said distances;
    the minimum of said distances;
    a quantization range for distances between the maximum and the minimum;
    an indication of whether the image contains a part at infinity.
  4. The method according to claim 2, wherein the semantic information comprises the depth information, and step S1 comprises:
    obtaining the depth information through a laser ranging radar and/or a depth information parsing network, based on the image information.
  5. The method according to claim 2, wherein the semantic information comprises the scene classification information, and step S1 comprises:
    extracting image scene features of the image based on the image information;
    determining or generating the scene classification information of the image according to the image scene features.
  6. The method according to claim 5, wherein determining or generating the scene classification information of the image according to the image scene features comprises:
    inputting the image scene features into a scene classification model to obtain, as output of the model, the probability that the image corresponds to each of at least one scene;
    among the probabilities that the image corresponds to at least one scene, determining the scene with the highest probability as the scene classification information of the image.
  7. The method according to any one of claims 1 to 6, wherein step S2 comprises:
    filling the semantic information into a reserved field of the data stream of the image based on the preset format.
  8. The method according to any one of claims 1 to 6, wherein different types of semantic information correspond to different preset formats.
  9. The method according to any one of claims 1 to 6, further comprising, before step S2:
    determining identification information corresponding to the semantic information according to a preset correspondence.
  10. An image data processing method, comprising the following steps:
    S10: obtaining semantic information from a data stream of an image based on a preset format;
    S20: performing preset processing according to the semantic information.
  11. The method according to claim 10, wherein the semantic information comprises at least one of the following:
    a depth image, in which a pixel value represents the distance between a pixel point in the image and the camera used to capture the image;
    the maximum of said distances;
    the minimum of said distances;
    a quantization range for distances between the maximum and the minimum;
    an indication of whether the image contains a part at infinity;
    scene classification information;
    object detection information;
    instance segmentation information.
  12. The method according to claim 10 or 11, wherein step S10 comprises:
    reading the semantic information from a reserved field of the data stream of the image based on the preset format.
  13. The method according to claim 10 or 11, wherein step S20 comprises at least one of the following:
    when the semantic information comprises scene classification information, adjusting target parameters of the camera for the corresponding scene according to the scene classification information;
    when the semantic information comprises object detection information, adjusting the target of the camera's autofocus according to the object detection information;
    when the semantic information comprises instance segmentation information, obtaining a target instance in the image according to the instance segmentation information, and performing preset processing on the image according to the target instance.
  14. An intelligent terminal, comprising a memory and a processor, wherein an image data processing program is stored in the memory, and when executed by the processor, the program implements the steps of the image data processing method according to claim 1 or 10.
  15. A computer-readable storage medium, on which a computer program is stored, wherein when the computer program is executed by a processor, the steps of the image data processing method according to claim 1 or 10 are implemented.
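As an illustrative sketch (not part of the claims), the depth information enumerated in claim 3 could be grouped into a single record. All field names, the sample values, and the linear dequantization rule below are assumptions for demonstration only:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class DepthInfo:
    """Depth-related semantic information for one image (field names are illustrative)."""
    depth_map: List[List[int]]   # per-pixel quantized camera-to-point distance
    min_distance_m: float        # smallest distance in the image, in metres
    max_distance_m: float        # largest distance in the image, in metres
    quant_levels: int            # quantization range between min and max (e.g. 256)
    contains_infinity: bool      # whether the image contains a part at infinity

    def dequantize(self, value: int) -> float:
        """Map a quantized pixel value back to an approximate distance in metres."""
        span = self.max_distance_m - self.min_distance_m
        return self.min_distance_m + (value / (self.quant_levels - 1)) * span

info = DepthInfo(
    depth_map=[[0, 128], [255, 64]],
    min_distance_m=0.5,
    max_distance_m=10.0,
    quant_levels=256,
    contains_infinity=False,
)
print(info.dequantize(255))  # the largest quantized value maps back to the maximum distance
```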
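Claims 5 and 6 describe selecting the scene with the highest probability output by a scene classification model. A minimal sketch of that argmax step, with a hypothetical probability dictionary standing in for the model's output:

```python
def classify_scene(probabilities: dict) -> str:
    """Return the scene with the highest model probability (claims 5-6 sketch)."""
    return max(probabilities, key=probabilities.get)

# Hypothetical per-scene probabilities from a scene classification model.
scene_probs = {"portrait": 0.12, "night": 0.71, "landscape": 0.17}
print(classify_scene(scene_probs))  # -> night
```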
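Claims 7 and 12 respectively write the semantic information into, and read it back from, a reserved field of the image data stream. The claims do not fix a concrete container format; the sketch below assumes a simple length-prefixed segment appended to the byte stream, with an illustrative marker value:

```python
import json
import struct

MARKER = b"\xff\xe9"  # illustrative reserved-segment marker (an assumption, not a standard)

def embed_semantics(image_bytes: bytes, semantics: dict) -> bytes:
    """Append semantic information to an image data stream as a
    marker + big-endian length + JSON payload segment (claim 7 sketch)."""
    payload = json.dumps(semantics).encode("utf-8")
    return image_bytes + MARKER + struct.pack(">H", len(payload)) + payload

def extract_semantics(stream: bytes) -> dict:
    """Read the semantic information back from the reserved segment (claim 12 sketch)."""
    pos = stream.rfind(MARKER)
    (length,) = struct.unpack(">H", stream[pos + 2 : pos + 4])
    return json.loads(stream[pos + 4 : pos + 4 + length])

data = embed_semantics(b"<jpeg bytes>", {"scene": "night", "objects": ["face"]})
print(extract_semantics(data)["scene"])  # -> night
```

In a real data stream the reserved field would typically be a standardized metadata segment rather than a trailing blob; this sketch only shows the round trip of writing and reading under a shared preset format.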
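Claim 9 maps each type of semantic information to identification information through a preset correspondence. A minimal sketch, where the identifier values are purely illustrative assumptions:

```python
# Preset correspondence between semantic-information types and identifiers.
# The concrete identifier values are illustrative, not defined by the claims.
TYPE_IDS = {
    "depth": 0x01,
    "scene_classification": 0x02,
    "instance_segmentation": 0x03,
    "object_detection": 0x04,
}

def id_for(semantic_type: str) -> int:
    """Look up the identification information for a semantic-information type."""
    return TYPE_IDS[semantic_type]

print(id_for("depth"))  # -> 1
```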
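Claim 13 branches on which kinds of semantic information are present when performing the preset processing. A hedged sketch of that dispatch, with hypothetical camera parameters and preset values:

```python
def apply_presets(semantics: dict, camera: dict) -> dict:
    """Adjust camera settings from stored semantic information (claim 13 sketch).
    All dictionary keys and preset values below are illustrative assumptions."""
    if "scene" in semantics:
        # Scene classification info: pick a per-scene parameter preset.
        presets = {"night": {"iso": 3200}, "landscape": {"iso": 100}}
        camera.update(presets.get(semantics["scene"], {}))
    if semantics.get("detections"):
        # Object detection info: point autofocus at the first detected target.
        camera["focus_target"] = semantics["detections"][0]
    return camera

cam = apply_presets({"scene": "night", "detections": ["face"]}, {})
print(cam)  # -> {'iso': 3200, 'focus_target': 'face'}
```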
PCT/CN2021/137246 2021-12-10 2021-12-10 Image data processing method, intelligent terminal, and storage medium WO2023102935A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/137246 WO2023102935A1 (en) 2021-12-10 2021-12-10 Image data processing method, intelligent terminal, and storage medium


Publications (1)

Publication Number Publication Date
WO2023102935A1 true WO2023102935A1 (en) 2023-06-15

Family

ID=86729485

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/137246 WO2023102935A1 (en) 2021-12-10 2021-12-10 Image data processing method, intelligent terminal, and storage medium

Country Status (1)

Country Link
WO (1) WO2023102935A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103053166A (en) * 2010-11-08 2013-04-17 索尼公司 Stereoscopic image data transmission device, stereoscopic image data transmission method, and stereoscopic image data reception device
US20170131090A1 (en) * 2015-11-06 2017-05-11 Intel Corporation Systems, methods, and apparatuses for implementing maximum likelihood image binarization in a coded light range camera
CN111866032A (en) * 2019-04-11 2020-10-30 阿里巴巴集团控股有限公司 Data processing method and device and computing equipment
CN113487705A (en) * 2021-07-14 2021-10-08 上海传英信息技术有限公司 Image annotation method, terminal and storage medium


Similar Documents

Publication Publication Date Title
US11941883B2 (en) Video classification method, model training method, device, and storage medium
WO2021036715A1 (en) Image-text fusion method and apparatus, and electronic device
WO2022166765A1 (en) Image processing method, mobile terminal and storage medium
CN113556492B (en) Thumbnail generation method, mobile terminal and readable storage medium
CN111737520B (en) Video classification method, video classification device, electronic equipment and storage medium
CN112181564A (en) Wallpaper generation method, mobile terminal and storage medium
WO2023010705A1 (en) Data processing method, mobile terminal, and storage medium
CN107743198B (en) Photographing method, terminal and storage medium
CN113347372A (en) Shooting light supplement method, mobile terminal and readable storage medium
WO2023108444A1 (en) Image processing method, intelligent terminal, and storage medium
WO2023102935A1 (en) Image data processing method, intelligent terminal, and storage medium
WO2023284218A1 (en) Photographing control method, and mobile terminal and storage medium
CN113286106B (en) Video recording method, mobile terminal and storage medium
WO2022095752A1 (en) Frame demultiplexing method, electronic device and storage medium
CN112532786B (en) Image display method, terminal device, and storage medium
CN114298883A (en) Image processing method, intelligent terminal and storage medium
CN114723645A (en) Image processing method, intelligent terminal and storage medium
CN114092366A (en) Image processing method, mobile terminal and storage medium
CN113901245A (en) Picture searching method, intelligent terminal and storage medium
WO2023108443A1 (en) Image processing method, smart terminal and storage medium
WO2023097446A1 (en) Video processing method, smart terminal, and storage medium
WO2023108442A1 (en) Image processing method, smart terminal, and storage medium
WO2023050413A1 (en) Image processing method, intelligent terminal, and storage medium
CN113840062B (en) Camera control method, mobile terminal and readable storage medium
CN114125151B (en) Image processing method, mobile terminal and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21966853

Country of ref document: EP

Kind code of ref document: A1