WO2023102935A1 - Image data processing method, intelligent terminal, and storage medium

Info

Publication number: WO2023102935A1
Authority: WIPO (PCT)
Application number: PCT/CN2021/137246
Other languages: French (fr), Chinese (zh)
Inventor: 应贲
Original Assignee: 深圳传音控股股份有限公司 (Shenzhen Transsion Holdings Co., Ltd.)
Application filed by 深圳传音控股股份有限公司
Priority to PCT/CN2021/137246; published as WO2023102935A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01: Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/048: Interaction techniques based on graphical user interfaces [GUI]
    • G06F 3/0484: Interaction techniques based on GUIs for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • G06F 3/04845: Interaction techniques based on GUIs for image manipulation, e.g. dragging, rotation, expansion or change of colour
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 11/00: 2D [Two Dimensional] image generation
    • G06T 11/80: Creating or modifying a manually drawn or painted image using a manual input device, e.g. mouse, light pen, direction keys on keyboard

Definitions

  • the present application relates to the technical field of image data processing, and in particular to an image data processing method, an intelligent terminal and a storage medium.
  • Computational photography refers to digital image capture and processing techniques that use digital computation rather than optical processing. Computational photography can increase the capabilities of camera equipment, or introduce more features than film-based photography, or reduce the cost or size of camera elements.
  • the present application provides an image data processing method, a smart terminal and a storage medium, so as to uniformly standardize the semantic information of computational photography in the data flow involved in computational photography.
  • the present application provides a method for processing image data, comprising the following steps: S1, determining or generating semantic information of an image based on image information; and S2, saving the semantic information in the image data stream based on a preset format.
  • the image information includes basic image information and image data.
  • the image data is the image itself.
  • basic image information can also be referred to as basic image description information, which can include image description information identification, basic description information length, image type identification, image length, image width, image color space, bit width, and storage mode.
  • the image description information identifier is used to identify the "basic description information" field of the image.
  • the length of the basic description information indicates the total length of the basic description information field, including the image description information identifier.
  • the image type identifier is used to identify whether the image data type is a single-frame image, multi-frame image or video stream.
  • the image length is the length of the image data itself.
  • the image width, that is, the width of the image data itself.
  • the image color space, that is, a description of the color space of the image data, such as RGGB, RGBW, RYYB, etc.
  • the bit width, that is, the number of bits per component of the image.
  • the storage mode refers to the arrangement mode of each pixel of each component in the image color space in a storage space (such as memory, flash memory, or hard disk, etc.).
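The basic description information above can be pictured as a small record. A minimal sketch in Python, assuming illustrative field names, values, and type codes; the application does not fix a concrete binary layout:

```python
from dataclasses import dataclass

@dataclass
class BasicImageInfo:
    """Hypothetical layout of the image's basic description information.

    Field names, sizes, and code values are illustrative assumptions.
    """
    info_id: int        # identifies the "basic description information" field
    info_length: int    # total length of the field, including info_id
    image_type: int     # 0 = single frame, 1 = multi-frame, 2 = video stream (assumed codes)
    image_length: int   # length of the image data itself
    image_width: int    # width of the image data itself
    color_space: str    # e.g. "RGGB", "RGBW", "RYYB"
    bit_width: int      # number of bits per component of the image
    storage_mode: int   # arrangement of each component's pixels in the storage space

info = BasicImageInfo(
    info_id=0x01, info_length=32, image_type=0,
    image_length=3000, image_width=4000,
    color_space="RGGB", bit_width=10, storage_mode=0,
)
print(info.color_space)  # RGGB
```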
  • semantic information is used to interpret the image.
  • the semantic information includes at least one of the following: depth information, scene classification information, instance segmentation information, and object detection information.
  • the scene classification information is used to characterize the scene represented by the image.
  • the instance segmentation information is used to characterize the segmentation information of the instance in the image.
  • the depth information includes at least one of the following:
  • Depth image: the pixel value in the depth image represents the distance between the pixel point in the image and the camera used when capturing the image;
  • Indication information, contained in the image, of an infinite part, where the infinite part is any distance beyond the farthest distance that the device can detect.
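The two kinds of depth information can be combined in one map: per-pixel distances plus an indication of the infinite part. A hedged sketch, assuming an illustrative maximum detectable range and an infinity sentinel (neither value comes from the application):

```python
# Depth map where each value is the distance (in metres, assumed unit)
# from the camera, with a sentinel marking the "infinite part" beyond
# the device's maximum detectable range.
MAX_RANGE_M = 10.0       # farthest distance the device can detect (assumed)
INFINITY = float("inf")  # indication for pixels in the infinite part

depth_map = [
    [1.2, 1.3, INFINITY],
    [0.8, 2.5, INFINITY],
]

def is_infinite(d: float) -> bool:
    """True when the pixel lies in the infinite part of the scene."""
    return d == INFINITY or d > MAX_RANGE_M

print(sum(is_infinite(d) for row in depth_map for d in row))  # 2 infinite pixels
```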
  • when the semantic information includes depth information, step S1 includes: obtaining the depth information based on the image information through a laser ranging radar and/or a depth information analysis network.
  • the depth information parsing network is used to parse image information to generate depth information.
  • Step S1 includes: extracting image scene features of the image based on the image information; determining or generating the scene classification information of the image according to the image scene features.
  • determining or generating the scene classification information of the image according to the image scene features includes: inputting the image scene features into a scene classification model to obtain the probability that the image corresponds to at least one scene; and, among the probabilities that the image corresponds to at least one scene, determining the scene with the maximum probability as the scene classification information of the image.
  • the scene classification model is used to determine the probability that the image corresponds to at least one scene.
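The scene-selection rule above (take the scene with the maximum probability among the model's outputs) can be sketched as follows; the scene names and probabilities are illustrative stand-ins for the scene classification model's output:

```python
def classify_scene(scene_probs: dict) -> str:
    """Return the scene with the maximum probability.

    `scene_probs` maps scene name -> probability and stands in for
    the output of the scene classification model.
    """
    return max(scene_probs, key=scene_probs.get)

probs = {"night": 0.05, "portrait": 0.15, "landscape": 0.80}
print(classify_scene(probs))  # landscape
```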
  • step S2 includes: based on a preset format, filling semantic information in a reserved field of the image data stream.
  • the preset format may be any combination of at least one type of semantic information.
  • the preset formats corresponding to different types of semantic information are different.
  • the preset format includes at least one of table, single-channel bitmap, matrix and key-value pair.
  • the preset format further includes an information header, which is used to indicate whether the image data stream contains semantic information, and/or, the information header is used to indicate the type of semantic information contained in the image data stream.
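Filling and reading the reserved field can be sketched as an information header followed by a payload. The header layout below (a presence byte, a type bitmask, and a payload length) and the JSON key-value payload are assumptions for illustration; the application leaves the concrete preset format open:

```python
import json
import struct

# Assumed header values: a magic byte saying the stream contains semantic
# information, and one bit per type of semantic information.
HAS_SEMANTIC = 0xA5
TYPE_DEPTH, TYPE_SCENE, TYPE_INSTANCE, TYPE_TARGET = 1, 2, 4, 8

def fill_reserved_field(semantic: dict, type_mask: int) -> bytes:
    """Pack semantic information into the reserved field: header + payload."""
    payload = json.dumps(semantic).encode("utf-8")  # key-value pair format
    header = struct.pack(">BBI", HAS_SEMANTIC, type_mask, len(payload))
    return header + payload

def read_reserved_field(field: bytes) -> dict:
    """Read semantic information back out of the reserved field."""
    magic, type_mask, length = struct.unpack(">BBI", field[:6])
    assert magic == HAS_SEMANTIC  # header says semantic info is present
    return json.loads(field[6:6 + length].decode("utf-8"))

blob = fill_reserved_field({"scene_id": 2}, TYPE_SCENE)
print(read_reserved_field(blob))  # {'scene_id': 2}
```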
  • the method further includes: determining the identification information corresponding to the semantic information according to the preset corresponding relationship.
  • the semantic information includes at least one of scene classification information, instance segmentation information, and target detection information; the identification information includes at least one of a scene ID, an instance ID, and a target ID; and the preset correspondence includes at least one of the correspondence between scene names and scene IDs, the correspondence between instance names and instance IDs, and the correspondence between target names and target IDs.
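The preset correspondences between names and IDs can be pictured as lookup tables. A sketch with hypothetical tables and key names; the actual correspondences would be fixed in advance between the writer and the reader of the data stream:

```python
# Hypothetical preset correspondences (name -> numeric ID).
SCENE_IDS = {"landscape": 1, "portrait": 2, "night": 3}
INSTANCE_IDS = {"person": 1, "cat": 2, "car": 3}
TARGET_IDS = {"face": 1, "text": 2}

def to_identification(semantic: dict) -> dict:
    """Replace names in the semantic information with their IDs,
    according to the preset correspondences above."""
    out = {}
    if "scene" in semantic:
        out["scene_id"] = SCENE_IDS[semantic["scene"]]
    if "instances" in semantic:
        out["instance_ids"] = [INSTANCE_IDS[n] for n in semantic["instances"]]
    if "targets" in semantic:
        out["target_ids"] = [TARGET_IDS[n] for n in semantic["targets"]]
    return out

print(to_identification({"scene": "portrait", "instances": ["person", "cat"]}))
```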
  • the present application also provides an image data processing method, comprising the following steps: S10, obtaining semantic information from an image data stream based on a preset format; and S20, performing preset processing according to the semantic information.
  • semantic information is used to interpret the image.
  • the semantic information includes at least one of the following:
  • Depth image: the pixel value in the depth image represents the distance between the pixel point in the image and the camera used when capturing the image;
  • An indication, contained in the image, of an infinity part, which is any distance beyond the farthest distance that the device can detect;
  • the scene classification information is used to characterize the scene represented by the image.
  • the instance segmentation information is used to characterize the segmentation information of the instances in the image.
  • step S10 includes: reading semantic information in a reserved field of the image data stream based on a preset format.
  • the preset format may be any combination of at least one type of semantic information.
  • the preset formats corresponding to different types of semantic information are different.
  • the preset format includes at least one of table, single-channel bitmap, matrix and key-value pair.
  • the preset format further includes an information header, which is used to indicate whether the image data stream contains semantic information, and/or, the information header is used to indicate the type of semantic information contained in the image data stream.
  • step S20 includes at least one of the following:
  • When the semantic information includes scene classification information, adjusting the target parameters of the camera in the corresponding scene according to the scene classification information;
  • When the semantic information includes target detection information, adjusting the target of the camera's automatic focus according to the target detection information;
  • When the semantic information includes instance segmentation information, obtaining the target instance in the image according to the instance segmentation information, and performing preset processing on the image according to the target instance.
  • the preset processing includes at least one of instance blurring, instance deformation, instance color retention, performing LUT mapping on instances, and performing LUT mapping on backgrounds.
  • the target parameters include at least one of automatic exposure parameters, display lookup tables, automatic focus parameters and white balance parameters.
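The branches of step S20 amount to a dispatch on which kinds of semantic information are present. A sketch with illustrative camera parameters and values; none of the names are from the application:

```python
def preset_process(semantic: dict, camera: dict) -> dict:
    """Dispatch mirroring the branches of step S20.

    Keys of `semantic` and the camera parameter names/values are
    illustrative assumptions.
    """
    if "scene" in semantic:
        # adjust the camera's target parameters for the corresponding scene
        camera["auto_exposure"] = "long" if semantic["scene"] == "night" else "auto"
    if "targets" in semantic:
        # adjust the target of the camera's automatic focus
        camera["af_target"] = semantic["targets"][0]
    if "instances" in semantic:
        # apply preset processing, e.g. blurring, to the target instance
        camera["instance_effect"] = ("blur", semantic["instances"][0])
    return camera

cam = preset_process({"scene": "night", "targets": ["face"]}, {})
print(cam["af_target"])  # face
```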
  • the present application also provides an image data processing device, including:
  • a processing module configured to determine or generate semantic information of the image based on the image information
  • the saving module is used for saving semantic information in the image data stream based on a preset format.
  • the image information includes basic image information and image data.
  • the image data is the image itself.
  • the basic image information can also be called the basic description information of the image, which can include image description information identification, basic description information length, image type identification, image length, image width, image color space, bit width and storage method, etc.
  • Image description information identifier: used to identify the "basic description information" field of the image.
  • Basic description information length: indicates the total length of the basic description information field, including the image description information identifier.
  • Image type identifier: identifies whether the image data type is a single-frame image, a multi-frame image or a video stream.
  • Image length: the length of the image data itself.
  • Image width: the width of the image data itself.
  • Image color space: a description of the color space of the image data, such as RGGB, RGBW, RYYB, etc.
  • Bit width: the number of bits per component of the image.
  • Storage method: the arrangement of each pixel of each component in the image color space in the storage space (such as memory, flash memory, or hard disk).
  • semantic information is used to interpret the image.
  • the semantic information includes at least one of the following: depth information, scene classification information, instance segmentation information, and object detection information.
  • the scene classification information is used to characterize the scene represented by the image.
  • the instance segmentation information is used to characterize the segmentation information of the instances in the image.
  • the depth information includes at least one of the following:
  • Depth image: the pixel value in the depth image represents the distance between the pixel point in the image and the camera used when capturing the image;
  • the semantic information includes depth information
  • the processing module is specifically configured to: obtain the depth information through a laser ranging radar and/or a depth information analysis network based on image information.
  • the depth information parsing network is used to parse image information to generate depth information.
  • the semantic information includes scene classification information
  • the processing module is further configured to: extract the image scene features of the image based on the image information; determine or generate the scene classification information of the image according to the image scene features.
  • the processing module is also used to: input the image scene features into the scene classification model to obtain the probability that the image corresponds to at least one scene; and, among the probabilities that the image corresponds to at least one scene, determine the scene with the maximum probability as the scene classification information of the image.
  • the scene classification model is used to determine the probability that the image corresponds to at least one scene.
  • the saving module is specifically configured to: fill semantic information in a reserved field of the image data stream based on a preset format.
  • the preset format may be any combination of at least one type of semantic information.
  • the preset formats corresponding to different types of semantic information are different.
  • the preset format includes at least one of table, single-channel bitmap, matrix and key-value pair.
  • the preset format further includes an information header, which is used to indicate whether the image data stream contains semantic information, and/or, the information header is used to indicate the type of semantic information contained in the image data stream.
  • the saving module is further configured to: based on a preset format, before saving the semantic information in the image data stream, determine the identification information corresponding to the semantic information according to a preset correspondence.
  • the semantic information includes at least one of scene classification information, instance segmentation information, and target detection information; the identification information includes at least one of a scene ID, an instance ID, and a target ID; and the preset correspondence includes at least one of the correspondence between scene names and scene IDs, the correspondence between instance names and instance IDs, and the correspondence between target names and target IDs.
  • the present application also provides an image data processing device, including:
  • An acquisition module configured to acquire semantic information from image data streams based on a preset format
  • the processing module is used for performing preset processing according to the semantic information.
  • semantic information is used to interpret the image.
  • the semantic information includes at least one of the following:
  • Depth image: the pixel value in the depth image represents the distance between the pixel point in the image and the camera used when capturing the image;
  • An indication, contained in the image, of an infinity part, which is any distance beyond the farthest distance that the device can detect;
  • the scene classification information is used to characterize the scene represented by the image.
  • the instance segmentation information is used to characterize the segmentation information of the instances in the image.
  • the obtaining module is specifically configured to: read semantic information in a reserved field of the image data stream based on a preset format.
  • the preset format may be any combination of at least one type of semantic information.
  • the preset formats corresponding to different types of semantic information are different.
  • the preset format includes at least one of table, single-channel bitmap, matrix and key-value pair.
  • the preset format further includes an information header, which is used to indicate whether the image data stream contains semantic information, and/or, the information header is used to indicate the type of semantic information contained in the image data stream.
  • the processing module is specifically used for at least one of the following:
  • When the semantic information includes scene classification information, adjusting the target parameters of the camera in the corresponding scene according to the scene classification information;
  • When the semantic information includes target detection information, adjusting the target of the camera's automatic focus according to the target detection information;
  • When the semantic information includes instance segmentation information, obtaining the target instance in the image according to the instance segmentation information, and performing preset processing on the image according to the target instance.
  • the preset processing includes at least one of instance blurring, instance deformation, instance color retention, performing LUT mapping on instances, and performing LUT mapping on backgrounds.
  • the target parameters include at least one of automatic exposure parameters, display lookup tables, automatic focus parameters and white balance parameters.
  • the present application also provides an intelligent terminal, including: a memory and a processor, wherein an image data processing program is stored in the memory, and when the image data processing program is executed by the processor, the steps of any one of the above image data processing methods are implemented.
  • the present application also provides a computer-readable storage medium, where a computer program is stored in the storage medium, and when the computer program is executed by a processor, the steps of any one of the above-mentioned image data processing methods are realized.
  • the present application also provides a computer program product, the computer program product includes a computer program; when the computer program is executed, the steps of any one of the image data processing methods above are realized.
  • the image data processing method of the present application determines or generates the semantic information of the image based on the image information, and the semantic information is used to interpret the image; based on the preset format, the semantic information is stored in the data stream of the image.
  • the semantic information for interpreting the image is stored in the data stream of the image, so as to uniformly standardize the semantic information of computational photography in the data stream involved in computational photography.
  • FIG. 1 is a schematic diagram of a hardware structure of an intelligent terminal implementing various embodiments of the present application
  • FIG. 2 is a system architecture diagram of a communication network provided by an embodiment of the present application.
  • Fig. 3 is a schematic flowchart of an image data processing method according to a first embodiment
  • FIG. 4 is an example diagram of a depth image shown in an embodiment of the present application.
  • FIG. 5 is an example diagram of instance segmentation information shown in an embodiment of the present application.
  • Fig. 6 is a schematic flowchart of an image data processing method according to a second embodiment
  • Fig. 7 is a schematic structural diagram of an image data processing device according to a third embodiment
  • Fig. 8 is a schematic structural diagram of an image data processing device according to a fourth embodiment
  • Fig. 9 is a schematic structural diagram of a smart terminal according to a fifth embodiment.
  • first, second, third, etc. may be used herein to describe various information, the information should not be limited to these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of this document, first information may also be called second information, and similarly, second information may also be called first information.
  • the word “if” as used herein may be interpreted as “at”, “when”, “in response to determining”, or “in response to detecting”.
  • the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context indicates otherwise.
  • “A, B, C”, “A, B or C”, or “A, B and/or C” means “any of the following: A; B; C; A and B; A and C; B and C; A and B and C”. Exceptions to this definition arise only when combinations of elements, functions, steps or operations are inherently mutually exclusive in some way.
  • the phrases “if determined” or “if detected (the stated condition or event)” could be interpreted as “when determined”, “in response to the determination”, “when detected (the stated condition or event)”, or “in response to detection of (the stated condition or event)”.
  • step codes such as S1, S2, S10, and S20 are used for the purpose of expressing the corresponding content more clearly and concisely, and do not constitute a substantive limitation on the order.
  • S2 may be executed first and then S1, or S20 may be executed first and then S10, etc., but these should be within the protection scope of the present application.
  • Smart terminals can be implemented in various forms.
  • the smart terminals described in this application may include mobile terminals such as mobile phones, tablet computers, notebook computers, palmtop computers, personal digital assistants (PDA), portable media players (PMP), navigation devices, wearable devices, smart bracelets, and pedometers, as well as fixed terminals such as digital TVs and desktop computers.
  • a smart terminal will be taken as an example, and those skilled in the art will understand that, in addition to elements specially used for mobile purposes, the configurations according to the embodiments of the present application can also be applied to fixed-type terminals.
  • FIG. 1 is a schematic diagram of the hardware structure of a smart terminal implementing various embodiments of the present application.
  • the smart terminal 100 may include: an RF (radio frequency) unit 101, a WiFi module 102, an audio output unit 103, an A/V (audio/video) input unit 104, a sensor 105, a display unit 106, a user input unit 107, an interface unit 108, a memory 109, a processor 110, a power supply 111, and other components.
  • the radio frequency unit 101 can be used to send and receive information, or to receive and send signals during a call; specifically, downlink information received from the base station is delivered to the processor 110 for processing, and uplink data is sent to the base station.
  • the radio frequency unit 101 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier, a duplexer, and the like.
  • the radio frequency unit 101 can also communicate with the network and other devices through wireless communication.
  • the above wireless communication can use any communication standard or protocol, including but not limited to GSM (Global System for Mobile Communications), GPRS (General Packet Radio Service), CDMA2000 (Code Division Multiple Access 2000), WCDMA (Wideband Code Division Multiple Access), TD-SCDMA (Time Division-Synchronous Code Division Multiple Access), FDD-LTE (Frequency Division Duplexing-Long Term Evolution), TDD-LTE (Time Division Duplexing-Long Term Evolution), 5G, etc.
  • WiFi is a short-distance wireless transmission technology.
  • the smart terminal can help users send and receive emails, browse web pages, and access streaming media, etc., and it provides users with wireless broadband Internet access.
  • although Fig. 1 shows the WiFi module 102, it can be understood that it is not an essential component of the smart terminal and can be omitted as required without changing the essence of the invention.
  • the audio output unit 103 can convert audio data received by the radio frequency unit 101 or the WiFi module 102, or stored in the memory 109, into an audio signal and output it as sound when the smart terminal 100 is in a call signal receiving mode, a call mode, a recording mode, a voice recognition mode, a broadcast receiving mode, or the like.
  • the audio output unit 103 can also provide audio output related to specific functions performed by the smart terminal 100 (optionally, call signal receiving sound, message receiving sound, etc.).
  • the audio output unit 103 may include a speaker, a buzzer, and the like.
  • the A/V input unit 104 is used to receive audio or video signals.
  • the A/V input unit 104 may include a graphics processor (Graphics Processing Unit, GPU) 1041 and a microphone 1042; the graphics processor 1041 is used to process image data of still images or video obtained by an image capture device (such as a camera).
  • the processed image can be displayed on the display unit 106 .
  • the image processed by the graphics processor 1041 may be stored in the memory 109 (or other storage medium) or sent via the radio frequency unit 101 or the WiFi module 102 .
  • the microphone 1042 can receive sound (audio data) in a phone call mode, a recording mode, a voice recognition mode, and similar operating modes, and can process such sound into audio data.
  • the processed audio (voice) data can be converted into a format transmittable to a mobile communication base station via the radio frequency unit 101 for output in case of a phone call mode.
  • the microphone 1042 may implement various types of noise cancellation (or suppression) algorithms to cancel (or suppress) noise or interference generated in the process of receiving and transmitting audio signals.
  • the smart terminal 100 also includes at least one sensor 105, such as a light sensor, a motion sensor, and other sensors.
  • the light sensor includes an ambient light sensor and a proximity sensor.
  • the ambient light sensor can adjust the brightness of the display panel 1061 according to the brightness of the ambient light, and the proximity sensor can turn off the display panel 1061 and/or the backlight when the smart terminal 100 moves to the ear.
  • as a kind of motion sensor, the accelerometer sensor can detect the magnitude of acceleration in various directions (generally three axes), can detect the magnitude and direction of gravity when stationary, and can be used for applications that recognize the posture of the mobile phone (such as horizontal/vertical screen switching, related games, magnetometer attitude calibration) and for vibration-recognition related functions (such as pedometer, tap); other sensors that can also be configured on the mobile phone, such as fingerprint sensors, pressure sensors, iris sensors, molecular sensors, gyroscopes, barometers, hygrometers, thermometers, and infrared sensors, will not be described in detail here.
  • the display unit 106 is used to display information input by the user or information provided to the user.
  • the display unit 106 may include a display panel 1061, and the display panel 1061 may be configured in the form of a liquid crystal display (Liquid Crystal Display, LCD), an organic light-emitting diode (Organic Light-Emitting Diode, OLED), or the like.
  • the user input unit 107 can be used to receive input numbers or character information, and generate key signal input related to user settings and function control of the smart terminal.
  • the user input unit 107 may include a touch panel 1071 and other input devices 1072 .
  • the touch panel 1071, also referred to as a touch screen, can collect touch operations of the user on or near it (for example, operations performed by the user on or near the touch panel 1071 using a finger, a stylus, or any other suitable object or accessory), and drive the corresponding connection device according to a preset program.
  • the touch panel 1071 may include two parts, a touch detection device and a touch controller.
  • the touch detection device detects the user's touch orientation, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, and sends them to the processor 110, and can also receive and execute commands sent by the processor 110.
  • the touch panel 1071 can be implemented in various types such as resistive, capacitive, infrared, and surface acoustic wave.
  • the user input unit 107 may also include other input devices 1072 .
  • other input devices 1072 may include, but are not limited to, one or more of physical keyboards, function keys (such as volume control buttons, switch buttons, etc.), trackballs, mice, joysticks, etc., which are not specifically limited here.
  • the touch panel 1071 may cover the display panel 1061.
  • when the touch panel 1071 detects a touch operation on or near it, it transmits the operation to the processor 110 to determine the type of the touch event, and then the processor 110 provides a corresponding visual output on the display panel 1061 according to the type of the touch event.
  • although the touch panel 1071 and the display panel 1061 are used as two independent components to realize the input and output functions of the smart terminal, in some embodiments, the touch panel 1071 and the display panel 1061 may be integrated to realize the input and output functions of the smart terminal.
  • the implementation of the input and output functions of the smart terminal is not specifically limited here.
  • the interface unit 108 is used as an interface through which at least one external device can be connected with the smart terminal 100 .
  • the external device may include a wired or wireless headset port, an external power (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device with an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and the like.
  • the interface unit 108 may be used to receive input (optionally, data information, power, etc.) from an external device and transmit the received input to one or more components within the smart terminal 100, or may be used to transfer data between the smart terminal 100 and external devices.
  • the memory 109 can be used to store software programs as well as various data.
  • the memory 109 can mainly include a storage program area and a storage data area.
  • the storage program area can store an operating system, an application program required by at least one function (such as a sound playback function, an image playback function, etc.), and the like.
  • the storage data area can store data (such as audio data, a phone book, etc.) created according to the use of the mobile phone.
  • the memory 109 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device.
  • the processor 110 is the control center of the smart terminal; it uses various interfaces and lines to connect the various parts of the whole smart terminal, and executes various functions of the smart terminal and processes data by running or executing software programs and/or modules stored in the memory 109 and calling data stored in the memory 109, so as to monitor the smart terminal as a whole.
  • the processor 110 may include one or more processing units; preferably, the processor 110 may integrate an application processor and a modem processor.
  • the application processor mainly processes operating systems, user interfaces, and application programs, etc.
  • the modem processor mainly handles wireless communication. It can be understood that the foregoing modem processor may not be integrated into the processor 110.
  • the smart terminal 100 can also include a power supply 111 (such as a battery) for supplying power to various components.
  • the power supply 111 can be logically connected to the processor 110 through a power management system, so as to manage functions such as charging, discharging, and power consumption through the power management system.
  • the smart terminal 100 may also include a Bluetooth module, etc., which will not be repeated here.
  • the following describes the communication network system on which the smart terminal of the present application is based.
  • FIG. 2 is a structure diagram of a communication network system provided by an embodiment of the present application.
  • the communication network system is an LTE system of general mobile communication technology.
  • the LTE system includes a UE (User Equipment) 201, an E-UTRAN (Evolved UMTS Terrestrial Radio Access Network) 202, an EPC (Evolved Packet Core) 203, and the operator's IP service 204.
  • the UE 201 may be the above-mentioned terminal 100, which will not be repeated here.
  • E-UTRAN 202 includes eNodeB 2021 and other eNodeB 2022 and so on.
  • the eNodeB 2021 can be connected to other eNodeB 2022 through a backhaul (for example, X2 interface), the eNodeB 2021 is connected to the EPC 203 , and the eNodeB 2021 can provide access from the UE 201 to the EPC 203 .
  • EPC 203 may include an MME (Mobility Management Entity) 2031, an HSS (Home Subscriber Server) 2032, other MMEs 2033, an SGW (Serving Gateway) 2034, a PGW (PDN Gateway, Packet Data Network Gateway) 2035, a PCRF (Policy and Charging Rules Function) 2036, and the like.
  • MME2031 is a control node that handles signaling between UE201 and EPC203, and provides bearer and connection management.
  • HSS 2032 is used to provide registers to manage functions such as the home location register (not shown in the figure), and to save user-specific information about service features and data rates.
  • PCRF 2036 is the policy and charging control policy decision point for service data flows and IP bearer resources; it selects and provides available policy and charging control decisions for the policy and charging enforcement function unit (not shown in the figure).
  • the IP service 204 may include Internet, Intranet, IMS (IP Multimedia Subsystem, IP Multimedia Subsystem) or other IP services.
  • although the LTE system is described above as an example, those skilled in the art should know that the present application is not only applicable to the LTE system, but also applicable to other wireless communication systems, such as GSM, CDMA2000, WCDMA, TD-SCDMA, and future new network systems (such as 5G), which are not limited here.
  • the present application provides an image data processing method, an intelligent terminal and a storage medium.
  • the semantic information for interpreting images is stored based on a preset format, so that the semantic information of computational photography is uniformly specified in the data stream involved in computational photography.
  • Fig. 3 is a schematic flowchart of an image data processing method according to the first embodiment.
  • An embodiment of the present application provides an image data processing method, which is optionally applied to a smart terminal such as the aforementioned smart terminal. As shown in Figure 3, the image data processing method includes the following steps:
  • S1: Based on the image information, determine or generate the semantic information of the image.
  • the image information includes basic image information and image data.
  • the image data is the image itself.
  • the basic image information can also be called the basic description information of the image, and can include an image description information identifier, a basic description information length, an image type identifier, an image length, an image width, an image color space, a bit width, a storage method, and the like.
  • Image description information identifier: used to identify the "basic description information" field of the image.
  • Basic description information length: the total length of the basic description information field, including the image description information identifier.
  • Image type identifier: used to identify whether the image data type is a single-frame image, a multi-frame image, or a video stream.
  • Image length: the length of the image data itself.
  • Image width: the width of the image data itself.
  • Image color space: a description of the image data color space, such as RGGB (Bayer filter, also known as RGBG or GRGB), RGBW (adding a white sub-pixel (W) to the original RGB three primary colors), RYYB (replacing the two green sub-pixels (G) with two yellow sub-pixels (Y)), etc.
  • Bit width: the number of bits for each component of the image.
  • Storage method: the arrangement of the pixels of each component of the image color space in the storage space (such as memory, flash memory, or hard disk).
  • semantic information is used to interpret the image.
  • common semantic information includes image depth information, scene classification information, instance segmentation information, and object detection information.
  • semantic information in this embodiment of the present application may include at least one of the following: depth information, scene classification information, instance segmentation information, and object detection information, but not limited thereto.
  • the depth information may include at least one of the following:
  • Depth image: the pixel values in the depth image represent the distance between each pixel in the image and the camera used to capture the image;
  • Indication information of whether the image contains an infinity part, where the infinity part is at a distance beyond the farthest distance that the device can detect. It can be understood that the distance range the device can detect has a shortest distance and a farthest distance; anything beyond the farthest distance can be represented by the maximum value, which corresponds to infinity, while the next largest value is used to indicate the farthest distance that can currently be detected.
  • the range between the minimum and maximum distances is equally divided into 256 parts, and the depth values of all pixels are quantized into these 256 parts.
  • in this way, a depth image with the same resolution as the original image can be generated, as shown in Figure 4, which is attached to the computational photography data stream as another channel of the image. It should be noted that, with the development of device performance, the 256 levels (2 to the 8th power) can also be expanded to 512 levels or more; in that case, the distance accuracy that can be provided is greatly improved.
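The quantization scheme above can be sketched in Python. This is a minimal illustration only: the function name is hypothetical, and the convention of reserving the top value for infinity and the next largest for the farthest detectable distance follows the description above.

```python
def quantize_depth(depth_m, max_detect_m, levels=256):
    """Quantize metric depths (a 2-D list, in meters) into `levels` integer values.

    Per the scheme described above: the maximum value (levels-1) is reserved
    for "infinity" (beyond the farthest detectable distance), and the next
    largest value (levels-2) marks the farthest detectable distance itself.
    Assumes at least two distinct finite depths so the span is nonzero.
    """
    finite = [d for row in depth_m for d in row if d <= max_detect_m]
    d_min = min(finite)
    span = max_detect_m - d_min
    out = []
    for row in depth_m:
        q_row = []
        for d in row:
            if d > max_detect_m:
                q_row.append(levels - 1)  # beyond detection range: infinity (e.g. sky)
            else:
                # map [d_min, max_detect_m] onto the 0 .. levels-2 range
                q = int((d - d_min) / span * (levels - 2))
                q_row.append(min(q, levels - 2))
        out.append(q_row)
    return out
```

With `levels=512` the same function models the higher-precision variant mentioned above.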
  • Table 1 shows a method of expressing the depth information of an image containing sky:
  • the scene classification information is used to characterize the scene represented by the image. It can be understood that an image can be classified into multiple scenes in most cases; for example, an image of a cake at a birthday party can be expressed as a party scene or as a food scene. Therefore, when expressing the scene, the probabilities of the five scenes to which the image most likely belongs are listed, for example:
  • {1:0.5} means that the probability that the image belongs to the scene whose scene ID is "1" is 0.5; {22:0.2} means that the probability that the image belongs to the scene whose scene ID is "22" is 0.2; {25:0.15} means that the probability that the image belongs to the scene whose scene ID is "25" is 0.15; {45:0.1} means that the probability that the image belongs to the scene whose scene ID is "45" is 0.1; {55:0.05} means that the probability that the image belongs to the scene whose scene ID is "55" is 0.05.
  • the scene classification information requires a dictionary, which is used to resolve which scene a number expressed in digital form belongs to.
  • examples of different scenes are shown in Table 2.
  • the first and third columns in Table 2 are scene IDs, which express in digital form the scene to which an image belongs; the second and fourth columns are instances belonging to the corresponding scene.
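The {scene ID: probability} representation and its dictionary lookup can be sketched as follows. The scene names are illustrative placeholders, not the actual entries of Table 2; the probabilities reuse the example above.

```python
# Hypothetical scene dictionary; the real ID-to-name mapping comes from Table 2.
SCENE_DICT = {1: "party", 22: "food", 25: "indoor", 45: "portrait", 55: "night"}

def resolve_scenes(scene_probs):
    """Resolve {scene ID: probability} pairs into named scenes, most likely first."""
    ranked = sorted(scene_probs.items(), key=lambda kv: kv[1], reverse=True)
    return [(SCENE_DICT.get(sid, "unknown"), p) for sid, p in ranked]

# The five most likely scenes for the example image above:
top = resolve_scenes({1: 0.5, 22: 0.2, 25: 0.15, 45: 0.1, 55: 0.05})
# The scene with the maximum probability is taken as the classification result.
best_name, best_prob = top[0]
```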
  • the instance segmentation information is used to characterize the segmentation information of the instance in the image.
  • optionally, a matrix whose resolution is consistent with that of the image (such as 800*600), which can be expressed as a channel of the image, can be used to express the instance segmentation information in the image.
  • an example is shown in Figure 5, where 0 is the background and 2/15/35 are instance IDs, that is, the IDs of the name information corresponding to the instances.
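Working with such a segmentation matrix can be sketched as below. The matrix is a toy stand-in for Figure 5 (0 for background, 2/15/35 as instance IDs); the helper names are illustrative.

```python
# Toy instance-segmentation matrix in the format of Figure 5:
# 0 is the background, nonzero values are instance IDs.
seg = [
    [0,  0,  2, 2],
    [0, 15, 15, 0],
    [35, 35, 0, 0],
]

def instance_ids(seg):
    """List the distinct instance IDs present (background 0 excluded)."""
    return sorted({v for row in seg for v in row if v != 0})

def instance_mask(seg, instance_id):
    """Return a binary mask selecting one instance from the segmentation matrix."""
    return [[1 if v == instance_id else 0 for v in row] for row in seg]
```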
  • As for the target detection information of the image, it can be understood that, for the targets that can be detected in the image, the form {target ID: target coordinates} can be used to save the target list.
  • Similarly, a dictionary is required for resolution.
  • optionally, this dictionary may be the same dictionary used in the above instance segmentation.
  • Combination plan        Depth information   Scene classification information   Instance segmentation information   Target detection information
  • Combination example 1   yes                 yes                                yes                                 no
  • Combination example 2   yes                 yes                                no                                  no
  • Combination example 3   yes                 no                                 yes                                 yes
  • Combination example 4   no                  yes                                yes                                 yes
  • the semantic information includes depth information, scene classification information and instance segmentation information.
  • the semantic information includes depth information and scene classification information.
  • the semantic information includes depth information, instance segmentation information and object detection information.
  • the semantic information includes scene classification information, instance segmentation information and object detection information.
  • the semantic information includes depth information, scene classification information, instance segmentation information and object detection information.
  • the semantic information in the data stream may include instance segmentation information, so that the instance segmentation information can be used to obtain a specific instance in the image, and a single camera can achieve an instance blurring effect; or, the semantic information in the data stream may include target detection information, so that the camera can adjust the auto-focus object in a targeted manner, focus on the target to be photographed, and improve the imaging quality; or, the semantic information in the data stream may include scene classification information, so as to adjust the 3A parameters for the scene.
  • the 3A parameters are auto focus (AF), auto exposure (AE) and auto white balance (AWB).
  • 3A digital imaging technology uses an auto-focus algorithm, an auto-exposure algorithm, and an auto-white-balance algorithm to maximize image contrast, correct over-exposure or under-exposure of the main subject, and compensate for the chromatic aberration of the picture under different lighting, so as to present high-quality image information.
  • the camera adopting 3A digital imaging technology can well guarantee the accurate color reproduction of the image, presenting a perfect day and night monitoring effect.
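Choosing 3A settings from the scene classification information can be sketched as below. The scene IDs follow the earlier example, but the preset names and values are purely hypothetical, not a real camera API.

```python
# Hypothetical mapping from scene ID to 3A-related presets (illustrative only).
SCENE_3A_PRESETS = {
    1:  {"ae_ev_bias": 0.3, "awb_mode": "warm",     "af_mode": "face"},    # e.g. party
    55: {"ae_ev_bias": 1.0, "awb_mode": "tungsten", "af_mode": "center"},  # e.g. night
}
DEFAULT_3A = {"ae_ev_bias": 0.0, "awb_mode": "auto", "af_mode": "auto"}

def pick_3a_params(scene_probs):
    """Choose 3A presets based on the most probable scene in the classification info."""
    best = max(scene_probs, key=scene_probs.get)
    return SCENE_3A_PRESETS.get(best, DEFAULT_3A)
```

In this sketch, an unrecognized scene simply falls back to the default 3A behavior.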
  • the semantic information is not limited to the above four items; other semantic information can also be included, so the information header needs to reserve a sufficient length.
  • step S1 may include: based on the image information, obtaining the depth information through a laser ranging radar and/or a depth information analysis network.
  • the depth information parsing network is used to parse image information to generate depth information.
  • step S1 may include: extracting image scene features of the image based on the image information; determining or generating scene classification information of the image according to the image scene features.
  • determining or generating the scene classification information of the image according to the image scene features includes: inputting the image scene features into a scene classification model to obtain the probability, output by the scene classification model, that the image corresponds to at least one scene; and, among the probabilities that the image corresponds to at least one scene, determining the scene corresponding to the maximum probability as the scene classification information of the image.
  • the scene classification model is used to determine the probability that the image corresponds to at least one scene.
  • S2: Save the semantic information in the image data stream based on a preset format.
  • this step includes: based on a preset format, filling semantic information in a reserved field of the image data stream.
  • the preset format is any combination of at least one semantic information.
  • the preset formats corresponding to different types of semantic information are different.
  • the preset format includes at least one of table, single-channel bitmap, matrix and key-value pair.
  • the table is shown in Table 1, and the matrix is shown in FIG. 5 .
  • the preset format further includes an information header, which is used to indicate whether the image data stream contains semantic information, and/or, the information header is used to indicate the type of semantic information contained in the image data stream.
  • the information header can be expressed as: Frame semantic info include:0 0 0 1, that is, only the fourth item is included; the information header can also be expressed as: Frame semantic info include:0 0 0 0, that is, no semantic information is included.
  • the information header specifically corresponds to a lookup table containing semantic information, such as:
  • the field of the information header is a variable-length field.
  • when the information header indicates that certain semantic information exists, that is, when it is expressed in a form such as 0 0 0 1, this field reserves a sufficient length for the corresponding semantic information, so as to express that semantic information.
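Parsing the "Frame semantic info include:" header can be sketched as below. The flag order is assumed to follow the order in which the four semantic-information types are introduced above (depth, scene classification, instance segmentation, target detection), consistent with "0 0 0 1" denoting the fourth item; the variable-length field allows extra positions for future types.

```python
# Assumed flag order, matching the four types named above; trailing flag
# positions are reserved for future semantic-information types.
SEMANTIC_TYPES = ["depth", "scene_classification",
                  "instance_segmentation", "target_detection"]

def parse_semantic_header(header_line):
    """Return the set of semantic-information types flagged as present."""
    flags = header_line.split(":", 1)[1].split()  # e.g. ["0", "0", "0", "1"]
    present = set()
    for i, bit in enumerate(flags):
        if bit == "1" and i < len(SEMANTIC_TYPES):
            present.add(SEMANTIC_TYPES[i])
    return present
```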
  • the image data processing method may further include: determining identification information corresponding to the semantic information according to a preset correspondence relationship.
  • the semantic information includes at least one of scene classification information, instance segmentation information, and target detection information;
  • the identification information includes at least one of scene ID, instance ID, and target ID, and the preset correspondence includes scene name and At least one of the corresponding relationship between scene IDs, the corresponding relationship between instance names and instance IDs, and the corresponding relationship between target names and target IDs.
  • the preset correspondence relationship may be specifically presented in the form of a dictionary, but this embodiment of the present application is not limited thereto, and may be set accordingly according to actual needs.
  • the image data processing method of the embodiment of the present application determines or generates semantic information of the image based on image information, and the semantic information is used to interpret the image; based on a preset format, the semantic information is stored in the data stream of the image.
  • the semantic information for interpreting the image is stored in the data stream of the image, so as to uniformly standardize the semantic information of computational photography in the data stream involved in computational photography.
  • the image data processing method may further include: acquiring semantic information from the image data stream based on a preset format; and performing preset processing according to the semantic information.
  • Fig. 6 is a schematic flowchart of an image data processing method according to a second embodiment.
  • An embodiment of the present application provides an image data processing method, which is applied to computational photography of a smart terminal such as the aforementioned smart terminal. As shown in Figure 6, the image data processing method includes the following steps:
  • S10: Obtain semantic information from the image data stream based on a preset format.
  • semantic information is used to interpret the image.
  • common semantic information includes image depth information, scene classification information, instance segmentation information, and object detection information.
  • semantic information in this embodiment of the present application may include at least one of the following: depth information, scene classification information, instance segmentation information, and object detection information, but not limited thereto.
  • the depth information may include at least one of the following:
  • Depth image: the pixel values in the depth image represent the distance between each pixel in the image and the camera used to capture the image;
  • Indication information of whether the image contains an infinity part, where the infinity part is at a distance beyond the farthest distance that the device can detect. It can be understood that the distance range the device can detect has a shortest distance and a farthest distance; anything beyond the farthest distance can be represented by the maximum value, which corresponds to infinity, while the next largest value is used to indicate the farthest distance that can currently be detected.
  • the range between the minimum and maximum distances is equally divided into 256 parts, and the depth values of all pixels are quantized into these 256 parts.
  • in this way, a depth image with the same resolution as the original image can be generated, as shown in Figure 4, which is attached to the computational photography data stream as another channel of the image. It should be noted that, with the development of device performance, the 256 levels (2 to the 8th power) can also be expanded to 512 levels or more; in that case, the distance accuracy that can be provided is greatly improved.
  • Table 1 shows a method for expressing depth information of an image with sky.
  • the scene classification information is used to characterize the scene represented by the image. It can be understood that an image can be classified into multiple scenes in most cases; for example, an image of a cake at a birthday party can be expressed as a party scene or as a food scene. Therefore, when expressing the scene, the probabilities of the five scenes to which the image most likely belongs are listed, for example:
  • {1:0.5} means that the probability that the image belongs to the scene whose scene ID is "1" is 0.5; {22:0.2} means that the probability that the image belongs to the scene whose scene ID is "22" is 0.2; {25:0.15} means that the probability that the image belongs to the scene whose scene ID is "25" is 0.15; {45:0.1} means that the probability that the image belongs to the scene whose scene ID is "45" is 0.1; {55:0.05} means that the probability that the image belongs to the scene whose scene ID is "55" is 0.05.
  • the scene classification information requires a dictionary, which is used to resolve which scene a number expressed in digital form belongs to, as shown in Table 2 above.
  • the first and third columns in Table 2 are scene IDs, which express in digital form the scene to which an image belongs; the second and fourth columns are instances belonging to the corresponding scene.
  • the instance segmentation information is used to characterize the segmentation information of the instance in the image.
  • optionally, a matrix whose resolution is consistent with that of the image (such as 800*600), which can be expressed as a channel of the image, can be used to express the instance segmentation information in the image.
  • an example is shown in Figure 5, where 0 is the background and 2/15/35 are instance IDs, that is, the IDs of the name information corresponding to the instances.
  • As for the target detection information of the image, it can be understood that, for the targets that can be detected in the image, the form {target ID: target coordinates} can be used to save the target list.
  • Similarly, a dictionary is required for resolution.
  • optionally, this dictionary may be the same dictionary used in the above instance segmentation.
  • the semantic information includes depth information, scene classification information and instance segmentation information.
  • the semantic information includes depth information and scene classification information.
  • the semantic information includes depth information, instance segmentation information and object detection information.
  • the semantic information includes scene classification information, instance segmentation information and object detection information.
  • the semantic information includes depth information, scene classification information, instance segmentation information and object detection information.
  • the semantic information is not limited to the above four items; other semantic information can also be included, so the information header needs to reserve a sufficient length.
  • the preset format is any combination of at least one semantic information.
  • the preset formats corresponding to different types of semantic information are different.
  • the preset format includes at least one of table, single-channel bitmap, matrix and key-value pair.
  • the table is shown in Table 1, and the matrix is shown in FIG. 5 .
  • the preset format further includes an information header, which is used to indicate whether the image data stream contains semantic information, and/or, the information header is used to indicate the type of semantic information contained in the image data stream.
  • the information header can be expressed as: Frame semantic info include:0 0 0 1, that is, only the fourth item is included; the information header can also be expressed as: Frame semantic info include:0 0 0 0, that is, no semantic information is included.
  • the information header specifically corresponds to a lookup table containing semantic information, such as:
  • the field of the information header is a variable-length field.
  • when the information header indicates that certain semantic information exists, that is, when it is expressed in a form such as 0 0 0 1, this field reserves a sufficient length for the corresponding semantic information, so as to express that semantic information.
  • the step S10 includes: reading semantic information in a reserved field of the image data stream based on a preset format.
  • S20: Perform preset processing according to the semantic information.
  • step S20 may include: adjusting the target parameters of the camera in the corresponding scene according to the scene classification information.
  • the target parameters include at least one of 3A parameters, a display lookup table, or other parameters related to imaging quality.
  • the 3A parameters are auto focus (AF) parameters, auto exposure (AE) parameters and auto white balance (AWB) parameters.
  • 3A digital imaging technology uses an auto-focus algorithm, an auto-exposure algorithm, and an auto-white-balance algorithm to maximize image contrast, correct over-exposure or under-exposure of the main subject, and compensate for the chromatic aberration of the picture under different lighting, so as to present high-quality image information.
  • the camera adopting 3A digital imaging technology can well guarantee the accurate color reproduction of the image, presenting a perfect day and night monitoring effect.
  • the semantic information includes instance segmentation information.
  • step S20 may include: obtaining the target instance in the image according to the instance segmentation information; and performing preset processing on the image according to the target instance.
  • the preset processing may include at least one of processing such as instance blurring, instance deformation, instance color retention, mapping processing for the instance, and mapping processing for the background.
  • the semantic information includes instance segmentation information
  • the instance segmentation information can be used to obtain a specific instance in the image, so as to achieve effects such as instance blurring, instance deformation, instance color retention, mapping processing for the instance, and mapping processing for the background.
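One of the listed effects, instance color retention, can be sketched with a segmentation mask as below. The function name and the toy RGB/mask inputs are illustrative; pixels belonging to the target instance keep their color while the background is desaturated.

```python
def retain_instance_color(rgb, mask, instance_id):
    """Keep the color of one instance and turn the rest of the image gray.

    `rgb` is a 2-D list of (r, g, b) tuples; `mask` is an instance-segmentation
    matrix of the same resolution (Figure 5 format: 0 = background).
    """
    out = []
    for row_px, row_m in zip(rgb, mask):
        new_row = []
        for (r, g, b), m in zip(row_px, row_m):
            if m == instance_id:
                new_row.append((r, g, b))        # keep instance pixels as-is
            else:
                gray = (r + g + b) // 3          # simple average desaturation
                new_row.append((gray, gray, gray))
        out.append(new_row)
    return out
```

The same mask-based split also supports the other listed effects (e.g. blurring only the background pixels instead of graying them).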
  • step S20 may include: adjusting the camera's auto-focus object according to the target detection information, so that the camera can adjust the auto-focus object in a targeted manner, focus on the target to be photographed, and improve the imaging quality.
  • semantic information is obtained from an image data stream based on a preset format, and the semantic information is used to interpret the image; preset processing is performed according to the semantic information.
  • the semantic information used to interpret the image is stored in the image data stream based on the preset format.
  • in this way, the semantic information of computational photography can be uniformly standardized in the data stream involved in computational photography; when computational photography performs related processing, for different applications of an image, only the corresponding semantic information needs to be obtained from the data stream, without repeating the same processing on the image, thereby avoiding a waste of computing resources.
  • the image data processing method may further include: determining or generating the semantic information of the image based on the image information; storing the semantic information in the data stream of the image based on a preset format .
  • Fig. 7 is a schematic structural diagram of an image data processing device according to a third embodiment.
  • An embodiment of the present application provides an image data processing device.
  • the image data processing device 70 includes:
  • a processing module 71 configured to determine or generate semantic information of the image based on the image information
  • the saving module 72 is configured to save semantic information in the image data stream based on a preset format.
  • the image information includes basic image information and image data.
  • the image data is the image itself.
  • the basic image information can also be called the basic description information of the image, and can include an image description information identifier, a basic description information length, an image type identifier, an image length, an image width, an image color space, a bit width, a storage method, and the like.
  • Image description information identifier: used to identify the "basic description information" field of the image.
  • Basic description information length: the total length of the basic description information field, including the image description information identifier.
  • Image type identifier: used to identify whether the image data type is a single-frame image, a multi-frame image, or a video stream.
  • Image length: the length of the image data itself.
  • Image width: the width of the image data itself.
  • Image color space: a description of the image data color space, such as RGGB (also called RGBG or GRGB), RGBW, RYYB, etc.
  • Bit width: the number of bits per component of the image.
  • Storage method: the arrangement of the pixels of each component of the image color space in the storage space (such as memory, flash memory, or hard disk).
  • semantic information is used to interpret the image.
  • the semantic information includes at least one of the following: depth information, scene classification information, instance segmentation information, and object detection information.
  • the scene classification information is used to characterize the scene represented by the image.
  • the instance segmentation information is used to characterize the segmentation information of instances in the image.
  • the depth information includes at least one of the following:
  • Depth image: the pixel values in the depth image are used to characterize the distance between pixel points in the image and the camera used to capture the image;
  • The maximum of said distances;
  • The minimum of said distances;
  • The quantization range of the distances between said maximum and said minimum;
  • An indication of whether the image contains an infinity part, which is a distance beyond the farthest the device can detect. It can be understood that the range of distances the device can detect has a shortest distance and a farthest distance. A distance beyond the farthest detectable distance can be represented by the maximum value, which corresponds to infinity, while the next-largest value is used to indicate the farthest distance that can currently be detected.
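The reserved-code scheme just described can be sketched as follows, assuming 8-bit quantization (the bit depth and function names are illustrative, not specified by the application): the maximum code 255 is reserved for the infinity part, and the next-largest code 254 marks the farthest distance the device can currently detect.

```python
import numpy as np

INF_CODE = 255  # reserved: beyond the device's detection range ("infinity")
FAR_CODE = 254  # next-largest value: farthest currently detectable distance

def quantize_depth(depth_m: np.ndarray, d_min: float, d_max: float) -> np.ndarray:
    """Quantize metric depth into codes 0..254 over [d_min, d_max];
    non-finite depth (the infinity part) maps to the reserved code 255."""
    codes = np.round((depth_m - d_min) / (d_max - d_min) * FAR_CODE)
    codes = np.clip(codes, 0, FAR_CODE).astype(np.uint8)
    codes[~np.isfinite(depth_m)] = INF_CODE
    return codes
```

With `d_min = 0.5` and `d_max = 10.0`, a pixel at the minimum distance quantizes to 0, one at the maximum to 254, and an infinity pixel to 255.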
  • the semantic information includes depth information
  • the processing module 71 is specifically configured to: obtain the depth information based on the image information through a laser ranging radar and/or a depth information analysis network.
  • the depth information parsing network is used to parse image information to generate depth information.
  • the semantic information includes scene classification information
  • the processing module 71 is further configured to: extract the image scene features of the image based on the image information; determine or generate the scene classification information of the image according to the image scene features.
  • the processing module 71 is further configured to: input the image scene features into the scene classification model to obtain the probability, output by the scene classification model, that the image corresponds to at least one scene; and, among the probabilities that the image corresponds to at least one scene, determine the scene corresponding to the maximum probability as the scene classification information of the image.
  • the scene classification model is used to determine the probability that the image corresponds to at least one scene.
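The selection step is a plain argmax over the model's per-scene probabilities. A minimal sketch (the scene names and probability values are illustrative, not taken from the application):

```python
def classify_scene(scene_probs: dict) -> str:
    """Return the scene with the maximum probability as the image's
    scene classification information."""
    return max(scene_probs, key=scene_probs.get)
```

For example, given probabilities for "portrait", "night", and "landscape", the scene with the largest probability becomes the scene classification information of the image.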
  • the saving module 72 is specifically configured to: fill semantic information in a reserved field of the image data stream based on a preset format.
  • the preset format is any combination of at least one type of semantic information.
  • the preset formats corresponding to different types of semantic information are different.
  • the preset format includes at least one of table, single-channel bitmap, matrix and key-value pair.
  • the preset format further includes an information header, which is used to indicate whether the image data stream contains semantic information, and/or, the information header is used to indicate the type of semantic information contained in the image data stream.
  • the saving module is further configured to: based on a preset format, before saving the semantic information in the image data stream, determine the identification information corresponding to the semantic information according to a preset correspondence.
  • the semantic information includes at least one of scene classification information, instance segmentation information, and target detection information;
  • the identification information correspondingly includes at least one of a scene ID, an instance ID, and a target ID, and the preset correspondence includes at least one of the correspondence between scene names and scene IDs, the correspondence between instance names and instance IDs, and the correspondence between target names and target IDs.
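Before the semantic information is saved, names are replaced with their IDs via the preset correspondence. The tables and key names below are assumptions for illustration; the actual correspondences are implementation-defined.

```python
# Hypothetical preset correspondences (name -> ID).
SCENE_IDS = {"portrait": 1, "night": 2, "landscape": 3}
INSTANCE_IDS = {"person": 1, "cat": 2, "car": 3}

def to_identifiers(semantic: dict) -> dict:
    """Replace scene and instance names in the semantic information with
    their preset IDs, prior to saving into the image data stream."""
    out = dict(semantic)
    if "scene" in out:
        out["scene_id"] = SCENE_IDS[out.pop("scene")]
    if "instances" in out:
        out["instance_ids"] = [INSTANCE_IDS[n] for n in out.pop("instances")]
    return out
```

Storing compact IDs rather than strings keeps the reserved field small and makes the key-value preset format unambiguous across producers and consumers of the data stream.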
  • the processing module 71 may also be configured to: acquire semantic information from the image data stream based on a preset format; perform preset processing according to the semantic information.
  • Fig. 8 is a schematic structural diagram of an image data processing device according to a fourth embodiment.
  • An embodiment of the present application provides an image data processing device.
  • the image data processing device 80 includes:
  • An acquisition module 81 configured to acquire semantic information from the image data stream based on a preset format
  • the processing module 82 is configured to perform preset processing according to the semantic information.
  • semantic information is used to interpret the image.
  • the semantic information includes at least one of the following:
  • Depth image: the pixel values in the depth image are used to characterize the distance between pixel points in the image and the camera used to capture the image;
  • The maximum of said distances;
  • The minimum of said distances;
  • The quantization range of the distances between said maximum and said minimum;
  • An indication of whether the image contains an infinity part, which is a distance beyond the farthest distance that the device can detect;
  • Scene classification information;
  • Instance segmentation information;
  • Target detection information.
  • the scene classification information is used to characterize the scene represented by the image.
  • the instance segmentation information is used to characterize the segmentation information of instances in the image.
  • the obtaining module 81 is specifically configured to: read semantic information in a reserved field of the image data stream based on a preset format.
  • the preset format is any combination of at least one type of semantic information.
  • the preset formats corresponding to different types of semantic information are different.
  • the preset format includes at least one of table, single-channel bitmap, matrix and key-value pair.
  • the preset format further includes an information header, which is used to indicate whether the image data stream contains semantic information, and/or, the information header is used to indicate the type of semantic information contained in the image data stream.
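On the reading side, the information header tells the consumer whether, and which, semantic information is present. The application does not specify the header's bit layout, so the flag assignments and field width below are assumptions.

```python
import struct

# Assumed one-bit-per-type flags in a 32-bit little-endian header word.
DEPTH, SCENE, INSTANCE, DETECTION = 0x1, 0x2, 0x4, 0x8

def parse_info_header(reserved: bytes) -> list:
    """Read the information header from the reserved field and report
    which types of semantic information the data stream contains."""
    (flags,) = struct.unpack_from("<I", reserved, 0)
    names = {DEPTH: "depth", SCENE: "scene_classification",
             INSTANCE: "instance_segmentation", DETECTION: "object_detection"}
    return [name for bit, name in names.items() if flags & bit]
```

An all-zero header word would mean the data stream carries no semantic information, so a consumer can skip the reserved field entirely.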
  • processing module 82 is specifically used for at least one of the following:
  • If the semantic information includes scene classification information, the target parameters of the camera in the corresponding scene are adjusted according to the scene classification information;
  • If the semantic information includes target detection information, the target of the camera's automatic focus is adjusted according to the target detection information;
  • If the semantic information includes instance segmentation information, the target instance in the image is obtained according to the instance segmentation information, and preset processing is performed on the image according to the target instance.
  • the preset processing includes at least one of instance blurring, instance deformation, instance color retention, performing LUT mapping on instances, and performing LUT mapping on the background.
  • the target parameters include at least one of automatic exposure parameters, display lookup tables, automatic focus parameters and white balance parameters.
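The consumer-side behavior above amounts to a dispatch on which semantic information is present. The sketch below is illustrative only; the camera interface, key names, and parameter choices are hypothetical, not drawn from the application.

```python
def apply_semantic_info(camera: dict, semantic: dict) -> dict:
    """Adjust hypothetical camera settings according to the semantic
    information acquired from the image data stream."""
    if "scene_classification" in semantic:
        # Select per-scene target parameters (e.g. auto-exposure,
        # white balance) keyed by the classified scene.
        camera["scene_preset"] = semantic["scene_classification"]
    if "object_detection" in semantic:
        # Refocus on the center of the first detected target's box.
        x0, y0, x1, y1 = semantic["object_detection"][0]
        camera["af_point"] = ((x0 + x1) / 2, (y0 + y1) / 2)
    return camera
```

A real implementation would map each scene ID to a concrete parameter set and weigh multiple detections, but the branching structure follows the steps listed above.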
  • the processing module 82 is further configured to: determine or generate semantic information of the image based on the image information; and save the semantic information in the data stream of the image based on a preset format.
  • FIG. 9 is a schematic structural diagram of a smart terminal according to a fifth embodiment.
  • An embodiment of the present application provides an intelligent terminal.
  • an intelligent terminal 90 includes a memory 91 and a processor 92.
  • An image data processing program is stored in the memory 91.
  • When the image data processing program is executed by the processor 92, the steps of the image data processing method in any of the above embodiments are implemented; the implementation principles and beneficial effects are similar and are not repeated here.
  • the above-mentioned smart terminal 90 further includes a communication interface 93 , and the communication interface 93 may be connected to the processor 92 through the bus 94 .
  • the processor 92 can control the communication interface 93 to implement the receiving and sending functions of the smart terminal 90 .
  • the above-mentioned integrated modules implemented in the form of software function modules can be stored in a computer-readable storage medium.
  • the above-mentioned software function modules are stored in a storage medium and include several instructions to enable a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor to execute some of the steps of the methods of the various embodiments of the present application.
  • An embodiment of the present application further provides a computer-readable storage medium, on which an image data processing program is stored, and when the image data processing program is executed by a processor, the steps of the image data processing method in any of the foregoing embodiments are implemented.
  • An embodiment of the present application further provides a computer program product, the computer program product includes computer program code, and when the computer program code is run on the computer, the computer is made to execute the methods in the above various possible implementation manners.
  • the embodiment of the present application also provides a chip, including a memory and a processor.
  • the memory is used to store a computer program
  • the processor is used to call and run the computer program from the memory, so that the device in which the chip is installed executes the methods in the above various possible implementations.
  • Units in the device in the embodiment of the present application may be combined, divided and deleted according to actual needs.
  • the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation.
  • the technical solution of the present application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product; the computer software product is stored in one of the above storage media (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions to make a terminal device (which may be a mobile phone, computer, server, controlled terminal, or network device, etc.) execute the method of each embodiment of the present application.
  • all or part of them may be implemented by software, hardware, firmware or any combination thereof.
  • When implemented using software, it may be implemented in whole or in part in the form of a computer program product.
  • a computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present application are produced in whole or in part.
  • the computer can be a general purpose computer, special purpose computer, a computer network, or other programmable apparatus.
  • Computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, computer instructions may be transmitted from a website, computer, server or data center to another website, computer, server or data center by wired (such as coaxial cable, optical fiber, or digital subscriber line) or wireless (such as infrared, radio, or microwave) means.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server, a data center, etc. integrated with one or more available media.
  • Usable media may be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., DVDs), or semiconductor media (e.g., solid state disks (SSDs)), etc.

Abstract

The present application provides an image data processing method, an intelligent terminal, and a storage medium. The image data processing method comprises: determining or generating semantic information of an image on the basis of image information; and storing the semantic information in a data stream of the image on the basis of a preset format. The semantic information of computational photography is thereby specified in a unified and standard manner in the data streams involved in computational photography.

Description

Image data processing method, intelligent terminal and storage medium

Technical Field

The present application relates to the technical field of image data processing, and in particular to an image data processing method, an intelligent terminal and a storage medium.

Background

Computational photography refers to digital image capture and processing techniques that use digital computation rather than optical processing. Computational photography can increase the capabilities of camera equipment, introduce features beyond those of film-based photography, or reduce the cost or size of camera elements.

In the process of conceiving and implementing the present application, the inventors found at least the following problem: there is no unified standard regulating the data flow involved in computational photography.

The foregoing description is provided as general background information and does not necessarily constitute prior art.
Technical Solution

In view of the above technical problems, the present application provides an image data processing method, an intelligent terminal and a storage medium, so as to uniformly standardize the semantic information of computational photography in the data streams involved in computational photography.

To solve the above technical problems, the present application provides an image data processing method, including the following steps:

S1: based on image information, determining or generating semantic information of an image;

S2: based on a preset format, including the semantic information in the data stream of the image.
Optionally, the image information includes basic image information and image data.

Optionally, the image data is the image itself.

Optionally, the basic image information may also be called the basic description information of the image, and may include an image description information identifier, a basic description information length, an image type identifier, an image length, an image width, an image color space, a bit width, a storage mode, and the like.

Optionally, the image description information identifier is used to identify the "basic description information" field of the image.

Optionally, the basic description information length indicates the total length of the basic description information field, including the image description information identifier.

Optionally, the image type identifier is used to identify whether the image data type is a single-frame image, a multi-frame image or a video stream.

Optionally, the image length is the length of the image data itself.

Optionally, the image width is the width of the image data itself.

Optionally, the image color space is a description of the color space of the image data, such as RGGB, RGBW, RYYB, etc.

Optionally, the bit width is the number of bits per component of the image.

Optionally, the storage mode is the arrangement of each pixel of each component of the image color space in the storage space (such as memory, flash memory, or hard disk).
Optionally, the semantic information is used to interpret the image.

Optionally, the semantic information includes at least one of the following: depth information, scene classification information, instance segmentation information, and target detection information.

Optionally, the scene classification information is used to characterize the scene represented by the image.

Optionally, the instance segmentation information is used to characterize the segmentation information of instances in the image.

Optionally, the depth information includes at least one of the following:

a depth image, in which pixel values are used to characterize the distance between pixel points in the image and the camera used to capture the image;

the maximum of said distances;

the minimum of said distances;

the quantization range of the distances between said maximum and said minimum;

indication information of whether the image contains an infinity part, where the infinity part is a distance beyond the farthest distance that the device can detect.

Optionally, the semantic information includes depth information, and step S1 includes: based on the image information, obtaining the depth information through a laser ranging radar and/or a depth information parsing network. Optionally, the depth information parsing network is used to parse the image information to generate the depth information.

Optionally, the semantic information includes scene classification information, and step S1 includes: extracting image scene features of the image based on the image information; and determining or generating the scene classification information of the image according to the image scene features.

Optionally, determining or generating the scene classification information of the image according to the image scene features includes: inputting the image scene features into a scene classification model to obtain the probability, output by the scene classification model, that the image corresponds to at least one scene; and, among the probabilities that the image corresponds to at least one scene, determining the scene corresponding to the maximum probability as the scene classification information of the image. Optionally, the scene classification model is used to determine the probability that the image corresponds to at least one scene.
Optionally, step S2 includes: based on the preset format, filling the semantic information in a reserved field of the data stream of the image. Optionally, the preset format is any combination of at least one type of semantic information.

Optionally, the preset formats corresponding to different types of semantic information are different.

Optionally, the preset format includes at least one of a table, a single-channel bitmap, a matrix and key-value pairs.

Optionally, the preset format further includes an information header, which is used to indicate whether the data stream of the image contains semantic information, and/or the type of semantic information contained in the data stream of the image.

Optionally, before step S2, the method further includes: determining identification information corresponding to the semantic information according to a preset correspondence.

Optionally, the semantic information includes at least one of scene classification information, instance segmentation information, and target detection information; the identification information correspondingly includes at least one of a scene ID, an instance ID, and a target ID; and the preset correspondence includes at least one of the correspondence between scene names and scene IDs, the correspondence between instance names and instance IDs, and the correspondence between target names and target IDs.
The present application also provides an image data processing method, including the following steps:

S10: based on a preset format, acquiring semantic information from the data stream of an image;

S20: performing preset processing according to the semantic information.
Optionally, the semantic information is used to interpret the image.

Optionally, the semantic information includes at least one of the following:

a depth image, in which pixel values are used to characterize the distance between pixel points in the image and the camera used to capture the image;

the maximum of said distances;

the minimum of said distances;

the quantization range of the distances between said maximum and said minimum;

indication information of whether the image contains an infinity part, where the infinity part is a distance beyond the farthest distance that the device can detect;

scene classification information;

instance segmentation information;

target detection information.

Optionally, the scene classification information is used to characterize the scene represented by the image.

Optionally, the instance segmentation information is used to characterize the segmentation information of instances in the image.

Optionally, step S10 includes: based on the preset format, reading the semantic information from a reserved field of the data stream of the image. Optionally, the preset format is any combination of at least one type of semantic information.

Optionally, the preset formats corresponding to different types of semantic information are different.

Optionally, the preset format includes at least one of a table, a single-channel bitmap, a matrix and key-value pairs.

Optionally, the preset format further includes an information header, which is used to indicate whether the data stream of the image contains semantic information, and/or the type of semantic information contained in the data stream of the image.

Optionally, step S20 includes at least one of the following:

if the semantic information includes scene classification information, adjusting target parameters of the camera in the corresponding scene according to the scene classification information;

if the semantic information includes target detection information, adjusting the target of the camera's automatic focus according to the target detection information;

if the semantic information includes instance segmentation information, obtaining a target instance in the image according to the instance segmentation information, and performing preset processing on the image according to the target instance.

Optionally, the preset processing includes at least one of instance blurring, instance deformation, instance color retention, performing LUT mapping on instances, and performing LUT mapping on the background.

Optionally, the target parameters include at least one of automatic exposure parameters, a display lookup table, automatic focus parameters and white balance parameters.
The present application also provides an image data processing device, including:

a processing module, configured to determine or generate semantic information of an image based on image information;

a saving module, configured to save the semantic information in the data stream of the image based on a preset format.

Optionally, the image information includes basic image information and image data. Optionally, the image data is the image itself. The basic image information may also be called the basic description information of the image, and may include an image description information identifier, a basic description information length, an image type identifier, an image length, an image width, an image color space, a bit width, a storage mode, and the like. Optionally:

the image description information identifier is used to identify the "basic description information" field of the image;

the basic description information length indicates the total length of the basic description information field, including the image description information identifier;

the image type identifier is used to identify whether the image data type is a single-frame image, a multi-frame image or a video stream;

the image length is the length of the image data itself;

the image width is the width of the image data itself;

the image color space is a description of the color space of the image data, such as RGGB, RGBW, RYYB, etc.;

the bit width is the number of bits per component of the image;

the storage mode is the arrangement of each pixel of each component of the image color space in the storage space (such as memory, flash memory, or hard disk).

Optionally, the semantic information is used to interpret the image.

Optionally, the semantic information includes at least one of the following: depth information, scene classification information, instance segmentation information, and target detection information.

Optionally, the scene classification information is used to characterize the scene represented by the image.

Optionally, the instance segmentation information is used to characterize the segmentation information of instances in the image.

Optionally, the depth information includes at least one of the following:

a depth image, in which pixel values are used to characterize the distance between pixel points in the image and the camera used to capture the image;

the maximum of said distances;

the minimum of said distances;

the quantization range of the distances between said maximum and said minimum;

indication information of whether the image contains an infinity part, where the infinity part is a distance beyond the farthest distance that the device can detect.

Optionally, the semantic information includes depth information, and the processing module is specifically configured to: obtain the depth information through a laser ranging radar and/or a depth information parsing network based on the image information.

Optionally, the depth information parsing network is used to parse the image information to generate the depth information.

Optionally, the semantic information includes scene classification information, and the processing module is further configured to: extract image scene features of the image based on the image information; and determine or generate the scene classification information of the image according to the image scene features.

Optionally, the processing module is further configured to: input the image scene features into a scene classification model to obtain the probability, output by the scene classification model, that the image corresponds to at least one scene; and, among the probabilities that the image corresponds to at least one scene, determine the scene corresponding to the maximum probability as the scene classification information of the image. Optionally, the scene classification model is used to determine the probability that the image corresponds to at least one scene.

Optionally, the saving module is specifically configured to: fill the semantic information in a reserved field of the data stream of the image based on the preset format. Optionally, the preset format is any combination of at least one type of semantic information.

Optionally, the preset formats corresponding to different types of semantic information are different.

Optionally, the preset format includes at least one of a table, a single-channel bitmap, a matrix and key-value pairs.

Optionally, the preset format further includes an information header, which is used to indicate whether the data stream of the image contains semantic information, and/or the type of semantic information contained in the data stream of the image.

Optionally, the saving module is further configured to: before saving the semantic information in the data stream of the image based on the preset format, determine identification information corresponding to the semantic information according to a preset correspondence. Optionally, the semantic information includes at least one of scene classification information, instance segmentation information, and target detection information; the identification information correspondingly includes at least one of a scene ID, an instance ID, and a target ID; and the preset correspondence includes at least one of the correspondence between scene names and scene IDs, the correspondence between instance names and instance IDs, and the correspondence between target names and target IDs.
本申请还提供一种图像数据处理装置,包括:The present application also provides an image data processing device, including:
获取模块,用于基于预设格式,从图像的数据流中获取语义信息;An acquisition module, configured to acquire semantic information from image data streams based on a preset format;
处理模块,用于根据语义信息进行预设处理。The processing module is used for performing preset processing according to the semantic information.
Optionally, the semantic information is used to interpret the image.
Optionally, the semantic information includes at least one of the following:
a depth image, in which each pixel value represents the distance between the corresponding pixel in the image and the camera used to capture the image;
the maximum of said distances;
the minimum of said distances;
the quantization range of the distances between said maximum and said minimum;
an indication of whether the image contains an infinity portion, the infinity portion being any distance beyond the farthest distance the device can detect;
scene classification information;
instance segmentation information;
target detection information.
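The depth-related fields listed above (per-pixel distance, minimum, maximum, quantization range, and infinity indication) could be derived along these lines. The field names and the 8-bit quantization are illustrative assumptions, not the format defined by this application:

```python
import math

def quantize_depth(depth_m, levels=255):
    """Quantize finite depth values (metres) to [0, levels];
    record min/max/quantization range and an infinity flag as metadata."""
    finite = [d for row in depth_m for d in row if math.isfinite(d)]
    d_min, d_max = min(finite), max(finite)
    span = (d_max - d_min) or 1.0  # avoid division by zero for flat scenes
    # Infinite pixels (beyond the sensor's farthest detectable distance)
    # are stored as 0 here; the metadata flag records their presence.
    bitmap = [[0 if not math.isfinite(d) else round((d - d_min) / span * levels)
               for d in row] for row in depth_m]
    meta = {
        "min_distance_m": d_min,
        "max_distance_m": d_max,
        "quantization_levels": levels,
        "contains_infinity": any(math.isinf(d) for row in depth_m for d in row),
    }
    return bitmap, meta

depth = [[0.5, 2.0], [4.0, float("inf")]]  # metres; inf = beyond sensor range
bitmap, meta = quantize_depth(depth)
print(meta["contains_infinity"])  # True
```

The single-channel bitmap plus the recorded minimum, maximum, and quantization range are sufficient to recover approximate metric depth at read time.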
Optionally, the scene classification information characterizes the scene represented by the image.
Optionally, the instance segmentation information characterizes the segmentation of the instances contained in the image.
Optionally, the acquisition module is specifically configured to: read the semantic information from a reserved field of the image data stream based on the preset format. Optionally, the preset format is any combination of at least one type of semantic information.
Optionally, different types of semantic information correspond to different preset formats.
Optionally, the preset format includes at least one of a table, a single-channel bitmap, a matrix, and a key-value pair.
Optionally, the preset format further includes an information header, which indicates whether the image data stream contains semantic information and/or the type of semantic information contained in the image data stream.
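One sketch of such an information header is a small bit field read from the reserved field, with one bit flagging the presence of semantic information and further bits flagging its types. The bit layout below is an assumption for illustration, not the format defined by this application:

```python
# Hypothetical one-byte information header in a reserved stream field:
# bit 0 flags presence, bits 1-3 flag the semantic-information types.
FLAG_PRESENT   = 0b0001
FLAG_SCENE     = 0b0010
FLAG_INSTANCE  = 0b0100
FLAG_DETECTION = 0b1000

def parse_header(header: int) -> dict:
    """Decode the header into presence and type flags."""
    present = bool(header & FLAG_PRESENT)
    return {
        "has_semantic_info": present,
        "scene_classification": present and bool(header & FLAG_SCENE),
        "instance_segmentation": present and bool(header & FLAG_INSTANCE),
        "target_detection": present and bool(header & FLAG_DETECTION),
    }

info = parse_header(0b0111)  # present + scene + instance
print(info["scene_classification"], info["target_detection"])  # True False
```

A reader can thus skip the semantic payload entirely when the presence bit is clear, without parsing the rest of the reserved field.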
Optionally, the processing module is specifically configured for at least one of the following:
when the semantic information contains scene classification information, adjusting the camera's target parameters for the corresponding scene according to the scene classification information;
when the semantic information contains target detection information, adjusting the camera's autofocus target according to the target detection information;
when the semantic information contains instance segmentation information, obtaining the target instance in the image according to the instance segmentation information, and performing preset processing on the image according to the target instance.
Optionally, the preset processing includes at least one of instance blurring, instance deformation, instance color retention, LUT mapping applied to the instance, and LUT mapping applied to the background.
Optionally, the target parameters include at least one of automatic exposure parameters, a display lookup table, autofocus parameters, and white balance parameters.
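As an illustration of the instance-based preset processing mentioned above, instance color retention could use the instance segmentation mask to keep the target instance in color while greying out the background. The pixel layout and the simple average-grey conversion are illustrative assumptions:

```python
# Minimal sketch: use an instance-segmentation mask (1 = target instance,
# 0 = background) to keep the instance in colour while desaturating the
# background ("instance colour retention").
def retain_instance_color(pixels, mask):
    """pixels: 2-D list of (r, g, b) tuples; mask: 2-D list of 0/1."""
    out = []
    for row_px, row_m in zip(pixels, mask):
        out_row = []
        for (r, g, b), m in zip(row_px, row_m):
            if m:                       # target instance: keep colour
                out_row.append((r, g, b))
            else:                       # background: convert to grey
                grey = (r + g + b) // 3
                out_row.append((grey, grey, grey))
        out.append(out_row)
    return out

img = [[(200, 0, 0), (0, 200, 0)]]
mask = [[1, 0]]
print(retain_instance_color(img, mask))  # [[(200, 0, 0), (66, 66, 66)]]
```

Instance blurring or per-instance LUT mapping would follow the same pattern, applying a different per-pixel operation to the masked and unmasked regions.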
The present application further provides an intelligent terminal, including a memory and a processor, where an image data processing program is stored in the memory, and the image data processing program, when executed by the processor, implements the steps of any one of the above image data processing methods.
The present application further provides a computer-readable storage medium storing a computer program, where the computer program, when executed by a processor, implements the steps of any one of the above image data processing methods.
The present application further provides a computer program product including a computer program, where the computer program, when executed, implements the steps of any one of the above image data processing methods.
As described above, the image data processing method of the present application determines or generates semantic information of an image based on image information, the semantic information being used to interpret the image, and saves the semantic information in the image data stream based on a preset format. In this way, the semantic information used to interpret the image is stored in the image data stream in a preset format, so that the semantic information of computational photography is uniformly standardized across the data streams involved in computational photography.
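As a rough illustration of saving semantic information in an image data stream in a preset format, a reserved field might carry a small header followed by serialized key-value pairs. The byte layout and the JSON serialization below are assumptions for demonstration only, not the format defined by this application:

```python
import json
import struct

# Illustrative reserved-field layout:
# 1-byte presence flag + 4-byte big-endian payload length + JSON payload.
def pack_semantic_field(semantics: dict) -> bytes:
    """Serialize semantic key-value pairs into a reserved-field byte string."""
    payload = json.dumps(semantics).encode("utf-8")
    return struct.pack(">BI", 1, len(payload)) + payload

def unpack_semantic_field(field: bytes) -> dict:
    """Recover the semantic key-value pairs; empty dict if none present."""
    present, length = struct.unpack(">BI", field[:5])
    if not present:
        return {}
    return json.loads(field[5:5 + length].decode("utf-8"))

field = pack_semantic_field({"scene_id": 3, "target_ids": [1, 2]})
print(unpack_semantic_field(field))  # {'scene_id': 3, 'target_ids': [1, 2]}
```

Because the writer and reader agree on the preset format, any downstream computational-photography stage can locate and interpret the semantic information without re-running the analysis that produced it.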
Brief Description of the Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and, together with the description, serve to explain its principles. To describe the technical solutions of the embodiments of the present application more clearly, the drawings required in the description of the embodiments are briefly introduced below. It will be apparent that a person of ordinary skill in the art can derive other drawings from these drawings without creative effort.
Fig. 1 is a schematic diagram of the hardware structure of an intelligent terminal implementing various embodiments of the present application;
Fig. 2 is an architecture diagram of a communication network system provided by an embodiment of the present application;
Fig. 3 is a schematic flowchart of an image data processing method according to a first embodiment;
Fig. 4 is an example diagram of a depth image according to an embodiment of the present application;
Fig. 5 is an example diagram of instance segmentation information according to an embodiment of the present application;
Fig. 6 is a schematic flowchart of an image data processing method according to a second embodiment;
Fig. 7 is a schematic structural diagram of an image data processing device according to a third embodiment;
Fig. 8 is a schematic structural diagram of an image data processing device according to a fourth embodiment;
Fig. 9 is a schematic structural diagram of an intelligent terminal according to a fifth embodiment.
The realization of the objectives, functional features, and advantages of the present application will be further described with reference to the embodiments and the accompanying drawings. The drawings above show specific embodiments of the present application, which are described in more detail below. These drawings and the accompanying text are not intended to limit the scope of the application's concept in any way, but rather to illustrate that concept for those skilled in the art by reference to specific embodiments.
Detailed Description of the Embodiments
Exemplary embodiments are described in detail here, examples of which are illustrated in the accompanying drawings. When the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present application; rather, they are merely examples of devices and methods consistent with some aspects of the present application as detailed in the appended claims.
It should be noted that, in this document, the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element qualified by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes the element. In addition, components, features, and elements with the same name in different embodiments of the present application may have the same or different meanings; the specific meaning is determined by their explanation in the specific embodiment, or further by the context of that embodiment.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, this information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of this document, first information may also be called second information, and similarly, second information may also be called first information. Depending on the context, the word "if" as used herein may be interpreted as "at the time of", "when", or "in response to determining". Furthermore, as used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context indicates otherwise. It should be further understood that the terms "comprise" and "include" indicate the presence of the stated features, steps, operations, elements, components, items, categories, and/or groups, but do not exclude the presence, occurrence, or addition of one or more other features, steps, operations, elements, components, items, categories, and/or groups. The terms "or", "and/or", and "including at least one of the following" as used in this application may be interpreted as inclusive, meaning any one or any combination. For example, "including at least one of the following: A, B, C" means "any of the following: A; B; C; A and B; A and C; B and C; A and B and C"; likewise, "A, B, or C" or "A, B, and/or C" means "any of the following: A; B; C; A and B; A and C; B and C; A and B and C". An exception to this definition arises only when a combination of elements, functions, steps, or operations is inherently mutually exclusive in some way.
It should be understood that although the steps in the flowcharts of the embodiments of the present application are shown sequentially as indicated by the arrows, these steps are not necessarily executed in the order indicated. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they may be executed in other orders. Moreover, at least some of the steps in the figures may include multiple sub-steps or stages, which are not necessarily completed at the same time but may be executed at different times, and their execution order is not necessarily sequential; they may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
Depending on the context, the words "if" and "in case" as used herein may be interpreted as "at the time of", "when", "in response to determining", or "in response to detecting". Similarly, depending on the context, the phrases "if it is determined" or "if (a stated condition or event) is detected" may be interpreted as "when it is determined", "in response to determining", "when (the stated condition or event) is detected", or "in response to detecting (the stated condition or event)".
It should be noted that step designations such as S1, S2, S10, and S20 are used herein to express the corresponding content more clearly and concisely and do not constitute a substantive limitation on the order. In specific implementations, a person skilled in the art may execute S2 before S1, or S20 before S10, etc., all of which fall within the protection scope of the present application.
It should be understood that the specific embodiments described here are only used to explain the present application and are not intended to limit it.
In the following description, suffixes such as "module", "component", or "unit" used to denote elements serve only to facilitate the description of the present application and have no specific meaning in themselves. Therefore, "module", "component", and "unit" may be used interchangeably.
Intelligent terminals may be implemented in various forms. For example, the intelligent terminals described in this application may include mobile terminals such as mobile phones, tablet computers, notebook computers, palmtop computers, personal digital assistants (PDAs), portable media players (PMPs), navigation devices, wearable devices, smart bracelets, and pedometers, as well as fixed terminals such as digital TVs and desktop computers.
In the subsequent description, an intelligent terminal is taken as an example. Those skilled in the art will understand that, apart from elements specifically intended for mobile purposes, the configurations according to the embodiments of the present application can also be applied to fixed-type terminals.
Referring to Fig. 1, which is a schematic diagram of the hardware structure of an intelligent terminal implementing various embodiments of the present application, the intelligent terminal 100 may include components such as an RF (Radio Frequency) unit 101, a WiFi module 102, an audio output unit 103, an A/V (audio/video) input unit 104, a sensor 105, a display unit 106, a user input unit 107, an interface unit 108, a memory 109, a processor 110, and a power supply 111. Those skilled in the art will understand that the intelligent terminal structure shown in Fig. 1 does not constitute a limitation on the intelligent terminal; the intelligent terminal may include more or fewer components than shown, combine certain components, or arrange the components differently.
The components of the intelligent terminal are described in detail below with reference to Fig. 1:
The radio frequency unit 101 can be used to receive and send signals in the course of sending and receiving information or during a call. Specifically, it receives downlink information from the base station and passes it to the processor 110 for processing, and it sends uplink data to the base station. Typically, the radio frequency unit 101 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low-noise amplifier, a duplexer, and the like. In addition, the radio frequency unit 101 can also communicate with the network and other devices through wireless communication. The wireless communication can use any communication standard or protocol, including but not limited to GSM (Global System for Mobile communication), GPRS (General Packet Radio Service), CDMA2000 (Code Division Multiple Access 2000), WCDMA (Wideband Code Division Multiple Access), TD-SCDMA (Time Division-Synchronous Code Division Multiple Access), FDD-LTE (Frequency Division Duplexing-Long Term Evolution), TDD-LTE (Time Division Duplexing-Long Term Evolution), 5G, and so on.
WiFi is a short-distance wireless transmission technology. Through the WiFi module 102, the intelligent terminal can help users send and receive e-mail, browse web pages, access streaming media, and so on, providing users with wireless broadband Internet access. Although Fig. 1 shows the WiFi module 102, it can be understood that it is not an essential component of the intelligent terminal and may be omitted as needed without changing the essence of the invention.
The audio output unit 103 can convert audio data received by the radio frequency unit 101 or the WiFi module 102, or stored in the memory 109, into an audio signal and output it as sound when the intelligent terminal 100 is in a call signal receiving mode, a call mode, a recording mode, a voice recognition mode, a broadcast receiving mode, or the like. Moreover, the audio output unit 103 can also provide audio output related to specific functions performed by the intelligent terminal 100 (optionally, a call signal receiving sound, a message receiving sound, etc.). The audio output unit 103 may include a speaker, a buzzer, and the like.
The A/V input unit 104 is used to receive audio or video signals. The A/V input unit 104 may include a graphics processing unit (GPU) 1041 and a microphone 1042. The graphics processor 1041 processes the image data of still images or video obtained by an image capture device (such as a camera) in a video capture mode or an image capture mode. The processed image can be displayed on the display unit 106. The image processed by the graphics processor 1041 may be stored in the memory 109 (or another storage medium) or sent via the radio frequency unit 101 or the WiFi module 102. The microphone 1042 can receive sound (audio data) in operating modes such as a phone call mode, a recording mode, and a voice recognition mode, and can process such sound into audio data. In the case of the phone call mode, the processed audio (voice) data can be converted into a format that can be sent to a mobile communication base station via the radio frequency unit 101 for output. The microphone 1042 may implement various types of noise cancellation (or suppression) algorithms to cancel (or suppress) noise or interference generated in the process of receiving and sending audio signals.
The intelligent terminal 100 also includes at least one sensor 105, such as a light sensor, a motion sensor, and other sensors. Optionally, the light sensor includes an ambient light sensor and a proximity sensor; the ambient light sensor can adjust the brightness of the display panel 1061 according to the brightness of the ambient light, and the proximity sensor can turn off the display panel 1061 and/or the backlight when the intelligent terminal 100 is moved to the ear. As a kind of motion sensor, an accelerometer sensor can detect the magnitude of acceleration in various directions (generally on three axes) and can detect the magnitude and direction of gravity when stationary. It can be used for applications that recognize the posture of the mobile phone (such as switching between horizontal and vertical screens, related games, and magnetometer posture calibration) and for vibration-recognition-related functions (such as a pedometer and tapping). As for other sensors that can also be configured on the mobile phone, such as a fingerprint sensor, a pressure sensor, an iris sensor, a molecular sensor, a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, they are not described in detail here.
The display unit 106 is used to display information input by the user or information provided to the user. The display unit 106 may include a display panel 1061, which may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like.
The user input unit 107 can be used to receive input numeric or character information and to generate key signal inputs related to the user settings and function control of the intelligent terminal. Optionally, the user input unit 107 may include a touch panel 1071 and other input devices 1072. The touch panel 1071, also called a touch screen, can collect the user's touch operations on or near it (such as operations performed by the user on or near the touch panel 1071 using a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connection device according to a preset program. The touch panel 1071 may include two parts: a touch detection device and a touch controller. Optionally, the touch detection device detects the user's touch position, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, and sends them to the processor 110, and it can receive and execute commands sent by the processor 110. In addition, the touch panel 1071 can be implemented in various types such as resistive, capacitive, infrared, and surface acoustic wave. In addition to the touch panel 1071, the user input unit 107 may also include other input devices 1072. Optionally, the other input devices 1072 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse, and a joystick, which are not specifically limited here.
Optionally, the touch panel 1071 may cover the display panel 1061. When the touch panel 1071 detects a touch operation on or near it, it transmits the operation to the processor 110 to determine the type of touch event, and the processor 110 then provides the corresponding visual output on the display panel 1061 according to the type of touch event. Although in Fig. 1 the touch panel 1071 and the display panel 1061 are implemented as two independent components to realize the input and output functions of the intelligent terminal, in some embodiments the touch panel 1071 and the display panel 1061 may be integrated to realize these input and output functions, which is not specifically limited here.
The interface unit 108 serves as an interface through which at least one external device can be connected to the intelligent terminal 100. Optionally, the external device may include a wired or wireless headset port, an external power (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device with an identification module, an audio input/output (I/O) port, a video I/O port, a headphone port, and the like. The interface unit 108 can be used to receive input (optionally, data information, power, etc.) from an external device and transmit the received input to one or more elements within the intelligent terminal 100, or it can be used to transfer data between the intelligent terminal 100 and an external device.
The memory 109 can be used to store software programs as well as various data. The memory 109 may mainly include a program storage area and a data storage area. Optionally, the program storage area may store an operating system, application programs required by at least one function (such as a sound playback function and an image playback function), and the like; the data storage area may store data created according to the use of the mobile phone (such as audio data and a phone book). In addition, the memory 109 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
The processor 110 is the control center of the intelligent terminal. It uses various interfaces and lines to connect all parts of the intelligent terminal, and it performs the various functions of the intelligent terminal and processes data by running or executing software programs and/or modules stored in the memory 109 and calling data stored in the memory 109, thereby monitoring the intelligent terminal as a whole. The processor 110 may include one or more processing units; preferably, the processor 110 may integrate an application processor and a modem processor. Optionally, the application processor mainly handles the operating system, the user interface, application programs, and the like, while the modem processor mainly handles wireless communication. It can be understood that the above modem processor may also not be integrated into the processor 110.
The intelligent terminal 100 may also include a power supply 111 (such as a battery) that supplies power to the various components. Preferably, the power supply 111 can be logically connected to the processor 110 through a power management system, so that functions such as charging, discharging, and power consumption management are implemented through the power management system.
Although not shown in Fig. 1, the intelligent terminal 100 may also include a Bluetooth module and the like, which will not be described in detail here.
To facilitate understanding of the embodiments of the present application, the communication network system on which the intelligent terminal of the present application is based is described below.
Referring to Fig. 2, which is an architecture diagram of a communication network system provided by an embodiment of the present application, the communication network system is an LTE system of universal mobile communication technology. The LTE system includes a UE (User Equipment) 201, an E-UTRAN (Evolved UMTS Terrestrial Radio Access Network) 202, an EPC (Evolved Packet Core) 203, and an operator's IP services 204, which are communicatively connected in sequence.
Optionally, the UE 201 may be the above-mentioned terminal 100, which will not be described again here.
E-UTRAN202包括eNodeB2021和其它eNodeB2022等。可选地,eNodeB2021可以通过回程(backhaul)(例如X2接口)与其它eNodeB2022连接,eNodeB2021连接到EPC203,eNodeB2021可以提供UE201到EPC203的接入。 E-UTRAN 202 includes eNodeB 2021 and other eNodeB 2022 and so on. Optionally, the eNodeB 2021 can be connected to other eNodeB 2022 through a backhaul (for example, X2 interface), the eNodeB 2021 is connected to the EPC 203 , and the eNodeB 2021 can provide access from the UE 201 to the EPC 203 .
EPC203可以包括MME(Mobility Management Entity,移动性管理实体)2031,HSS(Home Subscriber Server,归属用户服务器)2032,其它MME2033,SGW(Serving Gate Way,服务网关)2034,PGW(PDN Gate Way,分组数据网络网关)2035和PCRF(Policy and Charging Rules Function,政策和资费功能实体)2036等。可选地,MME2031是处理UE201和EPC203之间信令的控制节点,提供承载和连接管理。HSS2032用于提供一些寄存器来管理诸如归属位置寄存器(图中未示)之类的功能,并且保存有一些有关服务特征、数据速率等用户专用的信息。所有用户数据都可以通过SGW2034进行发送,PGW2035可以提供UE 201的IP地址分配以及其它功能,PCRF2036是业务数据流和IP承载资源的策略与计费控制策略决策点,它为策略与计费执行功能单元(图中未示)选择及提供可用的策略和计费控制决策。The EPC 203 may include an MME (Mobility Management Entity) 2031, an HSS (Home Subscriber Server) 2032, other MMEs 2033, an SGW (Serving Gateway) 2034, a PGW (PDN Gateway) 2035, a PCRF (Policy and Charging Rules Function) 2036, and so on. Optionally, the MME 2031 is a control node that handles signaling between the UE 201 and the EPC 203, and provides bearer and connection management. The HSS 2032 provides registers to manage functions such as the home location register (not shown), and stores user-specific information about service features, data rates, and the like. All user data may be sent through the SGW 2034; the PGW 2035 may provide IP address allocation and other functions for the UE 201; the PCRF 2036 is the policy and charging control decision point for service data flows and IP bearer resources, and it selects and provides available policy and charging control decisions for the policy and charging enforcement function unit (not shown).
IP业务204可以包括因特网、内联网、IMS(IP Multimedia Subsystem,IP多媒体子系统)或其它IP业务等。The IP service 204 may include Internet, Intranet, IMS (IP Multimedia Subsystem, IP Multimedia Subsystem) or other IP services.
虽然上述以LTE系统为例进行了介绍,但本领域技术人员应当知晓,本申请不仅仅适用于LTE系统,也可以适用于其他无线通信系统,例如GSM、CDMA2000、WCDMA、TD-SCDMA以及未来新的网络系统(如5G)等,此处不做限定。Although the above description takes the LTE system as an example, those skilled in the art should know that this application is applicable not only to the LTE system but also to other wireless communication systems, such as GSM, CDMA2000, WCDMA, TD-SCDMA, and future new network systems (such as 5G), which are not limited here.
基于上述智能终端硬件结构以及通信网络系统,提出本申请各个实施例。Based on the above hardware structure of the smart terminal and the communication network system, various embodiments of the present application are proposed.
可选地,本申请提供一种图像数据处理方法、智能终端及存储介质,在计算摄影的数据流中,基于预设格式保存用于解读图像的语义信息,以在计算摄影涉及的数据流中统一规范计算摄影的语义信息。Optionally, the present application provides an image data processing method, a smart terminal, and a storage medium. In the data stream of computational photography, semantic information for interpreting images is stored based on a preset format, so as to uniformly standardize the semantic information of computational photography in the data streams involved in computational photography.
随着技术的发展,智能终端围绕影像功能所开发的功能越来越多。同时,为了适配这些功能,以及引入新的功能,围绕影像功能所需,搭载的设备也越来越多,如激光测距雷达、高性能NPU(Neural-network Processing Unit,神经网络处理器)、云台等。通过这些设备,可以为计算摄影提供多种高层语义(本文将其称为"语义信息")。本申请试图针对这些语义信息,提供一种通用的描述及解读方法。With the development of technology, smart terminals have developed more and more functions around imaging. At the same time, in order to adapt to these functions and introduce new ones, more and more devices are deployed around the imaging function, such as laser ranging radars, high-performance NPUs (Neural-network Processing Units), gimbals, and so on. Through these devices, various kinds of high-level semantics (referred to herein as "semantic information") can be provided for computational photography. This application attempts to provide a general method for describing and interpreting such semantic information.
第一实施例first embodiment
图3为根据第一实施例示出的图像数据处理方法的流程示意图。本申请实施例提供一种图像数据处理方法,可选地,应用于如前所述的智能终端等智能终端。如图3所示,图像数据处理方法包括以下步骤:Fig. 3 is a schematic flowchart of an image data processing method according to the first embodiment. An embodiment of the present application provides an image data processing method, which is optionally applied to a smart terminal such as the aforementioned smart terminal. As shown in Figure 3, the image data processing method includes the following steps:
S1:基于图像信息,确定或生成图像的语义信息。S1: Based on the image information, determine or generate the semantic information of the image.
可选地,图像信息包括图像基本信息和图像数据。可选地,图像数据即图像本身。图像基本信息也可以称为图像的基本描述信息,可以包括图像描述信息标识、基本描述信息长度、图像类型标识、图像长度、图像宽度、图像色彩空间、位宽和存储方式等。可选地:Optionally, the image information includes basic image information and image data. Optionally, the image data is the image itself. The basic image information can also be called the basic description information of the image, which can include image description information identification, basic description information length, image type identification, image length, image width, image color space, bit width and storage method, etc. Optionally:
图像描述信息标识,用于标识图像“基本描述信息”字段;Image description information identifier, used to identify the "basic description information" field of the image;
基本描述信息长度,表示基本描述信息字段的总长度,包含图像描述信息标识;Basic description information length, indicating the total length of the basic description information field, including the image description information identifier;
图像类型标识,用于标识影像数据类型是单帧图像、多帧图像或视频流;Image type identification, which is used to identify whether the image data type is a single-frame image, multi-frame image or video stream;
图像长度,即图像数据本身的长度;Image length, that is, the length of the image data itself;
图像宽度,即图像数据本身的宽度;Image width, that is, the width of the image data itself;
图像色彩空间,图像数据色彩空间描述,如RGGB(Bayer filter,拜耳滤色镜,也称做RGBG或GRGB),RGBW(在原有的RGB三原色上增加了白色子像素(W)),RYYB(以两个黄色子像素(Y)代替两个绿色子像素(G))等;Image color space, image data color space description, such as RGGB (Bayer filter, Bayer filter, also known as RGBG or GRGB), RGBW (adding white sub-pixels (W) to the original RGB three primary colors), RYYB (with two A yellow sub-pixel (Y) replaces two green sub-pixels (G)), etc.;
位宽,图像每个分量的比特(bit)数;Bit width, the number of bits (bits) for each component of the image;
存储方式,图像色彩空间中每个分量的每个像素在存储空间(如内存,再如闪存,还如硬盘等)中的排列方式。Storage method, the arrangement method of each pixel of each component in the image color space in the storage space (such as memory, flash memory, or hard disk, etc.).
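As an illustrative sketch only (not the patent's actual binary layout), the basic description information fields listed above can be packed into a fixed header. The magic value, field widths, and byte order below are assumptions for demonstration:

```python
import struct

IMG_DESC_MAGIC = 0x494D4744  # hypothetical "image description info" identifier

def pack_basic_info(img_type, length, width, color_space, bit_width, storage):
    """Pack the basic description fields into a header: identifier,
    total length (including identifier and length field), then the body."""
    body = struct.pack(">IHHBBB", img_type, length, width,
                       color_space, bit_width, storage)
    total = 4 + 4 + len(body)  # identifier + length field + body
    return struct.pack(">II", IMG_DESC_MAGIC, total) + body

def unpack_basic_info(buf):
    """Parse a header produced by pack_basic_info back into a dict."""
    magic, total = struct.unpack_from(">II", buf, 0)
    assert magic == IMG_DESC_MAGIC, "not a basic description info field"
    img_type, length, width, cs, bits, storage = struct.unpack_from(">IHHBBB", buf, 8)
    return {"type": img_type, "length": length, "width": width,
            "color_space": cs, "bit_width": bits, "storage": storage}
```

A round trip (e.g. a 4000x3000 single-frame image with 10-bit components) recovers every field from the byte stream.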
可选地,语义信息用于解读图像。Optionally, semantic information is used to interpret the image.
可选地,常见的语义信息,包括图像的深度信息(depth information)、场景分类信息(scene classification information)、实例分割信息(Instance segmentation information)以及目标检测信息(Object detection information)等。可选地,本申请实施例的语义信息可以包括以下至少一种:深度信息、场景分类信息、实例分割信息以及目标检测信息,但不以此为限制。Optionally, common semantic information includes image depth information, scene classification information, instance segmentation information, and object detection information. Optionally, the semantic information in this embodiment of the present application may include at least one of the following: depth information, scene classification information, instance segmentation information, and object detection information, but not limited thereto.
可选地,深度信息可以包括以下至少一种:Optionally, the depth information may include at least one of the following:
深度图像,深度图像中像素值用于表征图像中像素点与成像图像时使用的相机的距离;Depth image, the pixel value in the depth image is used to represent the distance between the pixel point in the image and the camera used to image the image;
所述距离中的最大值;the maximum of said distances;
所述距离中的最小值;the minimum of said distances;
所述最大值和所述最小值之间距离的量化范围;a quantified range of the distance between said maximum value and said minimum value;
所述图像是否包含无限远部分的指示信息,无限远部分为超过设备所能侦测的最远距离的距离。可以理解,设备能侦测的一定距离范围内的信息,该距离范围包含最近距离和最远距离,超过这个最远距离可以都用无限远对应的最大值来表示;而使用次大的值,来表示当前能检测的最远距离。Whether the image contains indication information of an infinite part, where the infinite part is a distance beyond the farthest distance that the device can detect. It can be understood that the information within a certain distance range that the device can detect includes the shortest distance and the furthest distance. If the furthest distance exceeds this distance, it can be represented by the maximum value corresponding to infinity; while using the next largest value, To indicate the farthest distance that can be detected currently.
可选地,所述最大值和所述最小值之间的距离,会被等分为256份,所有像素将被量化至256份中。随后,可以生成一张与原图像分辨率相同的深度图像,如图4所示,将作为图像的另一个通道,附着在计算摄影数据流中。需要注意的是,随着设备性能发展,2的8次方的256也可以扩展为512或者更高的范围。此时能提供的距离精度将大幅度上升。Optionally, the distance between the maximum value and the minimum value is equally divided into 256 parts, and all pixels are quantized into these 256 levels. A depth image with the same resolution as the original image can then be generated, as shown in FIG. 4, and attached to the computational photography data stream as another channel of the image. It should be noted that, as device performance develops, the 256 levels (2 to the 8th power) can also be expanded to 512 or more, in which case the distance accuracy that can be provided will increase substantially.
作为另一种示例,在成像时,还有可能针对无限远距离进行拍照,如天空等,此时将以雷达能侦测的最远距离为极限。无法侦测距离的将被标注以最大值,能侦测到距离的部分,将被等分为255份用以表示。As another example, when imaging, it is also possible to photograph an infinitely distant subject, such as the sky. In this case, the farthest distance that the radar can detect is taken as the limit: pixels whose distance cannot be detected are marked with the maximum value, and the detectable range is equally divided into 255 parts for representation.
可选地,表1为一个带有天空的图像的深度信息表达方法:Optionally, Table 1 is an expression method of depth information of an image with sky:
表1Table 1
最大值(最远距离)Maximum (farthest distance) 30米30 meters
最小值(最近距离)Minimum value (closest distance) 3米3 meters
是否有无限远部分Is there an infinite part yes
量化范围quantization range 8比特(bit)8 bits (bit)
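The quantization scheme described above, with the infinite part flagged by the maximum code, can be sketched as follows. This is a minimal illustration assuming 8-bit codes: values beyond the farthest detectable distance (including sky marked as infinity) map to 255, and the detectable range [min, max] is split into 255 equal levels (codes 0 to 254):

```python
import numpy as np

def quantize_depth(depth_m, d_min, d_max):
    """depth_m: array of per-pixel distances in meters; np.inf marks
    parts beyond the farthest detectable distance (e.g. sky).
    Returns an 8-bit depth image per Table 1's scheme."""
    depth = np.asarray(depth_m, dtype=np.float64)
    codes = np.empty(depth.shape, dtype=np.uint8)
    # pixels that are non-finite or beyond the detectable limit -> max code
    infinite = ~np.isfinite(depth) | (depth > d_max)
    # map [d_min, d_max] linearly onto codes 0..254
    scaled = (np.clip(depth, d_min, d_max) - d_min) / (d_max - d_min)
    codes[:] = np.round(scaled * 254).astype(np.uint8)
    codes[infinite] = 255
    return codes
```

For the example of Table 1 (min 3 m, max 30 m, with an infinite part), a 3 m pixel maps to code 0, a 30 m pixel to 254, and sky to 255.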
可选地,场景分类信息用于表征图像表达的场景。可以理解,一个图像在绝大部分情况下可以被分为多种场景。如生日聚会场景中的蛋糕图像,可以表达为聚会场景,也可以表达为食物场景。所以在表达场景的时候,将列出图像最有可能属于的5个场景的概率,如表达为:Optionally, the scene classification information is used to characterize the scene represented by the image. It can be understood that an image can be divided into multiple scenes in most cases. For example, a cake image in a birthday party scene can be expressed as a party scene or a food scene. Therefore, when expressing the scene, the probability of the five scenes that the image most likely belongs to will be listed, such as:
[{1:0.5},{22:0.2},{25:0.15},{45:0.1},{55:0.05}]。[{1:0.5},{22:0.2},{25:0.15},{45:0.1},{55:0.05}].
上述示例中,{1:0.5}表示图像属于场景ID为"1"的场景的概率为0.5;{22:0.2}表示图像属于场景ID为"22"的场景的概率为0.2;{25:0.15}表示图像属于场景ID为"25"的场景的概率为0.15;{45:0.1}表示图像属于场景ID为"45"的场景的概率为0.1;{55:0.05}表示图像属于场景ID为"55"的场景的概率为0.05。In the above example, {1:0.5} indicates that the probability that the image belongs to the scene whose scene ID is "1" is 0.5; {22:0.2} indicates that the probability that the image belongs to the scene whose scene ID is "22" is 0.2; {25:0.15} indicates that the probability that the image belongs to the scene whose scene ID is "25" is 0.15; {45:0.1} indicates that the probability that the image belongs to the scene whose scene ID is "45" is 0.1; and {55:0.05} indicates that the probability that the image belongs to the scene whose scene ID is "55" is 0.05.
可选地,场景分类信息需要一个词典,用于解析表达为数字形式的数字,到底属于哪个场景。可选地,通过表2凸显不同场景的示例。Optionally, the scene classification information needs a dictionary, which is used to resolve the number expressed in the form of numbers to which scene it belongs to. Optionally, examples of different scenarios are highlighted by Table 2.
表2Table 2
00 草地 grassland 88 日出日落sunrise and sunset
11 宠物 pet 99 食物 food
22 海滩beach 1010 商场 shopping mall
33 街景street view 1111 文本 text
44 聚会reunion 1212 鲜花 flowers
55 蓝天blue sky 1313 天空 Sky
66 绿植green plants 1414 ……...
77 人物figure 1515 ……...
可选地,表2中第一列和第三列为场景ID,表达为数字形式的数字,到底属于哪个场景;第二列和第四列为属于该场景中的实例。Optionally, the first and third columns in Table 2 are scene IDs, i.e., numbers indicating which scene an image belongs to; the second and fourth columns are the instances belonging to those scenes.
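As an illustrative sketch, resolving the top-5 scene probability list against a scene-ID dictionary like Table 2 can be done as follows. The dictionary entries here are a small subset taken from the table above:

```python
# Subset of the Table 2 scene dictionary, for illustration only.
SCENE_DICT = {0: "grassland", 1: "pet", 4: "party", 9: "food", 13: "sky"}

def resolve_scenes(scene_probs):
    """scene_probs: list of single-entry {scene_id: probability} dicts,
    ordered from most to least likely, e.g. [{4: 0.5}, {9: 0.2}, ...].
    Returns (scene name, probability) pairs in the same order."""
    resolved = []
    for entry in scene_probs:
        (scene_id, prob), = entry.items()  # each dict holds exactly one pair
        resolved.append((SCENE_DICT.get(scene_id, "unknown"), prob))
    return resolved
```

For the birthday-cake example, a list such as [{4: 0.5}, {9: 0.2}] resolves to the party scene with probability 0.5 and the food scene with probability 0.2.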
可选地,实例分割信息用于表征图像中实例的分割信息。一般的,图像中都会存在大量可被分割的实例,所以,可以使用与该图像的分辨率一致的矩阵(如800*600,可以表达为图像的一个通道)来表达图像中的实例分割信息,如图5所示例,其中的0为背景,而2/15/35均为实例ID,即实例对应的名称信息的ID。Optionally, the instance segmentation information is used to characterize the segmentation of instances in the image. Generally, an image contains a large number of instances that can be segmented, so a matrix with the same resolution as the image (e.g., 800*600, which can be expressed as a channel of the image) can be used to express the instance segmentation information, as illustrated in FIG. 5, where 0 is the background and 2/15/35 are instance IDs, i.e., the IDs of the name information corresponding to the instances.
为了解析表达实例的数字ID所对应的真实名称,需要一个词典来表达,如表3所示。In order to resolve the real name corresponding to the digital ID of the expression instance, a dictionary is needed to express it, as shown in Table 3.
表3table 3
11 人(person)person 4141 电视(tv)television (tv)
22 消防栓(fire hydrant)fire hydrant 4242 电冰箱(refrigerator) Refrigerator
33 大象(elephant)elephant 4343 公交车(bus) bus
44 滑雪板(skis)skis 4444 猫(cat) cat
55 网球拍(tennis racket)tennis racket 4545 雨伞(umbrella) umbrella
66 三明治(sandwich)sandwich 4646 棒球手套(baseball glove) baseball glove
77 盆栽(potted plant)potted plant 4747 刀(knife) knife
88 微波炉(microwave)microwave 4848 披萨(pizza) pizza
99 吹风机(hair drier)hair dryer 4949 笔记本电脑(laptop)laptop
1010 自行车(bicycle)bicycle 5050 书(book)book
1111 停车标志(stop sign)stop sign 5151 火车(train)train
1212 熊(bear)bear 5252 狗(dog)dog
1313 滑雪板(snowboard)snowboard 5353 手提包(handbag)handbag
1414 瓶子(bottle)bottle 5454 滑板(skateboard) skateboard
1515 橙子(orange)orange 5555 勺子(spoon)spoon
1616 床(bed)bed 5656 甜甜圈(donut)donut
1717 烤箱(oven)oven 5757 老鼠(mouse)mouse
1818 牙刷(toothbrush)toothbrush 5858 钟表(clock)clock
1919 汽车(car)car 5959 卡车(truck)truck
2020 停车收费器(parking meter)parking meter 6060 马(horse)horse
21twenty one 斑马(zebra)zebra 6161 领带(tie)tie
22twenty two 运动球(sports ball)sports ball 6262 冲浪板(surfboard)surfboard
23twenty three 酒杯(wine glass)wine glass 6363 碗(bowl)bowl
24twenty four 西蓝花(broccoli)Broccoli (broccoli) 6464 蛋糕(cake)cake
2525 餐桌(dining table)dining table 6565 远程(remote)remote
2626 烤面包机(toaster)toaster 6666 花瓶(vase)vase
2727 摩托车(motorcycle)motorcycle 6767 船(boat)boat
2828 长凳(bench)bench 6868 绵羊(sheep)sheep
2929 长颈鹿(giraffe)giraffe 6969 手提箱(suitcase)Suitcase (suitcase)
3030 风筝(kite)kite 7070 香蕉(banana)banana
3131 杯子(cup)cup 7171 椅子(chair)chair
3232 胡萝卜(carrot)carrot 7272 键盘(keyboard)keyboard
3333 马桶(toilet)toilet 7373 剪刀(scissors)scissors
3434 洗碗槽(sink)sink 7474 交通灯(traffic light)traffic light
3535 飞机(airplane)airplane 7575 牛(cow)cow (cow)
3636 鸟(bird)bird 7676 飞盘(frisbee)frisbee
3737 背包(backpack)backpack 7777 苹果(apple)apple
3838 棒球棒(baseball bat)baseball bat 7878 沙发(couch)sofa
3939 餐叉(fork)fork 7979 手机(cell phone)cell phone
4040 热狗(hot dog)hot dog 8080 泰迪熊(teddy bear)teddy bear
对于图像的目标检测信息,可以理解:对于图像中可检出的目标,可以使用{目标ID:目标坐标}的形式来保存目标列表。为了解析目标ID所对应的真实名称,需要一个字典来解析。可选地,该字典可以使用以上实例分割所使用的字典。For the target detection information of an image, it can be understood that, for targets detectable in the image, the target list can be saved in the form of {target ID: target coordinates}. To resolve the real name corresponding to a target ID, a dictionary is required; optionally, this dictionary may be the same one used for instance segmentation above.
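A minimal sketch combining the two representations above: an instance segmentation mask (a matrix at image resolution, 0 = background, other values = instance IDs from Table 3) and a {target ID: target coordinates} detection list, both resolved against the same dictionary. The coordinate format (x, y, w, h) is an assumption for illustration:

```python
import numpy as np

# Subset of the Table 3 instance dictionary, for illustration only.
INSTANCE_DICT = {1: "person", 15: "orange", 35: "airplane"}

def summarize(mask, detections):
    """mask: 2-D int array of instance IDs (0 = background);
    detections: {target_id: (x, y, w, h)} as described above.
    Returns the instance names present and named bounding boxes."""
    ids = np.unique(mask)
    names = [INSTANCE_DICT.get(int(i), "unknown") for i in ids if i != 0]
    boxes = {INSTANCE_DICT.get(tid, "unknown"): box
             for tid, box in detections.items()}
    return names, boxes
```

Both the segmentation channel and the detection list carry only numeric IDs in the data stream; the dictionary is what makes them human-readable.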
可选地,在实际实现中,还可根据实际情况进行组合判断,如下表4所示。Optionally, in actual implementation, combination judgments may also be made according to actual conditions, as shown in Table 4 below.
表4Table 4
组合方案Combination plan 深度信息depth information 场景分类信息scene classification information 实例分割信息Instance segmentation information 目标检测信息target detection information
组合示例1Combination Example 1 yes yes yes no
组合示例2Combination example 2 yes yes no no
组合示例3Combination Example 3 yes no yes yes
组合示例4Combination Example 4 no yes yes yes
组合示例5Combination Example 5 yes yes yes yes
……... ……... ……... ……... ……...
可选地,对于组合示例1,语义信息包括深度信息、场景分类信息和实例分割信息。Optionally, for combination example 1, the semantic information includes depth information, scene classification information and instance segmentation information.
再如,对于组合示例2,语义信息包括深度信息和场景分类信息。As another example, for combination example 2, the semantic information includes depth information and scene classification information.
还如,对于组合示例3,语义信息包括深度信息、实例分割信息和目标检测信息。Also, for combination example 3, the semantic information includes depth information, instance segmentation information and object detection information.
再如,对于组合示例4,语义信息包括场景分类信息、实例分割信息和目标检测信息。As another example, for combination example 4, the semantic information includes scene classification information, instance segmentation information and object detection information.
还如,对于组合示例5,语义信息包括深度信息、场景分类信息、实例分割信息和目标检测信息。Also, for combination example 5, the semantic information includes depth information, scene classification information, instance segmentation information and object detection information.
通过组合方案,可以根据不同需求在计算摄影的数据流中携带不同的语义信息,进而在统一规范计算摄影的语义信息的同时,针对不同应用场景提供区别的数据流,提升用户体验。Through the combination scheme, different semantic information can be carried in the computational photography data stream according to different needs, and then while standardizing the semantic information of computational photography, different data streams can be provided for different application scenarios to improve user experience.
以上所列举的仅为参考示例,为了避免冗余,这里不再一一列举,实际开发或运用中,可以根据实际需要灵活组合,但任一组合均属于本申请的技术方案,也就覆盖在本申请的保护范围之内。The above are only reference examples; to avoid redundancy, they are not listed one by one here. In actual development or application, they can be combined flexibly according to actual needs, but any combination belongs to the technical solutions of this application and thus falls within the protection scope of this application.
示例性地,数据流中的语义信息可以包括实例分割信息,从而使用实例分割信息来获得图像中的具体实例,从而使用单摄做到实例虚化的效果;又或者,数据流中的语义信息可以包括目标检测信息,以使相机针对性的调整自动对焦的对象,从而使焦点聚焦在想要拍摄的目标上,提升成像质量;又或者,数据流中的语义信息可以包括场景分类信息,以针对该场景进行3A参数的调整。Exemplarily, the semantic information in the data stream may include instance segmentation information, so that specific instances in the image can be obtained and an instance-bokeh effect can be achieved with a single camera; alternatively, the semantic information in the data stream may include target detection information, so that the camera can adjust the autofocus target in a targeted manner, focusing on the target to be photographed and improving imaging quality; alternatively, the semantic information in the data stream may include scene classification information, so as to adjust the 3A parameters for that scene.
可选地,3A参数即自动对焦(AF)、自动曝光(AE)和自动白平衡(AWB)。3A数字成像技术利用了自动对焦算法、自动曝光算法及自动白平衡算法来实现图像对比度最大、改善主体拍摄物过曝光或曝光不足情况、使画面在不同光线照射下的色差得到补偿,从而呈现较高画质的图像信息。采用了3A数字成像技术的摄像机能够很好的保障图像精准的色彩还原度,呈现完美的日夜监控效果。Optionally, the 3A parameters are auto focus (AF), auto exposure (AE), and auto white balance (AWB). 3A digital imaging technology uses auto-focus, auto-exposure, and auto-white-balance algorithms to maximize image contrast, mitigate over- or under-exposure of the main subject, and compensate for color casts under different lighting, so as to present high-quality image information. A camera adopting 3A digital imaging technology can well guarantee accurate color reproduction and present a good day-and-night monitoring effect.
随着技术发展,语义信息可以不仅仅是上述四项,也可以有更多的其他语义信息,所以该信息头需要留下足够长度。With the development of technology, the semantic information can be not only the above four items, but also more other semantic information, so the information header needs to leave a sufficient length.
可选地,语义信息包括深度信息。此时,S1步骤可以包括:基于图像信息,通过激光测距雷达和/或深度信息解析网络获得所述深度信息。可选地,深度信息解析网络用于解析图像信息以生成深度信息。Optionally, the semantic information includes depth information. At this time, step S1 may include: based on the image information, obtaining the depth information through a laser ranging radar and/or a depth information analysis network. Optionally, the depth information parsing network is used to parse image information to generate depth information.
可选地,语义信息包括场景分类信息。此时,S1步骤可以包括:基于图像信息,提取图像的图像场景特征;根据图像场景特征,确定或生成图像的场景分类信息。可选地,上述根据图像场景特征,确定或生成图像的场景分类信息,包括:将图像场景特征输入场景分类模型,得到场景分类模型输出的所述图像对应至少一种场景的概率;在图像对应至少一种场景的概率中,确定最大概率对应的场景为图像的场景分类信息。可选地,场景分类模型用于确定图像对应至少一种场景的概率。Optionally, the semantic information includes scene classification information. In this case, step S1 may include: extracting image scene features of the image based on the image information; and determining or generating the scene classification information of the image according to the image scene features. Optionally, determining or generating the scene classification information according to the image scene features includes: inputting the image scene features into a scene classification model to obtain the probability, output by the model, that the image corresponds to at least one scene; and, among these probabilities, determining the scene corresponding to the maximum probability as the scene classification information of the image. Optionally, the scene classification model is used to determine the probability that the image corresponds to at least one scene.
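The scene-classification branch of step S1 can be sketched as follows. This is an illustration only: the raw per-scene scores stand in for the output of a hypothetical scene classification model, and the softmax normalization is an assumed detail, not specified in the application:

```python
import math

def classify_scene(scores):
    """scores: {scene_id: raw model score} from a (hypothetical) scene
    classification model. Returns the scene ID with the maximum
    probability, plus the full probability distribution (softmax)."""
    m = max(scores.values())  # subtract the max for numerical stability
    exp = {sid: math.exp(s - m) for sid, s in scores.items()}
    total = sum(exp.values())
    probs = {sid: v / total for sid, v in exp.items()}
    best_id = max(probs, key=probs.get)  # maximum-probability scene
    return best_id, probs
```

Per the method above, the highest-probability scene (e.g. the party scene for a birthday-cake image) becomes the image's scene classification information.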
S2:基于预设格式,在图像的数据流中保存语义信息。S2: Preserve semantic information in the image data stream based on a preset format.
可选地,该步骤包括:基于预设格式,在图像的数据流的预留字段中填充语义信息。可选地,预设格式为至少一种语义信息的任意组合。Optionally, this step includes: based on a preset format, filling semantic information in a reserved field of the image data stream. Optionally, the preset format is any combination of at least one semantic information.
可选地,不同类型的语义信息对应的预设格式是不同的。Optionally, the preset formats corresponding to different types of semantic information are different.
可选地,预设格式包括表格、单通道位图、矩阵和键值对中的至少一种。可选地,表格如表1所示,矩阵如图5所示。Optionally, the preset format includes at least one of table, single-channel bitmap, matrix and key-value pair. Optionally, the table is shown in Table 1, and the matrix is shown in FIG. 5 .
可选地,预设格式还包括信息头,信息头用于表征图像的数据流中是否包含语义信息,和/或,信息头用于表征图像的数据流中所包含的语义信息的类型。可选地,语义信息均为可选,即在信息流中可以只包含信息头,但是没有信息本体。可选地,信息头可以表达为:Frame semantic info include:0 0 0 1,即只包含第四项;信息头也可以表达为:Frame semantic info include:0 0 0 0。信息头具体对应的包含语义信息的可以查阅表,如:Optionally, the preset format further includes an information header, which is used to indicate whether the image data stream contains semantic information, and/or to indicate the types of semantic information contained in the image data stream. Optionally, all semantic information is optional, that is, the information stream may contain only the information header without the information body. Optionally, the information header can be expressed as: Frame semantic info include:0 0 0 1, meaning only the fourth item is included; the information header can also be expressed as: Frame semantic info include:0 0 0 0. The semantic information corresponding to each position of the information header can be looked up in a table, such as:
深度信息;depth information;
场景分类信息;Scene classification information;
实例分割信息;Instance segmentation information;
目标检测信息;target detection information;
……。....
语义信息应当均为可选,具体信息头表达形式参见上文。所以信息头这一字段为一个变长字段,当信息头表述某个语义信息为存在时,即表达为0 0 0 1这样的表达形式时,该字段为对应的语义信息留出足够长度用于表述该语义信息。Semantic information should be optional; see above for the specific expression form of the information header. Therefore, the information header is a variable-length field: when the header indicates that a certain piece of semantic information is present, i.e., when it is expressed in a form such as 0 0 0 1, the field reserves sufficient length for expressing the corresponding semantic information.
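The variable-length header scheme above can be sketched as follows: one presence flag per semantic-information type, followed only by the payloads whose flags are set. The payload framing (a 4-byte big-endian length prefix) is an assumption for illustration:

```python
# Order of flags, matching the lookup table above:
# depth, scene classification, instance segmentation, target detection.
SEMANTIC_TYPES = ["depth", "scene", "instance", "detection"]

def build_stream(semantics):
    """semantics: dict mapping a type name to its bytes payload; absent
    types get a 0 flag and no payload, so the field is variable-length."""
    flags = ["1" if t in semantics else "0" for t in SEMANTIC_TYPES]
    header = "Frame semantic info include:" + " ".join(flags)
    chunks = [header.encode()]
    for t in SEMANTIC_TYPES:
        if t in semantics:
            payload = semantics[t]
            # hypothetical framing: 4-byte length prefix, then payload
            chunks.append(len(payload).to_bytes(4, "big") + payload)
    return b"".join(chunks)
```

A stream carrying only target detection information thus starts with the header "Frame semantic info include:0 0 0 1", matching the example above, followed by a single payload.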
可选地,S2步骤之前,图像数据处理方法还可以包括:根据预设对应关系,确定语义信息对应的标识信息。可选地,语义信息包括场景分类信息、实例分割信息以及目标检测信息中的至少一种;标识信息对应包括场景ID、实例ID和目标ID中的至少一种,预设对应关系包含场景名称和场景ID的对应关系、实例名称和实例ID的对应关系以及目标名称和目标ID的对应关系中的至少一种。可选地,预设对应关系可以具体呈现为字典形式, 但本申请实施例不以此为限制,具体可根据实际需求进行相应设置。Optionally, before step S2, the image data processing method may further include: determining identification information corresponding to the semantic information according to a preset correspondence relationship. Optionally, the semantic information includes at least one of scene classification information, instance segmentation information, and target detection information; the identification information includes at least one of scene ID, instance ID, and target ID, and the preset correspondence includes scene name and At least one of the corresponding relationship between scene IDs, the corresponding relationship between instance names and instance IDs, and the corresponding relationship between target names and target IDs. Optionally, the preset correspondence relationship may be specifically presented in the form of a dictionary, but this embodiment of the present application is not limited thereto, and may be set accordingly according to actual needs.
本申请实施例的图像数据处理方法,基于图像信息,确定或生成图像的语义信息,该语义信息用于解读图像;基于预设格式,在图像的数据流中保存语义信息。通过上述方式,基于预设格式在图像的数据流中保存用于解读图像的语义信息,以在计算摄影涉及的数据流中统一规范计算摄影的语义信息。The image data processing method of the embodiment of the present application determines or generates semantic information of the image based on image information, and the semantic information is used to interpret the image; based on a preset format, the semantic information is stored in the data stream of the image. Through the above method, based on the preset format, the semantic information for interpreting the image is stored in the data stream of the image, so as to uniformly standardize the semantic information of computational photography in the data stream involved in computational photography.
可选地,在上述基础上,图像数据处理方法可以进一步包括:基于预设格式,从图像的数据流中获取语义信息;根据语义信息进行预设处理。对于这两个步骤的详细说明,可参考第二实施例,此处不再赘述。Optionally, based on the above, the image data processing method may further include: acquiring semantic information from the image data stream based on a preset format; and performing preset processing according to the semantic information. For the detailed description of these two steps, reference may be made to the second embodiment, which will not be repeated here.
第二实施例second embodiment
图6为根据第二实施例示出的图像数据处理方法的流程示意图。本申请实施例提供一种图像数据处理方法,应用于如前所述的智能终端等智能终端的计算摄影。如图6所示,图像数据处理方法包括以下步骤:Fig. 6 is a schematic flowchart of an image data processing method according to a second embodiment. An embodiment of the present application provides an image data processing method, which is applied to computational photography of a smart terminal such as the aforementioned smart terminal. As shown in Figure 6, the image data processing method includes the following steps:
S10:基于预设格式,从图像的数据流中获取语义信息。S10: Obtain semantic information from image data streams based on a preset format.
可选地,语义信息用于解读图像。Optionally, semantic information is used to interpret the image.
可选地,常见的语义信息,包括图像的深度信息(depth information)、场景分类信息(scene classification information)、实例分割信息(Instance segmentation information)以及目标检测信息(Object detection information)等。可选地,本申请实施例的语义信息可以包括以下至少一种:深度信息、场景分类信息、实例分割信息以及目标检测信息,但不以此为限制。Optionally, common semantic information includes image depth information, scene classification information, instance segmentation information, and object detection information. Optionally, the semantic information in this embodiment of the present application may include at least one of the following: depth information, scene classification information, instance segmentation information, and object detection information, but not limited thereto.
可选地,深度信息可以包括以下至少一种:Optionally, the depth information may include at least one of the following:
深度图像,深度图像中像素值用于表征图像中像素点与成像图像时使用的相机的距离;Depth image, the pixel value in the depth image is used to represent the distance between the pixel point in the image and the camera used to image the image;
所述距离中的最大值;the maximum of said distances;
所述距离中的最小值;the minimum of said distances;
所述最大值和所述最小值之间距离的量化范围;a quantified range of the distance between said maximum value and said minimum value;
所述图像是否包含无限远部分的指示信息,无限远部分为超过设备所能侦测的最远距离的距离。可以理解,设备能侦测的一定距离范围内的信息,该距离范围包含最近距离和最远距离,超过这个最远距离可以都用无限远对应的最大值来表示;而使用次大的值,来表示当前能检测的最远距离。Whether the image contains indication information of an infinite part, where the infinite part is a distance beyond the farthest distance that the device can detect. It can be understood that the information within a certain distance range that the device can detect includes the shortest distance and the furthest distance. If the furthest distance exceeds this distance, it can be represented by the maximum value corresponding to infinity; while using the next largest value, To indicate the farthest distance that can be detected currently.
可选地,所述最大值和所述最小值之间的距离,会被等分为256份,所有像素将被量化至256份中。随后,可以生成一张与原图像分辨率相同的深度图像,如图4所示,将作为图像的另一个通道,附着在计算摄影数据流中。需要注意的是,随着设备性能发展,2的8次方的256也可以扩展为512或者更高的范围。此时能提供的距离精度将大幅度上升。Optionally, the distance between the maximum value and the minimum value is equally divided into 256 parts, and all pixels are quantized into these 256 levels. A depth image with the same resolution as the original image can then be generated, as shown in FIG. 4, and attached to the computational photography data stream as another channel of the image. It should be noted that, as device performance develops, the 256 levels (2 to the 8th power) can also be expanded to 512 or more, in which case the distance accuracy that can be provided will increase substantially.
作为另一种示例,在成像时,还有可能针对无限远距离进行拍照,如天空等,此时将以雷达能侦测的最远距离为极限。无法侦测距离的将被标注以最大值,能侦测到距离的部分,将被等分为255份用以表示。As another example, when imaging, it is also possible to photograph an infinitely distant subject, such as the sky. In this case, the farthest distance that the radar can detect is taken as the limit: pixels whose distance cannot be detected are marked with the maximum value, and the detectable range is equally divided into 255 parts for representation.
可选地,表1为一个带有天空的图像的深度信息表达方法。Optionally, Table 1 shows a method for expressing depth information of an image with sky.
可选地,场景分类信息用于表征图像表达的场景。可以理解,一个图像在绝大部分情况下可以被分为多种场景。如生日聚会场景中的蛋糕图像,可以表达为聚会场景,也可以表达为食物场景。所以在表达场景的时候,将列出图像最有可能属于的5个场景的概率,如表达为:Optionally, the scene classification information is used to characterize the scene represented by the image. It can be understood that an image can be divided into multiple scenes in most cases. For example, a cake image in a birthday party scene can be expressed as a party scene or a food scene. Therefore, when expressing the scene, the probability of the five scenes that the image most likely belongs to will be listed, such as:
[{1:0.5},{22:0.2},{25:0.15},{45:0.1},{55:0.05}]。[{1:0.5},{22:0.2},{25:0.15},{45:0.1},{55:0.05}].
上述示例中,{1:0.5}表示图像属于场景ID为"1"的场景的概率为0.5;{22:0.2}表示图像属于场景ID为"22"的场景的概率为0.2;{25:0.15}表示图像属于场景ID为"25"的场景的概率为0.15;{45:0.1}表示图像属于场景ID为"45"的场景的概率为0.1;{55:0.05}表示图像属于场景ID为"55"的场景的概率为0.05。In the above example, {1:0.5} indicates that the probability that the image belongs to the scene whose scene ID is "1" is 0.5; {22:0.2} indicates that the probability that the image belongs to the scene whose scene ID is "22" is 0.2; {25:0.15} indicates that the probability that the image belongs to the scene whose scene ID is "25" is 0.15; {45:0.1} indicates that the probability that the image belongs to the scene whose scene ID is "45" is 0.1; and {55:0.05} indicates that the probability that the image belongs to the scene whose scene ID is "55" is 0.05.
可选地,场景分类信息需要一个词典,用于解析表达为数字形式的数字,到底属于哪个场景,如表达为以上所示的表2。Optionally, the scene classification information requires a dictionary for parsing the number expressed in digital form to which scene it belongs to, as shown in Table 2 above.
可选地,表2中第一列和第三列为场景ID,表达为数字形式的数字,到底属于哪个场景;第二列和第四列为属于该场景中的实例。Optionally, the first column and the third column in Table 2 are scene IDs, expressed as numbers in digital form, which scene they belong to; the second column and fourth column are instances belonging to the scene.
可选地,实例分割信息用于表征图像中实例的分割信息。一般的,图像中都会存在大量可被分割的实例,所以,可以使用与该图像的分辨率一致的矩阵(如800*600,可以表达为图像的一个通道)来表达图像中的实例分割信息,如图5所示例,其中的0为背景,而2/15/35均为实例ID,即实例对应的名称信息的ID。Optionally, the instance segmentation information is used to characterize the segmentation of instances in the image. Generally, an image contains a large number of instances that can be segmented, so a matrix with the same resolution as the image (e.g., 800*600, which can be expressed as a channel of the image) can be used to express the instance segmentation information, as illustrated in FIG. 5, where 0 is the background and 2/15/35 are instance IDs, i.e., the IDs of the name information corresponding to the instances.
In order to resolve the real name corresponding to the numeric ID of an instance, a dictionary is needed, as shown in Table 3.
For the target detection information of the image, it can be understood that the targets detected in the image can be saved as a target list in the form {target ID: target coordinates}. In order to resolve the real name corresponding to a target ID, a dictionary is required. Optionally, this dictionary may be the same one used for instance segmentation above.
Optionally, in an actual implementation, combinations may also be determined according to actual conditions, as shown in Table 4 above.
Optionally, for combination example 1, the semantic information includes depth information, scene classification information, and instance segmentation information.
Optionally, for combination example 2, the semantic information includes depth information and scene classification information.
Optionally, for combination example 3, the semantic information includes depth information, instance segmentation information, and target detection information.
Optionally, for combination example 4, the semantic information includes scene classification information, instance segmentation information, and target detection information.
Optionally, for combination example 5, the semantic information includes depth information, scene classification information, instance segmentation information, and target detection information.
Through the combination scheme, different semantic information can be carried in the computational-photography data stream according to different needs, so that the semantic information of computational photography is standardized in a unified manner while differentiated data streams are provided for different application scenarios, improving the user experience.
The above are only reference examples; to avoid redundancy, they are not enumerated one by one here. In actual development or application, they can be combined flexibly according to actual needs, and any such combination belongs to the technical solution of the present application and is therefore covered by the protection scope of the present application.
As technology develops, the semantic information may include not only the above four items but also other semantic information, so the information header needs to reserve sufficient length.
Optionally, the preset format is any combination of at least one type of semantic information.
Optionally, the preset formats corresponding to different types of semantic information are different.
Optionally, the preset format includes at least one of a table, a single-channel bitmap, a matrix, and key-value pairs. Optionally, the table is as shown in Table 1, and the matrix is as shown in Fig. 5.
Optionally, the preset format further includes an information header. The information header is used to indicate whether the image data stream contains semantic information, and/or the type of semantic information contained in the image data stream. Optionally, all semantic information is optional; that is, the data stream may contain only the information header without the information body. Optionally, the information header can be expressed as: Frame semantic info include: 0 0 0 1, i.e., only the fourth item is included; the information header can also be expressed as: Frame semantic info include: 0 0 0 0. The semantic information to which the header corresponds can be found in a lookup table, such as:
depth information;
scene classification information;
instance segmentation information;
target detection information;
……
All semantic information should be optional; see above for the specific expression of the information header. The information header is therefore a variable-length field: when the header indicates that a certain type of semantic information is present, i.e., when it is expressed in a form such as 0 0 0 1, the field reserves sufficient length for expressing the corresponding semantic information.
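Reading such a header amounts to checking the four flag positions against the lookup table above (depth, scene classification, instance segmentation, target detection). The following is a minimal sketch, assuming the textual "Frame semantic info include:" form shown in the examples; the type names are assumptions:

```python
# Hypothetical parser for the variable-length information header.
# Flag order follows the lookup table above; a longer header with
# additional flags would simply extend SEMANTIC_TYPES.

SEMANTIC_TYPES = ["depth", "scene_classification",
                  "instance_segmentation", "target_detection"]

def parse_header(header_line):
    """Return the semantic-information types present in the data stream."""
    flags = header_line.split(":", 1)[1].split()
    return [name for name, flag in zip(SEMANTIC_TYPES, flags) if flag == "1"]

print(parse_header("Frame semantic info include: 0 0 0 1"))
# → ['target_detection']
print(parse_header("Frame semantic info include: 0 0 0 0"))
# → []
```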
Optionally, in a specific implementation, step S10 includes: reading the semantic information from a reserved field of the image data stream based on the preset format.
S20: Perform preset processing according to the semantic information.
Optionally, the semantic information includes scene classification information. In this case, step S20 may include: adjusting the target parameters of the camera in the corresponding scene according to the scene classification information. Optionally, the target parameters include at least one of 3A parameters, a display lookup table, or other parameters related to imaging quality.
Optionally, the 3A parameters are the auto focus (AF), auto exposure (AE), and auto white balance (AWB) parameters. 3A digital imaging technology uses auto-focus, auto-exposure, and auto-white-balance algorithms to maximize image contrast, mitigate over-exposure or under-exposure of the main subject, and compensate for color deviations under different lighting, thereby presenting high-quality image information. A camera adopting 3A digital imaging technology can well guarantee accurate color reproduction and deliver an excellent day-and-night monitoring effect.
Optionally, the semantic information includes instance segmentation information. In this case, step S20 may include: obtaining a target instance in the image according to the instance segmentation information, and performing preset processing on the image according to the target instance. Optionally, the preset processing may include at least one of instance blurring, instance deformation, instance color retention, mapping processing for the instance, and mapping processing for the background. When the semantic information includes instance segmentation information, the instance segmentation information can be used to obtain specific instances in the image, so that effects such as instance blurring, instance deformation, instance color retention, mapping processing for the instance, and mapping processing for the background can be achieved.
Optionally, the semantic information includes target detection information. In this case, step S20 may include: adjusting the target of the camera's auto focus according to the target detection information, so that the camera adjusts the auto-focus object in a targeted manner, the focus falls on the target to be photographed, and the imaging quality is improved.
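The per-type processing of step S20 can be viewed as a dispatch over whichever semantic information is present. The following is a minimal illustrative sketch, assuming a dict-based layout for the semantic information and purely descriptive handler actions; none of these names appear in the original:

```python
# Hypothetical S20 dispatch: each semantic-information type present in the
# data stream triggers its own preset processing.

def preset_processing(semantic_info):
    """Return a description of the processing steps to perform."""
    actions = []
    if "scene_classification" in semantic_info:
        actions.append("adjust 3A / target parameters for scene "
                       + str(semantic_info["scene_classification"]))
    if "target_detection" in semantic_info:
        actions.append("refocus on detected target "
                       + str(semantic_info["target_detection"]))
    if "instance_segmentation" in semantic_info:
        actions.append("apply per-instance effect (blur/recolor/LUT)")
    return actions

info = {"scene_classification": 1, "target_detection": {7: (120, 80)}}
for action in preset_processing(info):
    print(action)
```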
In the image data processing method of the embodiments of the present application, semantic information is obtained from the image data stream based on a preset format, the semantic information being used to interpret the image, and preset processing is performed according to the semantic information. In this way, the semantic information used to interpret the image is stored in the image data stream based on the preset format. On the one hand, the semantic information of computational photography can be uniformly standardized in the data streams involved in computational photography; on the other hand, when related processing is performed through computational photography, different applications of the image only need to obtain the corresponding semantic information from the data stream, without repeating the same processing on the image, thereby avoiding a waste of computing resources.
Optionally, on the above basis, before step S10, the image data processing method may further include: determining or generating the semantic information of the image based on the image information; and saving the semantic information in the data stream of the image based on a preset format. For a detailed description of these two steps, reference may be made to the aforementioned first embodiment, which is not repeated here.
Third Embodiment
Fig. 7 is a schematic structural diagram of an image data processing device according to the third embodiment. An embodiment of the present application provides an image data processing device. As shown in Fig. 7, the image data processing device 70 includes:
a processing module 71, configured to determine or generate semantic information of the image based on the image information; and
a saving module 72, configured to save the semantic information in the data stream of the image based on a preset format.
Optionally, the image information includes basic image information and image data. Optionally, the image data is the image itself. The basic image information may also be called the basic description information of the image, and may include an image description information identifier, a basic description information length, an image type identifier, an image length, an image width, an image color space, a bit width, a storage method, and the like. Optionally:
the image description information identifier is used to identify the "basic description information" field of the image;
the basic description information length indicates the total length of the basic description information field, including the image description information identifier;
the image type identifier is used to identify whether the image data type is a single-frame image, a multi-frame image, or a video stream;
the image length is the length of the image data itself;
the image width is the width of the image data itself;
the image color space describes the color space of the image data, such as RGGB (also called RGBG or GRGB), RGBW, RYYB, etc.;
the bit width is the number of bits per component of the image;
the storage method is the arrangement of each pixel of each component of the image color space in the storage space (such as memory, flash memory, or a hard disk).
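The basic description information fields enumerated above can be grouped into a single record. The following Python sketch is illustrative only; the field names, types, and example values are assumptions rather than a layout prescribed by the application:

```python
# Hypothetical grouping of the "basic description information" fields.
from dataclasses import dataclass

@dataclass
class BasicImageInfo:
    description_id: int      # identifies the "basic description information" field
    description_length: int  # total length of the field, including the identifier
    image_type: str          # single-frame image, multi-frame image, or video stream
    length: int              # length of the image data itself
    width: int               # width of the image data itself
    color_space: str         # e.g. RGGB, RGBW, RYYB
    bit_width: int           # number of bits per component
    storage_layout: str      # arrangement of components in the storage space

info = BasicImageInfo(0x01, 32, "single_frame", 800, 600, "RGGB", 10, "planar")
print(info.color_space, info.bit_width)  # → RGGB 10
```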
Optionally, the semantic information is used to interpret the image.
Optionally, the semantic information includes at least one of the following: depth information, scene classification information, instance segmentation information, and target detection information.
Optionally, the scene classification information is used to characterize the scene represented by the image.
Optionally, the instance segmentation information is used to characterize the segmentation of instances in the image.
Optionally, the depth information includes at least one of the following:
a depth image, in which pixel values are used to characterize the distance between pixel points in the image and the camera used when imaging the image;
the maximum of said distances;
the minimum of said distances;
a quantization range of the distance between said maximum value and said minimum value;
indication information of whether the image contains an infinitely far part, the infinitely far part being a distance beyond the farthest distance that the device can detect. It can be understood that the device can detect information within a certain distance range, which includes a nearest distance and a farthest distance; any distance beyond this farthest distance can be represented by the maximum value corresponding to infinity, while the second-largest value is used to represent the farthest distance that can currently be detected.
Optionally, the semantic information includes depth information, and the processing module 71 is specifically configured to: obtain the depth information based on the image information through a laser ranging radar and/or a depth information parsing network.
Optionally, the depth information parsing network is used to parse image information to generate depth information.
Optionally, the semantic information includes scene classification information, and the processing module 71 is further configured to: extract image scene features of the image based on the image information; and determine or generate the scene classification information of the image according to the image scene features.
Optionally, the processing module 71 is further configured to: input the image scene features into a scene classification model to obtain the probability, output by the scene classification model, that the image corresponds to at least one scene; and, among the probabilities that the image corresponds to at least one scene, determine the scene corresponding to the maximum probability as the scene classification information of the image. Optionally, the scene classification model is used to determine the probability that the image corresponds to at least one scene.
Optionally, the saving module 72 is specifically configured to: fill the semantic information into a reserved field of the image data stream based on the preset format. Optionally, the preset format is any combination of at least one type of semantic information.
Optionally, the preset formats corresponding to different types of semantic information are different.
Optionally, the preset format includes at least one of a table, a single-channel bitmap, a matrix, and key-value pairs.
Optionally, the preset format further includes an information header. The information header is used to indicate whether the image data stream contains semantic information, and/or the type of semantic information contained in the image data stream.
Optionally, the saving module is further configured to: before saving the semantic information in the image data stream based on the preset format, determine identification information corresponding to the semantic information according to a preset correspondence. Optionally, the semantic information includes at least one of scene classification information, instance segmentation information, and target detection information; the identification information correspondingly includes at least one of a scene ID, an instance ID, and a target ID; and the preset correspondence includes at least one of a correspondence between scene names and scene IDs, a correspondence between instance names and instance IDs, and a correspondence between target names and target IDs.
Optionally, on the above basis, the processing module 71 may be further configured to: obtain semantic information from the image data stream based on the preset format; and perform preset processing according to the semantic information. For a detailed description of these two steps, reference may be made to the second embodiment, which is not repeated here.
Fourth Embodiment
Fig. 8 is a schematic structural diagram of an image data processing device according to the fourth embodiment. An embodiment of the present application provides an image data processing device. As shown in Fig. 8, the image data processing device 80 includes:
an acquisition module 81, configured to acquire semantic information from the image data stream based on a preset format; and
a processing module 82, configured to perform preset processing according to the semantic information.
Optionally, the semantic information is used to interpret the image.
Optionally, the semantic information includes at least one of the following:
a depth image, in which pixel values are used to characterize the distance between pixel points in the image and the camera used when imaging the image;
the maximum of said distances;
the minimum of said distances;
a quantization range of the distance between said maximum value and said minimum value;
indication information of whether the image contains an infinitely far part, the infinitely far part being a distance beyond the farthest distance that the device can detect;
scene classification information;
instance segmentation information;
target detection information.
Optionally, the scene classification information is used to characterize the scene represented by the image.
Optionally, the instance segmentation information is used to characterize the segmentation of instances in the image.
Optionally, the acquisition module 81 is specifically configured to: read the semantic information from a reserved field of the image data stream based on the preset format. Optionally, the preset format is any combination of at least one type of semantic information.
Optionally, the preset formats corresponding to different types of semantic information are different.
Optionally, the preset format includes at least one of a table, a single-channel bitmap, a matrix, and key-value pairs.
Optionally, the preset format further includes an information header. The information header is used to indicate whether the image data stream contains semantic information, and/or the type of semantic information contained in the image data stream.
Optionally, the processing module 82 is specifically configured for at least one of the following:
the semantic information includes scene classification information, and the target parameters of the camera in the corresponding scene are adjusted according to the scene classification information;
the semantic information includes target detection information, and the target of the camera's auto focus is adjusted according to the target detection information;
the semantic information includes instance segmentation information, a target instance in the image is obtained according to the instance segmentation information, and preset processing is performed on the image according to the target instance.
Optionally, the preset processing includes at least one of instance blurring, instance deformation, instance color retention, LUT mapping for the instance, and LUT mapping for the background.
Optionally, the target parameters include at least one of auto exposure parameters, a display lookup table, auto focus parameters, and white balance parameters.
Optionally, on the above basis, the processing module 82 is further configured to: determine or generate the semantic information of the image based on the image information; and save the semantic information in the data stream of the image based on a preset format. For a detailed description of these two steps, reference may be made to the aforementioned first embodiment, which is not repeated here.
Fifth Embodiment
Fig. 9 is a schematic structural diagram of a smart terminal according to the fifth embodiment. An embodiment of the present application provides a smart terminal. As shown in Fig. 9, the smart terminal 90 includes a memory 91 and a processor 92. An image data processing program is stored in the memory 91, and when the image data processing program is executed by the processor 92, the steps of the image data processing method in any of the above embodiments are implemented. The implementation principles and beneficial effects are similar and are not repeated here.
Optionally, the smart terminal 90 further includes a communication interface 93, and the communication interface 93 may be connected to the processor 92 through a bus 94. The processor 92 can control the communication interface 93 to implement the receiving and sending functions of the smart terminal 90.
The above integrated modules implemented in the form of software function modules may be stored in a computer-readable storage medium. The software function modules are stored in a storage medium and include several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to execute some of the steps of the methods of the various embodiments of the present application.
An embodiment of the present application further provides a computer-readable storage medium on which an image data processing program is stored. When the image data processing program is executed by a processor, the steps of the image data processing method in any of the above embodiments are implemented.
The embodiments of the smart terminal and the computer-readable storage medium provided in the present application may include all the technical features of any of the above image data processing method embodiments. The expansions and explanations of the description are basically the same as those of the above method embodiments and are not repeated here.
An embodiment of the present application further provides a computer program product. The computer program product includes computer program code, and when the computer program code is run on a computer, the computer is caused to execute the methods in the above various possible implementations.
An embodiment of the present application further provides a chip, including a memory and a processor. The memory is used to store a computer program, and the processor is used to call and run the computer program from the memory, so that a device equipped with the chip executes the methods in the above various possible implementations.
It can be understood that the above scenarios are only examples and do not limit the application scenarios of the technical solutions provided in the embodiments of the present application; the technical solutions of the present application can also be applied to other scenarios. Optionally, those of ordinary skill in the art will know that, with the evolution of system architectures and the emergence of new business scenarios, the technical solutions provided in the embodiments of the present application are equally applicable to similar technical problems.
The serial numbers of the above embodiments of the present application are for description only and do not represent the merits of the embodiments.
The steps in the methods of the embodiments of the present application may be reordered, combined, and deleted according to actual needs.
The units in the devices of the embodiments of the present application may be combined, divided, and deleted according to actual needs.
In the present application, the same or similar term concepts, technical solutions, and/or application scenario descriptions are generally described in detail only at their first occurrence; when they appear again later, they are generally not repeated for the sake of brevity. When understanding the technical solutions and other content of the present application, for the same or similar term concepts, technical solutions, and/or application scenario descriptions that are not described in detail later, reference may be made to the preceding related detailed descriptions.
In the present application, the description of each embodiment has its own emphasis. For parts that are not detailed or recorded in a certain embodiment, reference may be made to the related descriptions of other embodiments.
The technical features of the technical solutions of the present application may be combined arbitrarily. For the sake of concise description, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in a combination of these technical features, the combination should be regarded as within the scope recorded in the present application.
Through the description of the above implementations, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the part of the technical solution of the present application that is essential or that contributes to the prior art can be embodied in the form of a software product. The computer software product is stored in a storage medium as above (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a controlled terminal, a network device, etc.) to execute the method of each embodiment of the present application.
In the above embodiments, implementation may be in whole or in part by software, hardware, firmware, or any combination thereof. When software is used, implementation may be in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; optionally, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired (e.g., coaxial cable, optical fiber, digital subscriber line) or wireless (e.g., infrared, radio, microwave) manner. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a storage disk, a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid state disk (SSD)), etc.
The above are only preferred embodiments of the present application and are not intended to limit the patent scope of the present application. Any equivalent structure or equivalent process transformation made using the content of the description and drawings of the present application, applied directly or indirectly in other related technical fields, is likewise included within the patent protection scope of the present application.

Claims (15)

  1. An image data processing method, comprising the following steps:
    S1: determining or generating semantic information of an image based on image information;
    S2: saving the semantic information in a data stream of the image based on a preset format.
  2. The method according to claim 1, wherein the semantic information comprises at least one of the following:
    depth information, scene classification information, instance segmentation information, and object detection information.
  3. The method according to claim 2, wherein the depth information comprises at least one of the following:
    a depth image, in which a pixel value represents the distance between a pixel point in the image and the camera used to capture the image;
    the maximum of said distances;
    the minimum of said distances;
    a quantization range for distances between the maximum and the minimum;
    an indication of whether the image contains a part at infinity.
  4. The method according to claim 2, wherein the semantic information comprises the depth information, and step S1 comprises:
    obtaining the depth information through a laser ranging radar and/or a depth information parsing network, based on the image information.
  5. The method according to claim 2, wherein the semantic information comprises the scene classification information, and step S1 comprises:
    extracting image scene features of the image based on the image information;
    determining or generating the scene classification information of the image according to the image scene features.
  6. The method according to claim 5, wherein determining or generating the scene classification information of the image according to the image scene features comprises:
    inputting the image scene features into a scene classification model to obtain, as output of the model, the probability that the image corresponds to each of at least one scene;
    among the probabilities that the image corresponds to at least one scene, determining the scene with the highest probability as the scene classification information of the image.
  7. The method according to any one of claims 1 to 6, wherein step S2 comprises:
    filling the semantic information into a reserved field of the data stream of the image based on the preset format.
  8. The method according to any one of claims 1 to 6, wherein different types of semantic information correspond to different preset formats.
  9. The method according to any one of claims 1 to 6, further comprising, before step S2:
    determining identification information corresponding to the semantic information according to a preset correspondence.
  10. An image data processing method, comprising the following steps:
    S10: obtaining semantic information from a data stream of an image based on a preset format;
    S20: performing preset processing according to the semantic information.
  11. The method according to claim 10, wherein the semantic information comprises at least one of the following:
    a depth image, in which a pixel value represents the distance between a pixel point in the image and the camera used to capture the image;
    the maximum of said distances;
    the minimum of said distances;
    a quantization range for distances between the maximum and the minimum;
    an indication of whether the image contains a part at infinity;
    scene classification information;
    object detection information;
    instance segmentation information.
  12. The method according to claim 10 or 11, wherein step S10 comprises:
    reading the semantic information from a reserved field of the data stream of the image based on the preset format.
  13. The method according to claim 10 or 11, wherein step S20 comprises at least one of the following:
    when the semantic information comprises scene classification information, adjusting target parameters of the camera for the corresponding scene according to the scene classification information;
    when the semantic information comprises object detection information, adjusting the target of the camera's autofocus according to the object detection information;
    when the semantic information comprises instance segmentation information, obtaining a target instance in the image according to the instance segmentation information, and performing preset processing on the image according to the target instance.
  14. An intelligent terminal, comprising a memory and a processor, wherein an image data processing program is stored in the memory, and when executed by the processor, the program implements the steps of the image data processing method according to claim 1 or 10.
  15. A computer-readable storage medium, on which a computer program is stored, wherein when the computer program is executed by a processor, the steps of the image data processing method according to claim 1 or 10 are implemented.
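As an illustrative sketch (not part of the claims), the depth information enumerated in claim 3 could be grouped into a single record. All field names, the sample values, and the linear dequantization rule below are assumptions for demonstration only:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class DepthInfo:
    """Depth-related semantic information for one image (field names are illustrative)."""
    depth_map: List[List[int]]   # per-pixel quantized camera-to-point distance
    min_distance_m: float        # smallest distance in the image, in metres
    max_distance_m: float        # largest distance in the image, in metres
    quant_levels: int            # quantization range between min and max (e.g. 256)
    contains_infinity: bool      # whether the image contains a part at infinity

    def dequantize(self, value: int) -> float:
        """Map a quantized pixel value back to an approximate distance in metres."""
        span = self.max_distance_m - self.min_distance_m
        return self.min_distance_m + (value / (self.quant_levels - 1)) * span

info = DepthInfo(
    depth_map=[[0, 128], [255, 64]],
    min_distance_m=0.5,
    max_distance_m=10.0,
    quant_levels=256,
    contains_infinity=False,
)
print(info.dequantize(255))  # the largest quantized value maps back to the maximum distance
```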
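Claims 5 and 6 describe selecting the scene with the highest probability output by a scene classification model. A minimal sketch of that argmax step, with a hypothetical probability dictionary standing in for the model's output:

```python
def classify_scene(probabilities: dict) -> str:
    """Return the scene with the highest model probability (claims 5-6 sketch)."""
    return max(probabilities, key=probabilities.get)

# Hypothetical per-scene probabilities from a scene classification model.
scene_probs = {"portrait": 0.12, "night": 0.71, "landscape": 0.17}
print(classify_scene(scene_probs))  # -> night
```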
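Claims 7 and 12 respectively write the semantic information into, and read it back from, a reserved field of the image data stream. The claims do not fix a concrete container format; the sketch below assumes a simple length-prefixed segment appended to the byte stream, with an illustrative marker value:

```python
import json
import struct

MARKER = b"\xff\xe9"  # illustrative reserved-segment marker (an assumption, not a standard)

def embed_semantics(image_bytes: bytes, semantics: dict) -> bytes:
    """Append semantic information to an image data stream as a
    marker + big-endian length + JSON payload segment (claim 7 sketch)."""
    payload = json.dumps(semantics).encode("utf-8")
    return image_bytes + MARKER + struct.pack(">H", len(payload)) + payload

def extract_semantics(stream: bytes) -> dict:
    """Read the semantic information back from the reserved segment (claim 12 sketch)."""
    pos = stream.rfind(MARKER)
    (length,) = struct.unpack(">H", stream[pos + 2 : pos + 4])
    return json.loads(stream[pos + 4 : pos + 4 + length])

data = embed_semantics(b"<jpeg bytes>", {"scene": "night", "objects": ["face"]})
print(extract_semantics(data)["scene"])  # -> night
```

In a real data stream the reserved field would typically be a standardized metadata segment rather than a trailing blob; this sketch only shows the round trip of writing and reading under a shared preset format.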
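Claim 9 maps each type of semantic information to identification information through a preset correspondence. A minimal sketch, where the identifier values are purely illustrative assumptions:

```python
# Preset correspondence between semantic-information types and identifiers.
# The concrete identifier values are illustrative, not defined by the claims.
TYPE_IDS = {
    "depth": 0x01,
    "scene_classification": 0x02,
    "instance_segmentation": 0x03,
    "object_detection": 0x04,
}

def id_for(semantic_type: str) -> int:
    """Look up the identification information for a semantic-information type."""
    return TYPE_IDS[semantic_type]

print(id_for("depth"))  # -> 1
```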
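Claim 13 branches on which kinds of semantic information are present when performing the preset processing. A hedged sketch of that dispatch, with hypothetical camera parameters and preset values:

```python
def apply_presets(semantics: dict, camera: dict) -> dict:
    """Adjust camera settings from stored semantic information (claim 13 sketch).
    All dictionary keys and preset values below are illustrative assumptions."""
    if "scene" in semantics:
        # Scene classification info: pick a per-scene parameter preset.
        presets = {"night": {"iso": 3200}, "landscape": {"iso": 100}}
        camera.update(presets.get(semantics["scene"], {}))
    if semantics.get("detections"):
        # Object detection info: point autofocus at the first detected target.
        camera["focus_target"] = semantics["detections"][0]
    return camera

cam = apply_presets({"scene": "night", "detections": ["face"]}, {})
print(cam)  # -> {'iso': 3200, 'focus_target': 'face'}
```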
PCT/CN2021/137246 2021-12-10 2021-12-10 Image data processing method, intelligent terminal, and storage medium WO2023102935A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/137246 WO2023102935A1 (en) 2021-12-10 2021-12-10 Image data processing method, intelligent terminal, and storage medium


Publications (1)

Publication Number Publication Date
WO2023102935A1 true WO2023102935A1 (en) 2023-06-15

Family

ID=86729485

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/137246 WO2023102935A1 (en) 2021-12-10 2021-12-10 Image data processing method, intelligent terminal, and storage medium

Country Status (1)

Country Link
WO (1) WO2023102935A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103053166A (en) * 2010-11-08 2013-04-17 索尼公司 Stereoscopic image data transmission device, stereoscopic image data transmission method, and stereoscopic image data reception device
US20170131090A1 (en) * 2015-11-06 2017-05-11 Intel Corporation Systems, methods, and apparatuses for implementing maximum likelihood image binarization in a coded light range camera
CN111866032A (en) * 2019-04-11 2020-10-30 阿里巴巴集团控股有限公司 Data processing method and device and computing equipment
CN113487705A (en) * 2021-07-14 2021-10-08 上海传英信息技术有限公司 Image annotation method, terminal and storage medium


Similar Documents

Publication Publication Date Title
US11941883B2 (en) Video classification method, model training method, device, and storage medium
WO2021036715A1 (en) Image-text fusion method and apparatus, and electronic device
WO2022166765A1 (en) Image processing method, mobile terminal and storage medium
CN113556492B (en) Thumbnail generation method, mobile terminal and readable storage medium
CN111737520B (en) Video classification method, video classification device, electronic equipment and storage medium
CN112181564A (en) Wallpaper generation method, mobile terminal and storage medium
WO2023010705A1 (en) Data processing method, mobile terminal, and storage medium
CN107743198B (en) Photographing method, terminal and storage medium
CN113347372A (en) Shooting light supplement method, mobile terminal and readable storage medium
WO2023108444A1 (en) Image processing method, intelligent terminal, and storage medium
WO2023102935A1 (en) Image data processing method, intelligent terminal, and storage medium
WO2023284218A1 (en) Photographing control method, and mobile terminal and storage medium
CN113286106B (en) Video recording method, mobile terminal and storage medium
WO2022095752A1 (en) Frame demultiplexing method, electronic device and storage medium
CN112532786B (en) Image display method, terminal device, and storage medium
CN114298883A (en) Image processing method, intelligent terminal and storage medium
CN114723645A (en) Image processing method, intelligent terminal and storage medium
CN114092366A (en) Image processing method, mobile terminal and storage medium
CN113901245A (en) Picture searching method, intelligent terminal and storage medium
WO2023108443A1 (en) Image processing method, smart terminal and storage medium
WO2023097446A1 (en) Video processing method, smart terminal, and storage medium
WO2023108442A1 (en) Image processing method, smart terminal, and storage medium
WO2023050413A1 (en) Image processing method, intelligent terminal, and storage medium
CN113840062B (en) Camera control method, mobile terminal and readable storage medium
CN114125151B (en) Image processing method, mobile terminal and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21966853

Country of ref document: EP

Kind code of ref document: A1