WO2023029559A1 - Data processing method and device - Google Patents

Data processing method and device

Info

Publication number
WO2023029559A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
network
frame
information
target
Application number
PCT/CN2022/091839
Other languages
English (en)
French (fr)
Inventor
吴华珍
季军
占鹏超
Original Assignee
华为技术有限公司
Application filed by 华为技术有限公司
Publication of WO2023029559A1

Classifications

    • G06V10/82 - Image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06N3/0464 - Convolutional networks [CNN, ConvNet]
    • G06N3/084 - Backpropagation, e.g. using gradient descent
    • G06N3/09 - Supervised learning
    • G06V10/25 - Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V10/764 - Image or video recognition or understanding using classification, e.g. of video objects
    • G06V10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/803 - Fusion of input or preprocessed data at the sensor, preprocessing, feature extraction or classification level
    • G06V20/625 - License plates
    • G06N3/048 - Activation functions

Definitions

  • the present application relates to the field of artificial intelligence, in particular to a data processing method and device.
  • Video imaging is an important means of current traffic management, agricultural management and industrial production. With the development of imaging technology, current imaging equipment can obtain good imaging results under ideal conditions. However, as imaging conditions deteriorate, the imaging results will have problems such as low resolution, poor contrast, and loss of image details.
  • the commonly used image signal processor (ISP) imaging system involves a large number of parameters, which need to be tuned separately according to the characteristics of each imaging scene, and the tuning workload is huge. If subsequent image enhancement is required, the image enhancement is performed based on the image output by the ISP.
  • the tuning workload for the ISP output image is huge, the tuning effect is limited by the manual experience of the tuning personnel, and there may be information loss. Therefore, how to obtain more accurate images has become an urgent problem to be solved.
  • the present application provides a data processing method and device for using a neural network to perform target enhancement on original raw data, so as to obtain an output image with better enhancement effect while reducing the workload of enhancement.
  • the present application provides a data processing method, including: acquiring first frame data, where the first frame data is one frame of the raw data collected by an image sensor; obtaining the data corresponding to a tight frame from the first frame data to obtain first data, where the range covered by the tight frame in the first frame data includes a target object detected from the first frame data; obtaining the data corresponding to a loose frame from the first frame data to obtain second data, where the range covered by the loose frame in the first frame data includes and is greater than the range covered by the tight frame in the first frame data; and using the first data and the second data as inputs of a target network to obtain an output image, where the target network is used to extract the information of multiple channels from the input data and obtain the output image according to the information of the multiple channels.
  • In this way, the information of the region covering the target object is extracted through the tight frame and used as an input of the neural network to obtain the enhancement result of the tight region; the information of the loose region is extracted through the loose frame and used as an input of the neural network to obtain the enhancement result of the loose region; the enhancement result of the tight region and the enhancement result of the loose region are then fused to obtain the enhancement result of the target object. Therefore, there is no need to go through the commonly used ISP parameter tuning, which reduces the workload of parameter tuning and obtains the target enhancement result efficiently and accurately.
  • In addition, the commonly used ISP processing can be replaced by the neural network; without upgrading hardware or increasing cost, better enhancement of the original (raw) data is realized, and an image with a better enhancement effect is obtained.
  • the target network may include a first network and a second network; using the first data and the second data as inputs of the target network to obtain the output image may include: using the first data as the input of the first network to obtain first enhanced information, where the first network is used to extract the information of the brightness channel from the input data; using the second data as the input of the second network to obtain second enhanced information, where the second network is used to extract the information of multiple channels from the input data; and fusing the first enhanced information and the second enhanced information to obtain the output image.
  • Therefore, the target network can include two parts that respectively process the information corresponding to the tight frame and the loose frame, so that the target is enhanced on the basis of the overall raw data; parallel processing can be realized through the different sub-networks, which reduces overhead and improves working efficiency.
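For illustration only, the following Python sketch shows one way such a two-branch forward pass could be organized; the callables first_net and second_net, the box format, and the simple paste-back fusion are assumptions made for the sketch, not details taken from the application.

```python
def enhance_target(frame, tight_box, loose_box, first_net, second_net):
    """Two-branch enhancement sketch over NumPy-like arrays (hypothetical helper).

    frame:      H x W x C raw data of the selected frame (first frame data)
    tight_box:  (y0, y1, x0, x1) covering just the detected target object
    loose_box:  (y0, y1, x0, x1) covering and larger than the tight box
    first_net:  callable returning enhanced luminance (Y) for the tight crop
    second_net: callable returning enhanced multi-channel (Y, U, V) for the loose crop
    """
    ty0, ty1, tx0, tx1 = tight_box
    ly0, ly1, lx0, lx1 = loose_box

    first_data = frame[ty0:ty1, tx0:tx1]     # data corresponding to the tight frame
    second_data = frame[ly0:ly1, lx0:lx1]    # data corresponding to the loose frame

    yt = first_net(first_data)               # first enhanced information (luminance channel)
    yuv_f = second_net(second_data)          # second enhanced information (multiple channels)

    # Simple fusion: paste the tight-box luminance into the loose-box result.
    # (The application only mentions fusion in general; weighted or Poisson fusion
    # could be used instead.)
    out = yuv_f.copy()
    oy0, ox0 = ty0 - ly0, tx0 - lx0          # tight-box offset inside the loose box
    out[oy0:oy0 + yt.shape[0], ox0:ox0 + yt.shape[1], 0] = yt
    return out
```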
  • the above method may further include: performing target detection on the first frame data to obtain position information of the target object in the first frame data; generating tight frames and loose frames according to the position information of the target object.
  • the present application can identify the target object in the raw data through object detection, so that the position of the target object can be found quickly and accurately, and then the position and size of the tight frame and the loose frame can be accurately determined.
  • the above-mentioned acquisition of the first frame data may include: receiving user input data, and extracting the first frame data from the original data according to the user input data; or, performing target detection on each frame in the original data, and extracting the first frame data from the original data according to the detection results.
  • the user can select the frames that need to be enhanced, so that the object can be enhanced according to the needs of the user, and the user experience can be improved.
  • the target detection can also be used to filter out the frames that need to be enhanced from the raw data, so that it can more accurately identify which frames need to be enhanced and obtain a more accurate and clear output image.
  • the target network is obtained by training in combination with a recognition network and a training set, and the recognition network is used to obtain the semantic information in the input image, where, in the process of training the target network, the output of the recognition network is used as a constraint to update the target network.
  • Therefore, when training the target network, the output of the recognition network can be used as a constraint, so that the output image of the constrained target network can be recognized with higher accuracy, thereby improving the output accuracy of the target network.
  • the present application provides a neural network training method, including: obtaining a training set, where the training set includes raw data collected by an image sensor and corresponding true value labels; using the training set as an input of a target network to obtain an enhancement result, where the target network is used to extract the information of the luminance channel corresponding to a tight frame from the input data, extract the information of multiple channels corresponding to a loose frame from the input data, and fuse the information of the luminance channel and the information of the multiple channels to obtain the enhancement result, and the range covered by the loose frame in the input data includes and is greater than the range covered by the tight frame in the input data; using the training set as an input of a recognition network to obtain a first recognition result, where the recognition network is used to obtain the semantic information in the input image; using the enhancement result as an input of the recognition network to obtain a second recognition result; and updating the target network according to the difference between the enhancement result and the true value labels and the difference between the first recognition result and the second recognition result, to obtain an updated target network.
  • the target network may be updated by taking the output result of the recognition network as a constraint, so as to make the output of the target network more accurate.
  • In this way, the recognition network can be used to recognize both the real image and the output image of the target network, and the recognition results are used as a constraint to update the target network, so that the recognition result of the output image of the target network becomes closer to the recognition result of the real image. This makes the target network converge from the dimension of the accuracy of the target object in the output image, and improves the probability that the output image of the target network is recognized correctly.
  • the target network includes a first network and a second network; the first network is used to extract the information of the brightness channel corresponding to the tight frame from the data in the training set, and the second network is used to extract the information of multiple channels corresponding to the loose frame from the input data.
  • Therefore, the target network can be divided into multiple parts, so that in the process of training and application, the data corresponding to the tight frame and the loose frame can be processed in parallel, thereby improving the output efficiency of the target network.
  • the enhancement result includes first information output by the first network and second information output by the second network, and the second recognition result includes a third recognition result corresponding to the first information and a fourth recognition result corresponding to the second information; updating the target network to obtain the updated target network may include: updating the first network according to the difference between the first information and the true value labels and the difference between the third recognition result and the first recognition result, to obtain an updated first network; and updating the second network according to the difference between the second information and the true value labels and the difference between the fourth recognition result and the first recognition result, to obtain an updated second network.
  • the first network and the second network can be trained separately, so that the sub-networks in the target network can be trained in a targeted manner, and the output accuracy of the sub-networks can be improved, thereby improving the performance of the target network. Overall output accuracy.
  • the aforementioned updating of the target network according to the difference between the enhancement result and the true value labels and the difference between the first recognition result and the second recognition result to obtain the updated target network may include: obtaining a first loss value according to the difference between the enhancement result and the true value labels; obtaining a second loss value according to the difference between the first recognition result and the second recognition result; fusing the first loss value and the second loss value to obtain a third loss value; and updating the target network according to the third loss value to obtain the updated target network.
  • the entire target network may also be trained, so as to improve the overall output effect of the target network and obtain a target network with more accurate output.
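As a reading aid, the sketch below illustrates how the loss terms described above might be computed and fused in one training iteration; the mean-squared distances and the weighting factor alpha are assumptions, since the application does not fix concrete loss functions or weights here.

```python
import numpy as np

def training_step(raw_batch, gt_batch, target_net, recog_net, alpha=0.5):
    """One hypothetical training iteration using the recognition network as a constraint.

    raw_batch:  raw training samples; gt_batch: corresponding true value labels
    target_net: enhancement network being trained; recog_net: frozen recognition network
    alpha:      assumed weight used when fusing the two loss values
    """
    enhanced = target_net(raw_batch)                    # enhancement result

    first_recog = recog_net(gt_batch)                   # first recognition result (on true labels)
    second_recog = recog_net(enhanced)                  # second recognition result (on enhancement)

    loss1 = np.mean((enhanced - gt_batch) ** 2)         # first loss: enhancement vs. true label
    loss2 = np.mean((second_recog - first_recog) ** 2)  # second loss: recognition consistency
    loss3 = loss1 + alpha * loss2                       # third loss: fused constraint

    return loss3  # loss3 would then drive a backpropagation update of target_net
```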
  • the present application provides a data processing device, including:
  • An acquisition module configured to acquire the first frame of data, where the first frame of data is one of the frames in the raw data collected by the image sensor;
  • a tight-frame extraction module, configured to obtain the data corresponding to a tight frame from the first frame data to obtain first data, where the range covered by the tight frame in the first frame data includes a target object detected from the first frame data;
  • a loose-frame extraction module, configured to obtain the data corresponding to a loose frame from the first frame data to obtain second data, where the range covered by the loose frame in the first frame data includes and is greater than the range covered by the tight frame in the first frame data;
  • an output module, configured to use the first data and the second data as inputs of a target network to obtain an output image, where the target network is used to extract the information of multiple channels from the input data and obtain the output image according to the information of the multiple channels.
  • the target network includes a first network and a second network; the output module is specifically configured to: use the first data as the input of the first network to obtain first enhanced information, where the first network is used to extract the information of the brightness channel from the input data; use the second data as the input of the second network to obtain second enhanced information, where the second network is used to extract the information of multiple channels from the input data; and fuse the first enhanced information and the second enhanced information to obtain the output image.
  • the device further includes: a target detection module, configured to perform target detection on the first frame data to obtain the position information of the target object in the first frame data, and generate a tight frame and a loose frame according to the position information of the target object.
  • the obtaining module is specifically configured to: receive user input data, and extract the first frame data from the original data according to the user input data; or, perform target detection on each frame in the original data, and extract the first frame data from the original data according to the detection results.
  • the target network is obtained by training in combination with a recognition network and a training set, and the recognition network is used to obtain the semantic information in the input image, where, in the process of training the target network, the output of the recognition network is used as a constraint to update the target network.
  • the present application provides a neural network training device, including:
  • the obtaining module is used to obtain the training set, which includes the original data collected by the image sensor and the corresponding true value labels;
  • an enhancement module, configured to use the training set as the input of a target network to obtain an enhancement result, where the target network is used to extract the information of the luminance channel corresponding to a tight frame from the input data, extract the information of multiple channels corresponding to a loose frame from the input data, and fuse the information of the luminance channel and the information of the multiple channels to obtain the enhancement result, and the range covered by the loose frame in the input data includes and is greater than the range covered by the tight frame in the input data;
  • the semantic segmentation module is used to use the training set as the input of the recognition network to obtain the first recognition result, and the recognition network is used to obtain the semantic information in the input image;
  • the semantic segmentation module is also used to use the enhanced result as the input of the recognition network to obtain the second recognition result;
  • the update module is configured to update the target network according to the difference between the enhancement result and the true label, and the difference between the first recognition result and the second recognition result, to obtain an updated target network.
  • the enhancement result includes first information output by the first network and second information output by the second network, and the second recognition result includes a third recognition result corresponding to the first information and a fourth recognition result corresponding to the second information;
  • the update module is specifically configured to: update the first network according to the difference between the first information and the true value labels and the difference between the third recognition result and the first recognition result, to obtain an updated first network; and update the second network according to the difference between the second information and the true value labels and the difference between the fourth recognition result and the first recognition result, to obtain an updated second network.
  • the update module is specifically configured to: obtain a first loss value according to the difference between the enhancement result and the true value labels; obtain a second loss value according to the difference between the first recognition result and the second recognition result; fuse the first loss value and the second loss value to obtain a third loss value; and update the target network according to the third loss value to obtain an updated target network.
  • the embodiment of the present application provides a data processing device, including: a processor and a memory, where the processor and the memory are interconnected through a line, and the processor calls the program code in the memory to execute the processing-related functions in the data processing method shown in any one of the above first aspects.
  • Optionally, the data processing device may be a chip.
  • the embodiment of the present application provides a neural network training device, including: a processor and a memory, where the processor and the memory are interconnected through a line, and the processor calls the program code in the memory to execute the processing-related functions in the neural network training method shown in any one of the above second aspects. Optionally, the neural network training device may be a chip.
  • the embodiment of the present application provides an electronic device, which can also be called a digital processing chip or a chip.
  • the chip includes a processing unit and a communication interface.
  • the processing unit obtains program instructions through the communication interface, and the program instructions are processed by the processing unit.
  • the processing unit is configured to execute processing-related functions in any optional implementation manner of the first aspect or the second aspect above.
  • an embodiment of the present application provides a computer-readable storage medium, including instructions, which, when run on a computer, cause the computer to execute the method in any optional implementation manner of the first aspect or the second aspect above.
  • the embodiments of the present application provide a computer program product including instructions, which, when run on a computer, cause the computer to execute the method in any optional implementation manner of the first aspect or the second aspect above.
  • Fig. 1 is a schematic diagram of the artificial intelligence main framework applied in the present application.
  • Fig. 2 is a schematic diagram of a system architecture provided by the present application.
  • Fig. 3 is a schematic structural diagram of an electronic device provided by the present application.
  • Fig. 4 is a schematic flow chart of a data processing method provided by the present application.
  • Fig. 5 is a schematic flow chart of another data processing method provided by the present application.
  • Fig. 6 is a schematic diagram of an application scenario provided by the present application.
  • Fig. 7 is a schematic diagram of another application scenario provided by the present application.
  • Fig. 8 is a schematic diagram of a mask provided by the present application.
  • Fig. 9 is a schematic flow chart of another data processing method provided by the present application.
  • Fig. 10 is a schematic diagram of another application scenario provided by the present application.
  • Fig. 11 is a schematic diagram of another application scenario provided by the present application.
  • Fig. 12 is a schematic flow chart of a neural network training method provided by the present application.
  • Fig. 13 is a schematic diagram of training data provided by the present application.
  • Fig. 14 is a schematic flow chart of another neural network training method provided by the present application.
  • Fig. 15 is a schematic diagram of another application scenario provided by the present application.
  • Fig. 16 is a schematic diagram of another application scenario provided by the present application.
  • Fig. 17 is a schematic diagram of another application scenario provided by the present application.
  • Fig. 18 is a schematic structural diagram of a data processing device provided by the present application.
  • Fig. 19 is a schematic structural diagram of a neural network training device provided by the present application.
  • Fig. 20 is a schematic structural diagram of another data processing device provided by the present application.
  • Fig. 21 is a schematic structural diagram of another neural network training device provided by the present application.
  • Fig. 22 is a schematic structural diagram of a chip provided by the present application.
  • the application provides a data processing method, a neural network training method and a device, which combine the neural network to process the data collected by the image sensor, so as to obtain an output image with better enhancement effect.
  • the neural network and the electronic device including the image sensor provided in this application will be introduced separately below.
  • Figure 1 shows a schematic structural diagram of the main framework of artificial intelligence.
  • The above artificial intelligence main framework is described below in two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis).
  • the "intelligent information chain” reflects a series of processes from data acquisition to processing. For example, it can be the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, intelligent execution and output. In this process, the data has undergone a condensed process of "data-information-knowledge-wisdom".
  • IT value chain reflects the value brought by artificial intelligence to the information technology industry from the underlying infrastructure of artificial intelligence, information (provided and processed by technology) to the systematic industrial ecological process.
  • the infrastructure provides computing power support for the artificial intelligence system, realizes communication with the outside world, and realizes support through the basic platform.
  • computing power is provided by smart chips, such as the central processing unit (CPU), neural-network processing unit (NPU), graphics processing unit (GPU), application-specific integrated circuit (ASIC), field-programmable gate array (FPGA), and other hardware acceleration chips;
  • the basic platform includes the distributed computing framework, the network, and other related platform assurance and support, which can include cloud storage and computing, interconnection networks, and more.
  • sensors communicate with the outside to obtain data, and these data are provided to the smart chips in the distributed computing system provided by the basic platform for calculation.
  • Data at the layer above the infrastructure is used to represent the data sources in the field of artificial intelligence.
  • the data involves graphics, images, voice, text, and IoT data of traditional equipment, including business data of existing systems and sensory data such as force, displacement, liquid level, temperature, and humidity.
  • Data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making, etc.
  • machine learning and deep learning can symbolize and formalize intelligent information modeling, extraction, preprocessing, training, etc. of data.
  • Reasoning refers to the process of simulating human intelligent reasoning in a computer or intelligent system, and using formalized information to carry out machine thinking and solve problems according to reasoning control strategies.
  • the typical functions are search and matching.
  • Decision-making refers to the process of decision-making after intelligent information is reasoned, and usually provides functions such as classification, sorting, and prediction.
  • some general capabilities can be formed based on the results of data processing, such as algorithms or a general system, such as translation, text analysis, computer vision processing, speech recognition, image processing identification, etc.
  • Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields. It is the packaging of the overall solution of artificial intelligence, which commercializes intelligent information decision-making and realizes landing applications. Its application fields mainly include: intelligent terminals, intelligent transportation, Smart healthcare, autonomous driving, smart cities, etc.
  • the embodiment of the present application involves neural network and image-related applications.
  • a neural network can be composed of neural units, and a neural unit can refer to an operation unit that takes $x_s$ and an intercept of 1 as inputs, where the output of the operation unit can be:
  • $h_{W,b}(x) = f\left(\sum_{s=1}^{n} W_s x_s + b\right)$
  • where $W_s$ is the weight of $x_s$, and $b$ is the bias of the neural unit.
  • $f$ is the activation function of the neural unit, which is used to perform a nonlinear transformation on the features obtained in the neural network and convert the input signal of the neural unit into an output signal. The output signal of the activation function can be used as the input of the next convolutional layer.
  • the activation function may be a sigmoid function.
  • a neural network is a network formed by connecting many of the above-mentioned single neural units, that is, the output of one neural unit can be the input of another neural unit.
  • the input of each neural unit can be connected with the local receptive field of the previous layer to extract the features of the local receptive field.
  • the local receptive field can be an area composed of several neural units.
  • A deep neural network (DNN), also known as a multi-layer neural network, can be understood as a neural network with multiple hidden layers.
  • DNN is divided according to the position of different layers, and the neural network inside DNN can be divided into three categories: input layer, hidden layer, and output layer.
  • the first layer is the input layer
  • the last layer is the output layer
  • the layers in the middle are all hidden layers.
  • the layers are fully connected, that is, any neuron in the i-th layer must be connected to any neuron in the i+1-th layer.
  • Although a DNN looks complicated, the work of each layer is actually not complicated. In short, it is the following linear relationship expression: $\vec{y} = \alpha(W\vec{x} + \vec{b})$, where $\vec{x}$ is the input vector, $\vec{y}$ is the output vector, $\vec{b}$ is the offset vector, $W$ is the weight matrix (also called coefficients), and $\alpha()$ is the activation function.
  • Each layer simply performs this simple operation on the input vector $\vec{x}$ to obtain the output vector $\vec{y}$. Since a DNN has a large number of layers, the number of coefficients $W$ and offset vectors $\vec{b}$ is also large.
  • These parameters are defined in the DNN as follows, taking the coefficient $W$ as an example: assume that in a three-layer DNN, the linear coefficient from the fourth neuron of the second layer to the second neuron of the third layer is defined as $W_{24}^{3}$, where the superscript 3 represents the layer number of the coefficient $W$, and the subscripts correspond to the output third-layer index 2 and the input second-layer index 4.
  • In summary, the coefficient from the k-th neuron of the (L-1)-th layer to the j-th neuron of the L-th layer is defined as $W_{jk}^{L}$.
  • the input layer has no W parameter.
  • more hidden layers make the network more capable of describing complex situations in the real world. Theoretically speaking, a model with more parameters has a higher complexity and a greater "capacity", which means that it can complete more complex learning tasks.
  • Training the deep neural network is the process of learning the weight matrix, and its ultimate goal is to obtain the weight matrix of all layers of the trained deep neural network (the weight matrix formed by the vector W of many layers).
  • Convolutional neural network is a deep neural network with a convolutional structure.
  • the convolutional neural network contains a feature extractor composed of a convolutional layer and a subsampling layer, which can be regarded as a filter.
  • the convolutional layer refers to the neuron layer that performs convolution processing on the input signal in the convolutional neural network.
  • a neuron can only be connected to some adjacent neurons.
  • a convolutional layer usually contains several feature planes, and each feature plane can be composed of some rectangularly arranged neural units. Neural units of the same feature plane share weights, and the shared weights here are convolution kernels. Shared weights can be understood as the way to extract features independent of position.
  • the convolution kernel can be formalized as a matrix of random size, and the convolution kernel can obtain reasonable weights through learning during the training process of the convolutional neural network.
  • the direct benefit of sharing weights is to reduce the connections between the layers of the convolutional neural network, while reducing the risk of overfitting.
  • the loss function may generally include loss functions such as mean squared error, cross entropy, logarithmic, and exponential loss functions.
  • For example, the mean squared error can be used as the loss function, defined as $MSE = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$. Specifically, the loss function can be selected according to the actual application scenario.
  • the convolutional neural network can use the back propagation (BP) algorithm to correct the size of the parameters in the initial network model during the training process, so that the reconstruction error loss of the model becomes smaller and smaller. Specifically, passing the input signal forward until the output will generate an error loss, and updating the parameters in the initial model by backpropagating the error loss information, so that the error loss converges.
  • the backpropagation algorithm is a backpropagation movement dominated by the error loss, aiming to obtain the optimal model parameters, for example, the weight matrix.
  • In the training phase, the BP algorithm may be used to train the model to obtain the trained model.
  • Stochastic gradient: the number of samples in machine learning is very large, so the loss function is calculated each time from data obtained by random sampling, and the corresponding gradient is called the stochastic gradient.
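For completeness, a generic mini-batch gradient-descent update of this kind can be written as follows; the symbols θ for the parameters, η for the learning rate and B for the sampled mini-batch are notation introduced here, not taken from the application.

```latex
\theta_{t+1} = \theta_t - \eta \,\nabla_\theta \,\frac{1}{|B|}\sum_{i \in B} \mathcal{L}\big(f_{\theta}(x_i),\, y_i\big)
```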
  • YUV is a color space
  • Y represents the brightness (Luminance or Luma), that is, the grayscale value
  • "U" and "V" represent the chroma (Chrominance or Chroma), which specifies the color of a pixel;
  • "U" and "V" are the two components that make up a color.
  • the importance of using the YUV color space is that its brightness signal Y and chrominance signals U and V are separated. If there is only Y signal component but no U and V signal components, then the image represented in this way is a black and white grayscale image.
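For reference, one common way this separation is defined is the BT.601 analog YUV conversion shown below; the application itself does not prescribe a particular conversion matrix, so these coefficients are given only as a familiar example.

```latex
\begin{aligned}
Y &= 0.299\,R + 0.587\,G + 0.114\,B \\
U &= 0.492\,(B - Y) \\
V &= 0.877\,(R - Y)
\end{aligned}
```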
  • Raw data records the original information of the camera sensor and is an unprocessed, uncompressed format.
  • RAW can be conceptualized as "raw image encoding data", or more vividly called a "digital negative".
  • the present application provides a neural network training method and a data processing method.
  • the neural network obtained through the neural network training method provided in the present application can be applied to the data processing method provided in the present application.
  • the data processing method provided in this application can be used to process the original raw data collected by the sensor, so as to obtain an output image.
  • an embodiment of the present application provides a system architecture 200 .
  • data collection device 260 may be used to collect training data. After the data acquisition device 260 collects the training data, the training data is stored in the database 230 , and the training device 220 obtains the target model/rule 201 based on training data maintained in the database 230 .
  • the training device 220 obtains the target model/rule 201 based on the training data.
  • the training device 220 outputs the corresponding predicted labels for multiple frames of sample images, calculates the loss between the predicted labels and the original labels of the samples, and updates the classification network based on the loss until the predicted labels are close to the original labels of the samples or the difference between the predicted labels and the original labels is less than a threshold, thereby completing the training of the target model/rule 201.
  • the target model/rule 201 in the embodiment of the present application may specifically be a neural network. It should be noted that, in practical applications, the training data maintained in the database 230 may not all be collected by the data collection device 260, but may also be received from other devices. In addition, it should be noted that the training device 220 does not necessarily perform the training of the target model/rule 201 based entirely on the training data maintained by the database 230, and it is also possible to obtain training data from the cloud or other places for model training; the above description should not be regarded as a limitation on the embodiments of the present application.
  • the target model/rule 201 trained by the training device 220 can be applied to different systems or electronic devices, such as the execution device 220 shown in FIG. 2, which may be a terminal such as a laptop, an augmented reality (AR)/virtual reality (VR) device, a vehicle-mounted terminal or a TV, and can also be a server, the cloud, or other electronic devices.
  • the execution device 220 is configured with a transceiver 212, which may include an input/output (I/O) interface or another wireless or wired communication interface for data interaction with external devices. Taking the I/O interface as an example, the user can input data to the I/O interface through the client device 240.
  • When the execution device 220 preprocesses the input data, or when the computing module of the execution device 220 performs calculation and other related processing, the execution device 220 can call the data, code, etc. in the data storage system 250 for the corresponding processing, and the correspondingly processed data and instructions may also be stored in the data storage system 250.
  • the I/O interface 212 returns the processing result to the client device 240, thereby providing it to the user.
  • the training device 220 can generate corresponding target models/rules 201 based on different training data for different goals or different tasks, and the corresponding target models/rules 201 can be used to achieve the above goals or complete above tasks, thereby providing the desired result to the user.
  • the user can manually specify the input data, and the manual specification can be operated through the interface provided by the transceiver 212 .
  • the client device 240 can automatically send the input data to the transceiver 212. If the client device 240 needs to obtain the user's authorization to automatically send the input data, the user can set the corresponding permission in the client device 240. The user can view the results output by the execution device 220 on the client device 240, and the specific presentation form may be display, sound, action, or other specific ways.
  • the client device 240 can also be used as a data collection terminal, collecting the input data input to the transceiver 212 as shown in the figure and the output results of the output transceiver 212 as new sample data, and storing them in the database 230 .
  • the client device 240 may not be used for collection, but the transceiver 212 directly stores the input data input into the transceiver 212 and the output result of the output transceiver 212 as new sample data into the database 230 as shown in the figure.
  • the data storage system 250 is an external memory relative to the execution device 220 , and in other cases, the data storage system 250 may also be placed in the execution device 220 .
  • the target model/rule 201 is trained according to the training device 220 , and the target model/rule 201 in the embodiment of the present application may be the neural network mentioned below in the present application.
  • the aforementioned training device shown in FIG. 2 can be used to execute the neural network training method provided in this application to obtain a trained neural network.
  • the trained neural network can be deployed in an execution device for executing the data processing method provided in this application, that is, the execution device can be the electronic device provided by this application.
  • the electronic devices provided in the embodiments of the present application may specifically include handheld devices, vehicle-mounted devices, data processing devices, computing devices, and other electronic devices that include image sensors or are connected to image sensors, such as personal digital assistants (PDA), machine type communication (MTC) terminals, point of sales (POS) terminals, wearable devices (such as bracelets and smart watches), security equipment, virtual reality (VR) devices, augmented reality (AR) devices, and other electronic devices with imaging capabilities.
  • A digital camera (short for digital still camera) is a camera that uses a photoelectric sensor to convert optical images into digital signals.
  • the sensor of a digital camera is a photosensitive charge-coupled device (CCD) or a complementary metal oxide semiconductor (CMOS) device.
  • With the development of CMOS processing technology, the functions of digital cameras have become more and more powerful, and they have almost completely replaced traditional film cameras. They are widely used in consumer electronics, human-computer interaction, computer vision, and automatic driving.
  • FIG. 3 shows a schematic diagram of an electronic device provided by the present application.
  • the electronic device may include a lens (lens) group 110, an image sensor (sensor) 120, and an electrical signal processor 130.
  • the electrical signal processor 130 may include an analog-to-digital (A/D) converter 131 and a digital signal processor 132 .
  • the analog-to-digital converter 131 is an analog signal-to-digital signal converter, which is used to convert an analog electrical signal into a digital electrical signal.
  • the electronic equipment shown in FIG. 3 is not limited to the above components, and may also include more or fewer other components, such as a battery, a flashlight, buttons, and sensors. The embodiment of the present application only uses an electronic device equipped with an image sensor 120 as an example for description, but the components mounted on the electronic device are not limited thereto.
  • the light signals reflected by the subject are collected by the lens group 110 and imaged on the image sensor 120 .
  • the image sensor 120 converts optical signals into analog electrical signals.
  • the analog electrical signal is converted into a digital electrical signal by an analog-to-digital (A/D) converter 131 in the electrical signal processor 130, and the digital electrical signal is processed by a digital signal processor 132, for example, optimized through a series of complex mathematical arithmetic operations, to finally output an image.
  • the electrical signal processor 130 may further include an analog signal pre-processor 133 for preprocessing the analog electrical signal transmitted by the image sensor and outputting it to the analog-to-digital converter 131 .
  • the performance of the image sensor 120 affects the quality of the final output image.
  • the image sensor 120 can also be called a photosensitive chip, a photosensitive element, etc., and contains hundreds of thousands to millions of photoelectric conversion elements. When irradiated by light, charges will be generated and converted into digital signals by an analog-to-digital converter chip.
  • the digital signal output by the image sensor may be referred to as raw (raw) data, that is, data without image processing.
  • the image signal processor (ISP) imaging system contains multiple functional modules and involves a large number of parameters, which need to be tuned separately according to the characteristics of each imaging scene; the tuning workload is huge, and the tuning effect is limited by the experience of the tuning staff.
  • In addition, ISPs pay more attention to overall video quality optimization, but in real-world imaging applications many scenarios care more about the imaging quality of specific targets than about overall video quality, such as the imaging quality of the license plate area in traffic scenarios and the imaging quality of specific workpieces in industrial inspection scenarios. There is usually more prior information available for such a specific target.
  • key processing steps of a common ISP processing process may include data correction, detail processing, color adjustment, brightness adjustment, image enhancement, and the like.
  • Each step can contain multiple classic algorithms, each algorithm has corresponding tuning parameters, and the order of the steps can be adjusted according to the actual situation. Therefore, parameter tuning in the traditional ISP processing process is a complicated task that requires large labor costs.
  • the scene adaptability of traditional ISP processing is poor, and different scenarios need to be tuned separately, and the tuning work is huge.
  • Raw data is usually 12 (or 14 or 16) bit uncompressed data, which is converted into 8-bit image data after ISP processing, and there is a certain amount of information loss in the conversion process.
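A minimal illustration of that loss, assuming a plain truncation from 12-bit to 8-bit, which is only one of several possible mappings:

```python
import numpy as np

raw_12bit = np.arange(0, 4096, dtype=np.uint16)   # all possible 12-bit raw values: 0..4095
as_8bit = (raw_12bit >> 4).astype(np.uint8)       # naive conversion to 8-bit: 0..255

# Every 16 distinct raw values collapse onto one 8-bit value, so fine detail
# encoded in the low bits of the raw data can no longer be recovered.
print(np.unique(as_8bit).size)                    # 256 levels remain out of 4096
```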
  • the ISP will introduce false textures into the image due to the accumulation of errors in multiple processing processes, and the false textures introduced by different ISP parameters are quite different.
  • Most existing target-based enhancement algorithms are implemented on RGB or YUV images after ISP processing. Since traditional ISP processing has already introduced various pseudo-textures into the images, the upper limit of the effect of target enhancement algorithms on RGB or YUV images is low, their generalization is weak, and they need to be adapted to the imaging styles of different ISP chips. If target enhancement is only achieved in the original raw data domain, it needs to cooperate with subsequent ISP tuning to obtain the final enhancement effect; the overall process is complicated, and the complicated ISP tuning work cannot be avoided.
  • the present application provides a data processing method, which can improve the imaging quality of a specific target while reducing the workload of tuning and reducing the overhead without upgrading the imaging hardware.
  • the enhancement of raw data is realized through the neural network, without a lot of parameter tuning, and it is only necessary to deploy the trained neural network in the ISP chip.
  • the application and training process of the neural network are introduced respectively in the following.
  • Refer to FIG. 4, which is a schematic flowchart of a data processing method provided by the present application, as described below.
  • First, the first frame data is acquired. The raw data is the original data collected by the image sensor, and the first frame data is one of the frames of data in the raw data, that is, the data that needs image enhancement.
  • Specifically, user input data can be received, and the first frame data can then be determined from the raw data according to the user input data; or, target detection can be performed on each frame in the raw data, and the first frame data can be extracted from the raw data according to the detection results; or, a frame can be randomly selected from the raw data as the first frame data, and so on. For example, a frame with a clearer target can be extracted from the raw data as the first frame data, or a frame with a more complex target texture can be extracted from the raw data as the first frame data, which can be selected according to the actual application scenario. Of course, each frame in the raw data can also be enhanced; this application exemplarily uses the first frame data as an example for illustration.
  • the first frame of data may include information of multiple channels, and the specific number of channels may be determined according to the data collected by the image sensor.
  • an image sensor can collect information on channels such as brightness and chrominance.
  • a compact frame and a loose frame can be generated based on the first frame of data, and the data corresponding to the compact frame can be extracted from the first frame of data to obtain the first data, the compact frame covers the area where the target object is located in the first frame of data, so that the first data including the information of the target object can be extracted from the first frame of data.
  • the size of the loose frame is larger than and covers the range corresponding to the tight frame in the first frame data, and the data corresponding to the loose frame is extracted from the first frame data to obtain the second data.
  • target detection may be performed on the first frame of data, the position of the target object in the first frame of data may be identified, and then a tight frame and a loose frame may be generated based on the position of the target object.
  • the range covered by the tight frame in the first frame data includes the information of the target object and fits closely with the target object, while the range covered by the loose frame in the first frame data, in addition to the information of the target object, may also include the information of pixels adjacent to the target object.
  • the target network is used to enhance the target object based on the first data and the second data, so as to obtain an enhanced image, that is, an output image.
  • the target network can include multiple parts, such as a first network and a second network; the first network can be used to extract brightness information from the input data and enhance it, and the second network can be used to extract the information of multiple channels from the input data and enhance it.
  • the first data can be used as the input of the first network to obtain the first enhanced information
  • the second data can be used as the input of the second network to obtain the second enhanced information
  • the first enhanced information and the second enhanced information can be fused to obtain the output image. Therefore, the texture details of the target object can be enhanced through the tight frame, thereby improving the definition of the target object and making the target object clearer in the output image.
  • In some scenarios, the first network can also extract and enhance the information of other channels. This application exemplarily uses the extraction and enhancement of brightness information from the input as an example for description, not as a limitation.
  • the information of the brightness channel of the target object can be extracted and enhanced through the tight frame, and the information of multiple channels of the target object and its vicinity can be extracted and enhanced through the loose frame, so that the texture of the target object is enhanced, the target object included in the output image is clearer, and an output image with an enhanced target object is obtained.
  • This application enhances the image of the target object to improve the imaging effect of the output image.
  • In the commonly used ISP processing, the parameter tuning workload is heavy, and there may be information loss during processing. This application processes the original raw data, which contains the full information, through the neural network to achieve end-to-end target enhancement and reduce the tuning workload.
  • the target network when training the target network, it can be trained in combination with the recognition network and the training set.
  • the recognition network can be used to obtain the semantic information in the input image; for example, information about the object in the input image, such as its category, size or location, can be recognized through the recognition network. In the process of training the target network, the output of the recognition network can be used as a constraint to update the target network to obtain an updated target network. Therefore, in the embodiment of the present application, when training the target network, the output result of the recognition network can be used as a constraint to make the recognition result of the output of the target network more accurate, thereby improving the accuracy and clarity of the output image of the target network and improving the image enhancement effect of the target network.
  • the present application can deploy the neural network in the ISP system, so that the commonly used ISP processing system can be replaced by the neural network, thereby improving the target enhancement efficiency.
  • the flow of another data processing method provided by this application may be as shown in FIG. 5 .
  • the raw data can be collected by an image sensor, and in order to implement end-to-end target enhancement based on the uncompressed raw data, a certain amount of raw data can be cached in the ISP system. For example, after the sensor collects raw data, it can be transmitted to the ISP system for subsequent processing.
  • target enhancement can be performed on each frame, or one or more frames can be selected from the raw data for target enhancement.
  • one of the frames is selected as the target frame (that is, the aforementioned first frame data) for target enhancement as an example for illustration.
  • target detection may be performed on raw data, and one or more frames including target objects are screened out from the raw data as target frames.
  • a certain frame may be selected by the user as the target frame, and the target frame may be extracted from the raw data.
  • the target detection algorithm can be used to obtain information such as the size and definition of the target object in each frame of the video, and then a frame with a relatively larger size or higher definition of the target object can be selected from the video as the target frame. At the same time, the position of the target object in the target frame can be obtained.
  • the target frame and the position of the target object can also be set by the user; for example, the user sets a specific frame number as the capture frame, and the relative positional relationship between the camera and the target object can be used to determine the specific position of the target object in the frame.
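  • As a minimal illustration of the detection-based frame selection described above, the following Python sketch picks, from the buffered raw frames, the frame whose detected target covers the largest area; the `detect_targets` callable and the area-based scoring are assumptions for illustration, not part of this application.

```python
# Hypothetical sketch: choose the target frame from buffered raw frames by
# selecting the frame whose detected target has the largest area.
def select_target_frame(raw_frames, detect_targets):
    """raw_frames: list of H x W raw arrays; detect_targets(frame) returns a
    list of (x0, y0, x1, y1) boxes for the objects detected in that frame."""
    best_idx, best_box, best_area = None, None, -1.0
    for idx, frame in enumerate(raw_frames):
        for (x0, y0, x1, y1) in detect_targets(frame):
            area = max(0.0, x1 - x0) * max(0.0, y1 - y0)
            if area > best_area:
                best_idx, best_box, best_area = idx, (x0, y0, x1, y1), area
    return best_idx, best_box  # target frame index and target position
```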
  • target detection can be performed on the target frame to identify the specific information of the target object in the target frame, or the information of the target object, such as its position, size, or shape, can be determined through manual setting by the user.
  • the tight frame and loose frame can be generated based on the specific information of the target object.
  • the tight frame can be understood as a frame that closely fits the target object.
  • the loose frame not only includes the range covered by the tight frame, but also includes an adjacent region within a certain range. It can be understood that the tight frame tightly wraps the target object and fits closely with its outline, while the loose frame is larger than and covers the tight frame, including the target object and a certain range around it.
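  • A minimal sketch of generating the two frames from a detected target box is shown below; the symmetric expansion factor and the clipping to the image bounds are assumptions for illustration.

```python
# Hypothetical sketch: the tight frame is the detection box itself, and the loose
# frame is obtained by expanding the tight frame around its centre.
def make_tight_and_loose(box, expand=1.8, img_h=None, img_w=None):
    x0, y0, x1, y1 = box                               # tight frame
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    w, h = (x1 - x0) * expand, (y1 - y0) * expand
    lx0, ly0, lx1, ly1 = cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2
    if img_w is not None:                              # keep the loose frame inside the image
        lx0, lx1 = max(0.0, lx0), min(float(img_w), lx1)
    if img_h is not None:
        ly0, ly1 = max(0.0, ly0), min(float(img_h), ly1)
    return (x0, y0, x1, y1), (lx0, ly0, lx1, ly1)      # tight frame, loose frame
```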
  • this embodiment refers to the first network as Target_ISPNet, which can extract the information of the area covered by the compact frame from the target frame, and use the extracted information as the input of Target_ISPNet to output brightness enhancement information.
  • the region covered by the tight frame 601 can be cropped from the target frame and used as the input of Target_ISPNet, to obtain the detail- and contrast-enhanced luminance-channel information of the range covered by the tight frame, denoted Yt.
  • Target_ISPNet can also enhance the information of other channels such as the chroma channel.
  • This application exemplarily uses the Target_ISPNet to extract and enhance the information of the luma channel as an example for illustration.
  • this embodiment refers to the second network as Full_ISPNet, which can extract the information covered by the loose frame from the target frame, use the extracted information as the input of Full_ISPNet, and output multi-channel enhancement information in which each channel has been enhanced.
  • the region covered by the loose frame 602 can be cropped from the target frame and used as the input of Full_ISPNet, to obtain the luminance-channel enhancement information Yf and the color-channel enhancement information UfVf within the coverage range of the loose frame.
  • the enhancement information of the luminance channel and the enhancement information of multiple channels can be fused to obtain an enhanced output image.
  • the luminance channel output Yt of Target_ISPNet is fused with the output YfUfVf of Full_ISPNet to obtain the final target enhancement result.
  • This fusion can be achieved in a variety of ways, such as weighted fusion and Poisson fusion.
  • the textures of the tight frame and the loose frame are usually consistent within the tight region, that is, the range covered by the tight frame, so Yt and YfUfVf can be fused using a mask.
  • the luminance information can be extracted from the area corresponding to the compact frame and enhanced. Therefore, during the fusion process, the luminance channel can fuse Yf and Yt, and UV can use UfVf.
  • if Target_ISPNet also outputs enhanced information of the UV channels, the UV-channel information output by Target_ISPNet can also be fused with the UV-channel information output by Full_ISPNet, which can be adjusted according to the actual application scenario.
  • the fusion method of each channel can be expressed as: Yout = Yt * mask + Yf * (1 - mask), Uout = Uf, Vout = Vf.
  • the pixel values in this mask can be used as fusion weights; as shown in Figure 8, the central area of the tight frame is close to 1, the area outside the tight frame is close to 0, and a near-linear transition can be used near the boundary of the tight frame.
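  • A minimal numpy sketch of this weighted fusion is given below; it assumes Yt has already been placed on the loose-crop grid (zeros outside the tight region), and the ramp width and box-filter smoothing used to approximate the near-linear transition are assumptions for illustration.

```python
import numpy as np

# Hypothetical sketch of the weighted fusion: the mask is ~1 in the centre of the
# tight region, ~0 outside it, with a near-linear ramp around the tight-frame border.
def fuse_channels(Yt, Yf, Uf, Vf, tight_region, ramp=8):
    """tight_region: (y0, y1, x0, x1) of the tight frame inside the loose crop."""
    y0, y1, x0, x1 = tight_region
    mask = np.zeros_like(Yf, dtype=np.float32)
    mask[y0:y1, x0:x1] = 1.0
    kernel = np.ones(ramp, dtype=np.float32) / ramp
    for axis in (0, 1):                                # soften the mask edges
        mask = np.apply_along_axis(
            lambda m: np.convolve(m, kernel, mode="same"), axis, mask)
    Yout = Yt * mask + Yf * (1.0 - mask)
    return Yout, Uf, Vf                                # Uout = Uf, Vout = Vf
```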
  • a license plate image enhancement in a traffic scene is taken as an example for illustration, and the overall process can refer to FIG. 9 .
  • the captured license plate image is shown in Figure 10.
  • the target detection network can be used to identify the position of the license plate in the image, and then generate a tight frame and a loose frame according to the position of the license plate.
  • the detection frame that tightly surrounds the license plate is called a tight frame.
  • a loose frame can be obtained by expanding the tight frame to a certain range.
  • the expansion ratio can be preset or set by the user; for example, the user can set the size of the part of the license plate that needs to be enhanced, such that in the license-plate snapshot the tight frame accounts for 30% of the loose frame.
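  • As a worked example of this ratio (assuming the 30% refers to area), if the tight frame is to occupy about 30% of the loose frame, each side of the loose frame is about sqrt(1/0.3) ≈ 1.83 times the corresponding side of the tight frame, i.e. the tight frame is expanded by roughly 80% per side to obtain the loose frame.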
  • the commonly used ISP system needs to realize the adjustment and enhancement of details, contrast, brightness, color and other information.
  • the tasks of detail enhancement, contrast enhancement, brightness adjustment, and color adjustment are quite different: details and contrast belong to high-frequency information, while brightness and color belong to low-frequency information. Therefore, detail and contrast enhancement are more difficult and consume more network computation.
  • the human eye usually only pays attention to the details and contrast enhancement in the tight frame of the target. Therefore, the target network provided by this application can include at least two parts, namely the enhanced network Target_ISPNet for the tight frame and the enhanced network Full_ISPNet for the loose frame.
  • In this way, the brightness channel of the detected target is enhanced through Target_ISPNet to realize detail and contrast enhancement of the tight-frame area, and brightness adjustment and color adjustment of the loose-frame area are realized through Full_ISPNet, thereby improving the clarity of the target object from multiple dimensions and improving the user's perception experience.
  • As shown in FIG. 9, Target_ISPNet obtains the luminance channel Yt of the tight region after detail and contrast enhancement, while Full_ISPNet obtains the luminance- and color-adjusted Yf, Uf, and Vf of the loose region; Yt and Yf are fused to obtain the fused luminance information of the tight region, which is then pasted back into the loose-frame region to obtain the multi-channel information of the loose-frame region.
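  • The following Python sketch ties the above steps together for one target frame; `target_ispnet` and `full_ispnet` stand in for the trained networks, their call signatures and the integer-pixel crop/paste geometry are assumptions for illustration, and `fuse_channels` refers to the fusion sketch given earlier.

```python
import numpy as np

# Hypothetical end-to-end sketch of the inference path: crop the two regions,
# run the two sub-networks, place Yt on the loose-crop grid, then fuse.
def enhance_target(target_frame, tight, loose, target_ispnet, full_ispnet):
    tx0, ty0, tx1, ty1 = tight                          # boxes as (x0, y0, x1, y1)
    lx0, ly0, lx1, ly1 = loose
    tight_crop = target_frame[ty0:ty1, tx0:tx1]         # input of Target_ISPNet
    loose_crop = target_frame[ly0:ly1, lx0:lx1]         # input of Full_ISPNet
    Yt_local = target_ispnet(tight_crop)                # enhanced luminance (tight)
    Yf, Uf, Vf = full_ispnet(loose_crop)                 # enhanced YUV (loose)
    Yt = np.zeros_like(Yf)                               # place Yt on the loose grid
    Yt[ty0 - ly0:ty1 - ly0, tx0 - lx0:tx1 - lx0] = Yt_local
    tight_in_loose = (ty0 - ly0, ty1 - ly0, tx0 - lx0, tx1 - lx0)
    # fuse_channels: see the fusion sketch above
    return fuse_channels(Yt, Yf, Uf, Vf, tight_in_loose)  # Yout, Uout, Vout
```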
  • the information of the region covering the target object is cropped out through the tight frame and used as the input of the neural network to obtain the enhancement result of the tight region, and the information of the loose region is cropped out through the loose frame and used as the input of the neural network to obtain the enhancement result of the loose region; the two enhancement results are then fused to obtain the enhancement result of the target object. Therefore, there is no need to go through the commonly used ISP tuning parameters, which reduces the workload of parameter tuning and obtains the target enhancement result efficiently and accurately.
  • the commonly used ISP processing method can be replaced by a neural network, without upgrading the hardware, and without increasing the cost, better enhancement of raw data is achieved.
  • The foregoing introduces the data processing method provided by this application, in which the data corresponding to the tight frame and the loose frame cropped from the target frame are respectively used as inputs of the target network to obtain the target enhancement result. The target network can be obtained through training; referring to FIG. 12, a neural network training method provided by the present application is described below.
  • the training set may include a plurality of samples and the labels corresponding to each sample; specifically, it may include raw data collected by the image sensor and the corresponding ground-truth images. It can be understood that the training set may include the raw data collected by the image sensor (that is, the samples) and the corresponding enhanced images, that is, the ground-truth images.
  • the training data is related to the training method of the target network. For example, if the sub-networks of the target network, i.e. Target_ISPNet and Full_ISPNet, are trained separately, the ground-truth images in the training set can be divided into multiple types, such as ground-truth images corresponding to the tight frame and images corresponding to the loose frame; if the target network is trained as a whole, the training set can include raw data and the corresponding enhanced images.
  • exemplarily, the raw data and the corresponding ground-truth images can be as shown in Figure 13, where the raw data can include captured license-plate raw data carrying noise, and the corresponding ground-truth images are obtained after ISP processing such as noise reduction, detail enhancement, and contrast enhancement.
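  • A minimal sketch of assembling such training pairs is shown below; the file layout (one array file per sample, with matching names in the two directories) is an assumption for illustration.

```python
import os
import numpy as np

# Hypothetical sketch: pair each noisy raw capture with its ISP-processed
# ground-truth image by file name.
def load_training_pairs(raw_dir, gt_dir):
    pairs = []
    for name in sorted(os.listdir(raw_dir)):
        raw = np.load(os.path.join(raw_dir, name)).astype(np.float32)   # sample
        gt = np.load(os.path.join(gt_dir, name)).astype(np.float32)     # ground truth
        pairs.append((raw, gt))
    return pairs
```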
  • the target network can be used to extract the information of the luminance channel corresponding to the tight frame from the input data and extract the information of multiple channels corresponding to the loose frame from the input data, and the information of the luminance channel and the information of the multiple channels are fused to obtain the enhanced result.
  • optionally, the target network includes a first network and a second network: the first network is used to extract the luminance-channel information corresponding to the tight frame from the data in the training set, and the second network is used to extract the information of multiple channels corresponding to the loose frame from the input data.
  • accordingly, the enhanced result may include the first information output by the first network and the second information output by the second network.
  • the target network may refer to the target network mentioned above in FIG. 4 to FIG. 11 , which will not be repeated here.
  • the training process for the target network can be iterative, and the target network can be updated iteratively multiple times, that is, steps 1202-1206 can be executed multiple times. Therefore, only one iteration of the process is described here as an example, and it is not intended as a limitation.
  • the recognition network can be used to recognize the information of the target object in the input image to obtain the first recognition result, such as the position, category, and size of the target object.
  • the recognition network can specifically include a target detection network, a semantic segmentation network, or a classification network, which can be used to extract the semantic information in the input image and then perform tasks such as detection, segmentation, or classification based on the extracted semantic information, to obtain the specific information of the objects in the input image.
  • the output result of the target network is also used as the input of the recognition network to obtain the second recognition result, which may include the information of the target object in the enhanced result recognized by the recognition network, such as the position, category, and size of the target object.
  • the second recognition result may include a third recognition result corresponding to the first information and a fourth recognition result corresponding to the second information.
  • the output result of the target network may include the output result of the first network and the output result of the second network, and the recognition network (for example, semantic segmentation) can be applied to each of them respectively, to obtain the information of the target object in the first information and the information of the target object in the second information.
  • the loss value between the enhanced result and the ground-truth label, and the loss value between the first recognition result and the second recognition result, can be calculated; with the latter used as a constraint, the target network is updated using the loss value between the enhanced result and the ground-truth label to obtain an updated target network.
  • the target network may include the first network and the second network. If the target network is updated as a whole, the overall loss value of the target network may be calculated; if the first network and the second network are updated separately, the output of the first network and the output of the second network are used to calculate separate loss values, thereby updating the first network and the second network respectively.
  • for example, if the second recognition result includes the third recognition result and the fourth recognition result, the first network can be updated according to the difference between the first information and the ground-truth label and the difference between the third recognition result and the first recognition result, to obtain an updated first network; and the second network can be updated according to the difference between the second information and the ground-truth label and the difference between the fourth recognition result and the first recognition result, to obtain an updated second network, thereby implementing the update of the target network.
  • alternatively, the first loss value can be obtained according to the difference between the enhancement result and the ground-truth label; the second loss value can be obtained according to the difference between the first recognition result and the second recognition result; the first loss value and the second loss value are fused to obtain a third loss value; and the target network is updated according to the third loss value to obtain an updated target network.
  • various methods are provided for updating the target network, and the target network can be updated with the output result of the recognition network as a constraint, so that the output of the target network is more accurate.
  • the recognition network can be used to recognize both the real image and the output image of the target network, and the recognition results are used as a constraint to update the target network, so that the recognition result of the output image of the target network is closer to the recognition result of the real image; this makes the target network converge along the dimension of the accuracy of the target object in the output image and improves the probability that the output image of the target network is recognized correctly. For example, if the target object is recognized as "dog" in the ground-truth image, training the target network makes the recognition result of the output image of the target network closer to "dog", so that the output result of the target network is more accurate.
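  • A minimal PyTorch-style sketch of one such constrained update step is shown below; treating the target network as a single module, keeping the recognition network fixed, and the choice of L1/MSE losses and the weight `beta` are all assumptions for illustration.

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch of one training step: the enhancement loss pulls the output
# toward the ground-truth image, and the recognition-consistency loss pulls the
# recognition result of the output toward the recognition result of the ground truth.
def train_step(target_net, recog_net, raw_batch, gt_batch, optimizer, beta=0.1):
    enhanced = target_net(raw_batch)                    # enhancement result
    with torch.no_grad():
        first_recog = recog_net(gt_batch)               # first recognition result
    second_recog = recog_net(enhanced)                  # second recognition result
    loss_enh = F.l1_loss(enhanced, gt_batch)            # first loss value
    loss_recog = F.mse_loss(second_recog, first_recog)  # second loss value (constraint)
    loss = loss_enh + beta * loss_recog                 # fused third loss value
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()        # only target_net's parameters are in the optimizer
    return loss.item()
```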
  • After the updated target network is obtained, it can be judged whether the convergence condition is met; if so, the updated target network can be output, that is, the training of the target network is completed. If the convergence condition is not met, the updated target network can be used as a new target network and training continues, that is, step 1202 is repeatedly executed until the convergence condition is met.
  • the convergence condition may include one or more of the following: the number of training iterations of the target network reaches a preset number, the output accuracy of the target network is higher than a preset accuracy value, the average accuracy of the target network is higher than a preset average value, the training time of the target network exceeds a preset duration, or the recognition accuracy of the recognition network on the output image of the target network is higher than a preset accuracy, etc.
  • the training of the target network can be stopped, and the updated target network of the current iteration can be output.
  • In this way, the loss value between the recognition result of the real image and the recognition result of the output image is used as a constraint, so that the recognition result of the target network's output image is closer to that of the real image, which is equivalent to making the target object in the output image closer to the real image and improving the clarity and accuracy of the output image of the target network.
  • Exemplarily, for ease of understanding, the detailed process is shown in FIG. 14, and a detailed application scenario is taken as an example for illustration.
  • the training set is used as the input of the target network and the recognition network respectively.
  • the raw data in each sample in the training set can be used as the input of the target network to obtain the output image.
  • the ground-truth image (eg, Gt) in each sample is used as the input of the recognition network to obtain the first recognition result.
  • the output image is also used as the input of the recognition network to obtain the second recognition result; the loss values are then calculated based on the output image, the first recognition result, and the second recognition result, and the target network is updated using the loss values.
  • based on the loss values, it is judged whether the target network has converged; if so, the iteration can be stopped, and if not, the iteration continues.
  • if the entire target network is updated, then in the process of calculating the loss value, the loss value between the output image and the ground-truth image and the loss value between the first recognition result and the second recognition result can be calculated, and these two loss values are then fused to back-propagate and update the target network.
  • if Target_ISPNet and Full_ISPNet are updated separately, the loss value of each network can be calculated separately, and each network is then updated separately.
  • the license plate segmentation subtask is added during the training process.
  • the ground-truth image and segmentation results of compact regions can be shown in Figure 15.
  • the loss of the segmentation subtask can be expressed as:
  • The loss for Target_ISPNet can finally be expressed as:
  • the weights of enhancement tasks and semantic segmentation tasks can be adjusted by adjusting the ⁇ parameter.
  • Yf Uf Vf represents the output of the Full_ISPNet network.
  • the loss function constrains the output image to be as similar as possible to the high-definition image in the training data to achieve license plate loose area enhancement.
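  • Since the exact loss expressions are not reproduced above, the following is only one plausible form consistent with the surrounding description (an enhancement term plus a λ-weighted segmentation term for Target_ISPNet, and an enhancement term for Full_ISPNet); it is an assumption for illustration, not the application's own formula:
  • L_seg = CrossEntropy(Seg(Yt), Seg_gt)
  • L_Target_ISPNet = |Yt - Ygt| + λ * L_seg
  • L_Full_ISPNet = |YfUfVf - YgtUgtVgt|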
  • the final result of license plate enhancement can be obtained by weighted fusion, and the fusion formula can be expressed as: Yout = Yt * mask + Yf * (1 - mask), Uout = Uf, Vout = Vf.
  • Yf Uf Vf and mask can be as shown in Figure 16.
  • in this application, the target is further divided into a tight-frame input and a loose-frame input to reduce network overhead: without this split, the network overhead would take about 200 ms, whereas the overhead of Target_ISPNet is about 30 ms and the overhead of Full_ISPNet is about 10 ms, so the network performance is improved by about 5 times.
  • the segmentation result of the segmentation task is introduced as a constraint, so that the segmentation result of the output image of the target network is constrained to be closer to the segmentation result of the real image, making the output image of the target network more accurate. Moreover, this does not increase the overhead of the target network at application time, so that the target network can replace the commonly used ISP processing method, efficiently process the raw data to obtain the output image, and obtain a more accurate output image.
  • the comparison between the imaging effect of the commonly used ISP processing method and the imaging effect of the target network can be shown in FIG. 17 .
  • the imaging effect of the license plate output through the target network is better, and the enhancement effect is better.
  • As shown in FIG. 18, the present application provides a schematic structural diagram of a data processing device, including:
  • An acquisition module 1801 configured to acquire a first frame of data, where the first frame of data is one of the frames of raw data collected by the image sensor;
  • the tight cropping module 1802 is used to obtain the data corresponding to the tight frame from the first frame data to obtain the first data, where the range covered by the tight frame in the first frame data includes the target object detected from the first frame data;
  • the loose cropping module 1803 is used to obtain the data corresponding to the loose frame from the first frame data to obtain the second data, where the range covered by the loose frame in the first frame data includes and is greater than the range covered by the tight frame in the first frame data;
  • the output module 1804 is configured to use the first data and the second data as the input of the target network respectively to obtain an output image, and the target network is used to extract the information of multiple channels in the input data, and obtain an output according to the information of multiple channels image.
  • the target network includes a first network and a second network
  • the output module 1804 is specifically used to: use the first data as the input of the first network to obtain the first enhancement information, where the first network is used to extract the information of the brightness channel in the input data; use the second data as the input of the second network to obtain the second enhancement information, where the second network is used to extract the information of multiple channels of the input data; and fuse the first enhancement information and the second enhancement information to obtain the output image.
  • the device further includes: a target detection module 1805, configured to perform target detection on the first frame data to obtain the position information of the target object in the first frame data, and to generate the tight frame and the loose frame according to the position information of the target object.
  • the acquisition module 1801 is specifically configured to: receive user input data and extract the first frame data from the raw data according to the user input data; or perform target detection on each frame in the raw data and extract the first frame data from the raw data according to the detection result.
  • the target network is obtained by training in combination with the recognition network and the training set, wherein, in the process of training the target network, the output of the recognition network is used as a constraint to update the target network.
  • FIG. 19 is a schematic structural diagram of a neural network training device provided by the present application, including:
  • the obtaining module 1901 is used to obtain a training set, which includes raw data collected by the image sensor and corresponding ground truth labels;
  • the enhancement module 1902 is used to use the training set as the input of the target network to obtain an enhanced result.
  • the target network is used to extract the information of the luminance channel corresponding to the tight frame from the input data and to extract the information of the multiple channels corresponding to the loose frame from the input data, and the information of the luminance channel and the information of the multiple channels are fused to obtain the enhanced result.
  • the range covered by the loose frame in the input data includes and is greater than the range covered by the tight frame in the input data;
  • the semantic segmentation module 1903 is used to use the training set as the input of the recognition network to obtain the first recognition result, where the recognition network is used to obtain the semantic information in the input image;
  • the semantic segmentation module 1903 is also used to use the enhanced result as the input of the recognition network to obtain the second recognition result;
  • the update module 1904 is configured to update the target network according to the difference between the enhancement result and the true label, and the difference between the first recognition result and the second recognition result, to obtain an updated target network.
  • the enhanced result includes the first information output by the first network and the second information output by the second network
  • the second recognition result includes the third recognition result corresponding to the first information and the fourth recognition result corresponding to the second information;
  • the update module 1904 is specifically configured to: update the first network according to the difference between the first information and the ground-truth label and the difference between the third recognition result and the first recognition result, to obtain an updated first network; and update the second network according to the difference between the second information and the ground-truth label and the difference between the fourth recognition result and the first recognition result, to obtain an updated second network.
  • the update module 1904 is specifically configured to: obtain the first loss value according to the difference between the enhancement result and the true value label; obtain the first loss value according to the difference between the first recognition result and the second recognition result the second loss value; fusing the first loss value and the second loss value to obtain a third loss value; updating the target network according to the third loss value to obtain an updated target network.
  • FIG. 20 is a schematic structural diagram of another data processing device provided by the present application, as described below.
  • the data processing device may include a processor 2001 and a memory 2002 .
  • the processor 2001 and the memory 2002 are interconnected by wires. Wherein, program instructions and data are stored in the memory 2002 .
  • the memory 2002 stores program instructions and data corresponding to the steps in the above-mentioned Fig. 4-Fig. 11 .
  • the processor 2001 is configured to execute the method steps performed by the data processing apparatus shown in any one of the embodiments in FIG. 4 to FIG. 11 .
  • the data processing apparatus may further include a transceiver 2003, configured to receive or send data.
  • the embodiment of the present application also provides a computer-readable storage medium that stores a program; when the program runs on a computer, the computer is caused to execute the steps of the methods described in the embodiments illustrated in the foregoing FIG. 4 to FIG. 11.
  • the aforementioned data processing device shown in FIG. 20 is a chip, such as an ISP chip.
  • the embodiment of the present application also provides a data processing device.
  • the data processing device can also be called a digital processing chip or a chip.
  • the chip includes a processing unit and a communication interface.
  • the processing unit obtains program instructions through the communication interface, and the program instructions are executed by the processing unit.
  • the processing unit is configured to execute the method steps performed by the data processing device shown in any one of the embodiments in FIG. 4 to FIG. 11 .
  • FIG. 21 is a schematic structural diagram of another neural network training device provided by the present application, as described below.
  • the neural network training device may include a processor 2101 and a memory 2102 .
  • the processor 2101 and the memory 2102 are interconnected by wires.
  • the memory 2102 stores program instructions and data.
  • the memory 2102 stores program instructions and data corresponding to the steps in the above-mentioned Fig. 12-Fig. 17 .
  • the processor 2101 is configured to execute the method steps executed by the neural network training device shown in any one of the above-mentioned embodiments in FIG. 12-FIG. 17 .
  • the neural network training device may also include a transceiver 2103 for receiving or sending data.
  • the embodiment of the present application also provides a computer-readable storage medium that stores a program; when the program runs on a computer, the computer is caused to execute the steps of the methods described in the embodiments illustrated in the foregoing FIG. 12 to FIG. 17.
  • the aforementioned neural network training device shown in FIG. 21 is a chip.
  • the embodiment of the present application also provides a neural network training device.
  • the neural network training device can also be called a digital processing chip or chip.
  • the chip includes a processing unit and a communication interface.
  • the processing unit obtains program instructions through the communication interface; the program instructions are executed by the processing unit, and the processing unit is configured to execute the method steps performed by the neural network training device shown in any one of the embodiments in FIG. 12 to FIG. 17.
  • the embodiment of the present application also provides a digital processing chip.
  • the digital processing chip integrates circuits and one or more interfaces for implementing the functions of the above-mentioned processors 2001 and 2101.
  • when a memory is integrated in the digital processing chip, the digital processing chip can complete the method steps of any one or more of the foregoing embodiments; when no memory is integrated in the digital processing chip, it can be connected to an external memory through a communication interface.
  • the digital processing chip implements the actions performed by the data processing device in the above-mentioned embodiments according to the program code stored in the external memory.
  • the embodiment of the present application also provides a computer program product, which, when running on a computer, causes the computer to execute the steps in the method described in the foregoing embodiments shown in FIGS. 4-17 .
  • the data processing device provided in the embodiment of the present application may be a chip, and the chip includes: a processing unit and a communication unit.
  • the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin, or a circuit.
  • the processing unit can execute the computer-executed instructions stored in the storage unit, so that the chip in the server executes the neural network training method described in the embodiments shown in FIGS. 4-17 above.
  • the storage unit is a storage unit in the chip, such as a register, a cache, etc.
  • the storage unit may also be a storage unit located outside the chip in the wireless access device, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM), etc.
  • the aforementioned processing unit or processor may be a central processing unit (CPU), a neural-network processing unit (NPU), a graphics processing unit (GPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • a general-purpose processor may be a microprocessor or any conventional processor or the like.
  • FIG. 22 is a schematic structural diagram of a chip provided by an embodiment of the present application.
  • the chip may be represented as a neural network processor (NPU) 220; the NPU 220 is mounted on the main CPU (Host CPU), and tasks are assigned by the Host CPU.
  • the core part of the NPU is the operation circuit 2203, and the controller 2204 controls the operation circuit 2203 to extract matrix data in the memory and perform multiplication operations.
  • the operation circuit 2203 includes multiple processing engines (PE). In some implementations, the operation circuit 2203 is a two-dimensional systolic array; it may also be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the operation circuit 2203 is a general-purpose matrix processor.
  • the operation circuit fetches the data corresponding to the matrix B from the weight memory 2202, and caches it in each PE in the operation circuit.
  • the operation circuit takes the data of matrix A from the input memory 2201 and performs matrix operation with matrix B, and the obtained partial or final results of the matrix are stored in the accumulator (accumulator) 2208 .
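  • As a rough illustration of this dataflow (not a description of the actual hardware), the following numpy sketch accumulates partial matrix products the way the accumulator collects partial results; the tile size is an assumption for illustration.

```python
import numpy as np

# Hypothetical sketch: weights of B are held for the current tile while data of A
# streams through, and partial products are summed into an accumulator.
def tiled_matmul(A, B, tile=16):
    M, K = A.shape
    _, N = B.shape
    acc = np.zeros((M, N), dtype=np.float32)            # plays the role of the accumulator
    for k0 in range(0, K, tile):
        A_tile = A[:, k0:k0 + tile].astype(np.float32)  # data of matrix A
        B_tile = B[k0:k0 + tile, :].astype(np.float32)  # cached weights of matrix B
        acc += A_tile @ B_tile                           # partial results accumulated
    return acc
```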
  • the unified memory 2206 is used to store input data and output data.
  • the weight data is moved to the weight memory 2202 through the storage unit access controller (direct memory access controller, DMAC) 2205.
  • the input data is also transferred to the unified memory 2206 through the DMAC.
  • a bus interface unit (bus interface unit, BIU) 2210 is used for the interaction between the AXI bus, the DMAC and the instruction fetch buffer (IFB) 2209.
  • the bus interface unit 2210 (bus interface unit, BIU) is used for the instruction fetch memory 2209 to obtain instructions from the external memory, and for the storage unit access controller 2205 to obtain the original data of the input matrix A or the weight matrix B from the external memory.
  • the DMAC is mainly used to move the input data in the external memory DDR to the unified memory 2206 , move the weight data to the weight memory 2202 , or move the input data to the input memory 2201 .
  • the vector calculation unit 2207 includes a plurality of calculation processing units, and if necessary, further processes the output of the calculation circuit, such as vector multiplication, vector addition, exponential operation, logarithmic operation, size comparison and so on. It is mainly used for non-convolutional/fully connected layer network calculations in neural networks, such as batch normalization, pixel-level summation, and upsampling of feature planes.
  • the vector computation unit 2207 can store the vector of the processed output to unified memory 2206 .
  • the vector calculation unit 2207 may apply a linear function and/or a nonlinear function to the output of the operation circuit 2203, for example performing linear interpolation on the feature planes extracted by the convolutional layers, or applying a nonlinear function to a vector of accumulated values to generate activation values.
  • the vector computation unit 2207 generates normalized values, pixel-level summed values, or both.
  • the vector of processed outputs can be used as an activation input to arithmetic circuitry 2203, for example for use in subsequent layers in a neural network.
  • An instruction fetch buffer (instruction fetch buffer) 2209 connected to the controller 2204 is used to store instructions used by the controller 2204;
  • the unified memory 2206, the input memory 2201, the weight memory 2202, and the instruction fetch memory 2209 are all on-chip memories, while the external memory is a memory outside the NPU hardware architecture.
  • the operation of each layer in the neural network can be performed by the operation circuit 2203 or the vector calculation unit 2207.
  • the processor mentioned above can be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits for controlling the program execution of the above-mentioned methods in FIGS. 4-16 .
  • the device embodiments described above are only illustrative; the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • the connection relationship between the modules indicates that they have communication connections, which can be specifically implemented as one or more communication buses or signal lines.
  • the essence of the technical solution of this application, or the part that contributes to the prior art, can be embodied in the form of a software product; the computer software product is stored in a readable storage medium, such as a computer floppy disk, a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods described in the various embodiments of the present application.
  • all or part of the foregoing embodiments may be implemented by software, hardware, firmware, or any combination thereof; when implemented using software, they may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions.
  • the computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave).
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center integrating one or more available media.
  • the available medium may be a magnetic medium (such as a floppy disk, a hard disk, or a magnetic tape), an optical medium (such as a DVD), or a semiconductor medium (such as a solid state disk (solid state disk, SSD)), etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

A data processing method and apparatus, used to perform target enhancement on raw data using a neural network, obtaining an output image with a better enhancement effect while reducing the enhancement workload. The method includes: acquiring first frame data (401), where the first frame data is one frame of the raw data collected by an image sensor; obtaining, from the first frame data, the data corresponding to a tight frame to obtain first data (402), where the range covered by the tight frame in the first frame data includes a target object detected from the first frame data; obtaining, from the first frame data, the data corresponding to a loose frame to obtain second data (403), where the range covered by the loose frame in the first frame data includes and is greater than the range covered by the tight frame in the first frame data; and using the first data and the second data respectively as inputs of a target network to obtain an output image (404), where the target network is used to extract information of multiple channels from the input data and obtain the output image according to the information of the multiple channels.

Description

一种数据处理方法以及装置
本申请要求于2021年08月30日提交中国专利局、申请号为202111001658.0、申请名称为“一种基于目标增强的ISP系统方法”的中国专利申请的优先权,以及要求于2021年12月01日提交中国专利局、申请号为202111458295.3、申请名称为“一种数据处理方法以及装置”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及人工智能领域,尤其涉及一种数据处理方法以及装置。
背景技术
视频成像是当前交通管理、农业管理和工业生产的重要手段。随着成像技术的发展,当前成像设备在理想情况下,可获得较好的成像结果。但随着成像条件裂化,其成像结果会存在分辨率低,对比度差,图像细节丢失等问题。
常用的图像信号处理器(image signal processor,ISP)成像系统涉及大量参数,需要根据各个成像场景特性分别调优,调优工作量巨大。若后续需要进行图像增强,则基于ISP输出的图像进行图像增强。但ISP输出图像的调优工作量巨大,且调优效果受限于调优人工经验,可能存在信息丢失的情况。因此,如何得到更准确的图像,成为亟待解决的问题。
发明内容
本申请提供一种数据处理方法以及装置,用于使用神经网络来对原始raw数据进行目标增强,在减少增强工作量的同时得到增强效果更好的输出图像。
有鉴于此,第一方面,本申请提供一种数据处理方法,包括:获取第一帧数据,第一帧数据是图像传感器采集到的原始数据中的其中一帧;从第一帧数据中获取紧致框对应的数据,得到第一数据,紧致框在第一帧数据中覆盖的范围包括从第一帧数据中检测出的目标对象;从第一帧数据中获取宽松框对应的数据,得到第二数据,宽松框在第一帧数据中覆盖的范围包括且大于紧致框在第一帧数据中覆盖的范围;将第一数据和第二数据分别作为目标网络的输入,得到输出图像,目标网络用于提取输入的数据中的多个通道的信息,并根据多个通道的信息得到输出图像。
本申请实施方式中,通过紧致框扣取了覆盖目标对象的区域的信息,并作为神经网络的输入得到紧致区域的增强结果,通过宽松框提取了宽松区域的信息并作为神经网络的输入,得到宽松区域的增强结果,随后融合紧致区域的增强结果与宽松区域的增强结果,得到目标对象的增强结果。因此,无需通过常用的ISP调优参数,减少了调参工作量,高效准确地得到目标增强结果。并且,可以通过神经网络来替代常用的ISP处理方式,无需对硬件进行升级,在无需提高成本的情况下,实现了对原始(raw)数据更好的增强,得到增强效果更好的图像。
在一种可能的实施方式中,目标网络可以包括第一网络和第二网络;将第一数据和第二数据分别作为目标网络的输入,得到输出图像,可以包括:将第一数据作为第一网络的输入,得到第一增强信息,第一网络用于提取输入的数据中亮度通道的信息;将第二数据 作为第二网络的输入,得到第二增强信息,第二网络用于提取输入的数据的多个通道的信息;融合第一增强信息和第二增强信息,得到输出图像。
因此,本申请实施方式中,目标网络可以包括两部分,分别对紧致框和宽松框对应的信息进行处理,相对于目标网络基于整体raw进行增强,通过不同的子网络分别进行处理可以实现并行处理,且可以降低开销,提高工作效率。
在一种可能的实施方式中,上述方法还可以包括:对第一帧数据进行目标检测,得到第一帧数据中目标对象的位置信息;根据目标对象的位置信息生成紧致框和宽松框。
因此,本申请可以通过目标检测的方式来识别raw数据中的目标对象,从而可以快速准确地找到目标对象的位置,进而准确地确定紧致框和宽松框的位置和尺寸。
在一种可能的实施方式中,上述的获取第一帧数据,可以包括:接收用户输入数据,并根据用户输入数据从原始数据中提取第一帧数据;或者,对原始数据中的每一帧进行目标检测,根据检测结果从原始数据中提取第一帧数据。
因此,本申请实施方式中,可以由用户来选择需要进行目标增强的帧,从而可以针对用户的需求进行目标增强,提高用户体验。或者,也可以通过目标检测,来从raw数据中筛选出需要进行目标增强的帧,从而可以更准确地识别出哪些帧需要进行目标增强,得到更准确清晰的输出图像。
在一种可能的实施方式中,目标网络为结合识别网络以及训练集进行训练得到,识别网络用于获取输入的图像中的语义信息,其中,在对目标网络进行训练的过程中,以识别网络的输出结果作为约束对目标网络进行更新。
因此,本申请实施方式中,在对目标网络进行训练时,可以将识别网络的输出作为约束,从而约束目标网络的输出图像被识别的准确度更高,进而提高目标网络的输出准确度。
第二方面,本申请提供一种神经网络训练方法,包括:获取训练集,训练集中包括图像传感器采集到的原始数据以及对应的真值标签;将训练集作为目标网络的输入,得到增强结果,目标网络用于从输入的数据中提取紧致框对应的亮度通道的信息,以及从输入数据中提取宽松框对应的多个通道的信息,对亮度通道的信息和多个通道的信息进行融合得到增强结果,宽松框在输入的数据中覆盖的范围包括且大于紧致框在输入的数据中覆盖的范围;将训练集作为识别网络的输入,得到第一识别结果,识别网络用于获取输入的图像中的语义信息;将增强结果作为识别网络的输入,得到第二识别结果;根据增强结果和真值标签之间的差值,以及第一识别结果和第二识别结果之间的差值,对目标网络进行更新,得到更新后的目标网络。
本申请实施方式中,可以以识别网络的输出结果为约束,来对目标网络进行更新,从而使目标网络的输出更准确。可以理解为,可以使用识别网络对真值图像与目标网络的输出图像都进行识别,并将识别结果作为约束对目标网络进行更新,使目标网络的输出图像的识别结果与真值图像的识别结果更接近,从输出图像中的目标对象的准确度的维度来使目标网络收敛,提高目标网路的输出图像被识别正确的概率。
在一种可能的实施方式中,目标网络包括第一网络和第二网络,第一网络用于从训练集中的数据中提取紧致框对应的亮度通道的信息,第二网络用于从输入数据中提取宽松框 对应的多个通道的信息。
因此,本申请实施方式中,目标网络可以分为多个部分,从而在训练以及应用的过程中,可以分别对紧致框以及宽松框对应的数据并行进行处理,从而提高目标网路的输出效率。
在一种可能的实施方式中,增强结果中包括第一网络输出的第一信息以及第二网络输出的第二信息,第二识别结果包括第一信息对应的第三识别结果以及第二信息对应的第四识别结果;根据增强结果和真值标签之间的差值,以及第一识别结果和第二识别结果之间的差值,对目标网络进行更新,得到更新后的目标网络,可以包括:根据第一信息和真值标签之间的差值,以及第三识别结果和第一识别结果之间的差值,更新第一网络,得到更新后的第一网络;根据第二信息和真值标签之间的差值,以及第四识别结果和第一识别结果之间的差值,更新第二网络,得到更新后的第二网络。
因此,本申请实施方式中,可以分别对第一网络以及第二网络进行训练,从而可以有针对性地对目标网络中的子网络进行训练,提高子网络的输出准确性,从而提高目标网络的整体输出准确性。
在一种可能的实施方式中,前述的根据增强结果和真值标签之间的差值,以及第一识别结果和第二识别结果之间的差值,对目标网络进行更新,得到更新后的目标网络,可以包括:根据增强结果和真值标签之间的差值得到第一损失值;根据第一识别结果和第二识别结果之间的差值得到第二损失值;融合第一损失值和第二损失值,得到第三损失值;根据第三损失值对目标网络进行更新,得到更新后的目标网络。
因此,本申请实施方式中,也可以对目标网络整体进行训练,从而使目标网络的整体输出效果更好,得到输出更准确的目标网络。
第三方面,本申请提供一种数据处理装置,包括:
获取模块,用于获取第一帧数据,第一帧数据是图像传感器采集到的原始数据中的其中一帧;
紧致扣取模块,用于从第一帧数据中获取紧致框对应的数据,得到第一数据,紧致框在第一帧数据中覆盖的范围包括从第一帧数据中检测出的目标对象;
宽松扣取模块,用于从第一帧数据中获取宽松框对应的数据,得到第二数据,宽松框在第一帧数据中覆盖的范围包括且大于紧致框在第一帧数据中覆盖的范围;
输出模块,用于将第一数据和第二数据分别作为目标网络的输入,得到输出图像,目标网络用于提取输入的数据中的多个通道的信息,并根据多个通道的信息得到输出图像。
在一种可能的实施方式中,目标网络包括第一网络和第二网络;
输出模块,具体用于:将第一数据作为第一网络的输入,得到第一增强信息,第一网络用于提取输入的数据中亮度通道的信息;将第二数据作为第二网络的输入,得到第二增强信息,第二网络用于提取输入的数据的多个通道的信息;融合第一增强信息和第二增强信息,得到输出图像。
在一种可能的实施方式中,装置还包括:目标检测模块,用于对第一帧数据进行目标检测,得到第一帧数据中目标对象的位置信息;根据目标对象的位置信息生成紧致框和宽 松框。
在一种可能的实施方式中,获取模块,具体用于:接收用户输入数据,并根据用户输入数据从原始数据中提取第一帧数据;或者,对原始数据中的每一帧进行目标检测,根据检测结果从原始数据中提取第一帧数据。
在一种可能的实施方式中,目标网络为结合识别网络以及训练集进行训练得到,识别网络用于获取输入的图像中的语义信息,其中,在对目标网络进行训练的过程中,以识别网络的输出结果作为约束对目标网络进行更新。
第四方面,本申请提供一种神经网络训练装置,包括:
获取模块,用于获取训练集,训练集中包括图像传感器采集到的原始数据以及对应的真值标签;
增强模块,用于将训练集作为目标网络的输入,得到增强结果,目标网络用于从输入的数据中提取紧致框对应的亮度通道的信息,以及从输入数据中提取宽松框对应的多个通道的信息,对亮度通道的信息和多个通道的信息进行融合得到增强结果,宽松框在输入的数据中覆盖的范围包括且大于紧致框在输入的数据中覆盖的范围;
语义分割模块,用于将训练集作为识别网络的输入,得到第一识别结果,识别网络用于获取输入的图像中的语义信息;
语义分割模块,还用于将增强结果作为识别网络的输入,得到第二识别结果;
更新模块,用于根据增强结果和真值标签之间的差值,以及第一识别结果和第二识别结果之间的差值,对目标网络进行更新,得到更新后的目标网络。
在一种可能的实施方式中,增强结果中包括第一网络输出的第一信息以及第二网络输出的第二信息,第二识别结果包括第一信息对应的第三识别结果以及第二信息对应的第四识别结果;更新模块,具体用于:根据第一信息和真值标签之间的差值,以及第三识别结果和第一识别结果之间的差值,更新第一网络,得到更新后的第一网络;根据第二信息和真值标签之间的差值,以及第四识别结果和第一识别结果之间的差值,更新第二网络,得到更新后的第二网络。
在一种可能的实施方式中更新模块,具体用于:根据增强结果和真值标签之间的差值得到第一损失值;根据第一识别结果和第二识别结果之间的差值得到第二损失值;融合第一损失值和第二损失值,得到第三损失值;根据第三损失值对目标网络进行更新,得到更新后的目标网络。
第五方面,本申请实施例提供一种数据处理装置,包括:处理器和存储器,其中,处理器和存储器通过线路互联,处理器调用存储器中的程序代码用于执行上述第一方面任一项所示的数据处理方法中与处理相关的功能。可选地,该电子设备可以是芯片。
第六方面,本申请实施例提供一种神经网络训练装置,包括:处理器和存储器,其中,处理器和存储器通过线路互联,处理器调用存储器中的程序代码用于执行上述第一方面任一项所示的神经网络训练方法中与处理相关的功能。可选地,该电子设备可以是芯片。
第七方面,本申请实施例提供了一种电子设备,该电子设备也可以称为数字处理芯片或者芯片,芯片包括处理单元和通信接口,处理单元通过通信接口获取程序指令,程序指 令被处理单元执行,处理单元用于执行如上述第一方面或第二方面任一可选实施方式中与处理相关的功能。
第八方面,本申请实施例提供了一种计算机可读存储介质,包括指令,当其在计算机上运行时,使得计算机执行上述第一方面或第二方面任一可选实施方式中的方法。
第九方面,本申请实施例提供了一种包含指令的计算机程序产品,当其在计算机上运行时,使得计算机执行上述第一方面或第二方面任一可选实施方式中的方法。
附图说明
图1是本申请应用的一种人工智能主体框架示意图;
图2是本申请提供的一种系统架构示意图;
图3是本申请提供的一种电子设备的结构示意图;
图4是本申请提供的一种数据处理方法的流程示意图;
图5是本申请提供的另一种数据处理方法的流程示意图;
图6是本申请提供的一种应用场景示意图;
图7是本申请提供的另一种应用场景示意图;
图8是本申请提供的一种掩膜示意图;
图9是本申请提供的另一种数据处理方法的流程示意图;
图10是本申请提供的另一种应用场景示意图;
图11是本申请提供的另一种应用场景示意图;
图12是本申请提供的一种神经网络训练方法的流程示意图;
图13是本申请提供的一种训练数据示意图;
图14是本申请提供的另一种神经网络训练方法的流程示意图;
图15是本申请提供的另一种应用场景示意图;
图16是本申请提供的另一种应用场景示意图;
图17是本申请提供的另一种应用场景示意图;
图18是本申请提供的一种数据处理装置的结构示意图;
图19是本申请提供的一种神经网络训练装置的结构示意图
图20是本申请提供的另一种数据处理装置的结构示意图;
图21是本申请提供的另一种神经网络训练装置的结构示意图;
图22是本申请提供的一种芯片的结构示意图。
具体实施方式
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行描述,显然,所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
本申请提供了一种数据处理方法、神经网络训练方法以及装置,结合了神经网络来对 图像传感器采集到的数据进行处理,从而得到增强效果更好的输出图像。为便于理解,下面首先分别对神经网络和本申请提供的包括了图像传感器的电子设备分别进行介绍。
首先对人工智能系统总体工作流程进行描述,请参见图1,图1示出的为人工智能主体框架的一种结构示意图,下面从“智能信息链”(水平轴)和“IT价值链”(垂直轴)两个维度对上述人工智能主题框架进行阐述。其中,“智能信息链”反映从数据的获取到处理的一列过程。举例来说,可以是智能信息感知、智能信息表示与形成、智能推理、智能决策、智能执行与输出的一般过程。在这个过程中,数据经历了“数据—信息—知识—智慧”的凝练过程。“IT价值链”从人智能的底层基础设施、信息(提供和处理技术实现)到系统的产业生态过程,反映人工智能为信息技术产业带来的价值。
(1)基础设施
基础设施为人工智能系统提供计算能力支持,实现与外部世界的沟通,并通过基础平台实现支撑。通过传感器与外部沟通;计算能力由智能芯片,如中央处理器(central processing unit,CPU)、网络处理器(neural-network processing unit,NPU)、图形处理器(英语:graphics processing unit,GPU)、专用集成电路(application specific integrated circuit,ASIC)或现场可编程逻辑门阵列(field programmable gate array,FPGA)等硬件加速芯片)提供;基础平台包括分布式计算框架及网络等相关的平台保障和支持,可以包括云存储和计算、互联互通网络等。举例来说,传感器和外部沟通获取数据,这些数据提供给基础平台提供的分布式计算系统中的智能芯片进行计算。
(2)数据
基础设施的上一层的数据用于表示人工智能领域的数据来源。数据涉及到图形、图像、语音、文本,还涉及到传统设备的物联网数据,包括已有系统的业务数据以及力、位移、液位、温度、湿度等感知数据。
(3)数据处理
数据处理通常包括数据训练,机器学习,深度学习,搜索,推理,决策等方式。
其中,机器学习和深度学习可以对数据进行符号化和形式化的智能信息建模、抽取、预处理、训练等。
推理是指在计算机或智能系统中,模拟人类的智能推理方式,依据推理控制策略,利用形式化的信息进行机器思维和求解问题的过程,典型的功能是搜索与匹配。
决策是指智能信息经过推理后进行决策的过程,通常提供分类、排序、预测等功能。
(4)通用能力
对数据经过上面提到的数据处理后,进一步基于数据处理的结果可以形成一些通用的能力,比如可以是算法或者一个通用系统,例如,翻译,文本的分析,计算机视觉的处理,语音识别,图像的识别等等。
(5)智能产品及行业应用
智能产品及行业应用指人工智能系统在各领域的产品和应用,是对人工智能整体解决方案的封装,将智能信息决策产品化、实现落地应用,其应用领域主要包括:智能终端、智能交通、智能医疗、自动驾驶、智慧城市等。
本申请实施例涉及了神经网络以及图像的相关应用,为了更好地理解本申请实施例的方案,下面先对本申请实施例可能涉及的神经网络的相关术语和概念进行介绍。
(1)神经网络
神经网络可以是由神经单元组成的,神经单元可以是指以x s和截距1为输入的运算单元,该运算单元的输出可以为:
h_{W,b}(x) = f(W^{T}x) = f\left(\sum_{s=1}^{n} W_{s} x_{s} + b\right)
其中,s=1、2、……n,n为大于1的自然数,W s为x s的权重,b为神经单元的偏置。f为神经单元的激活函数(activation functions),用于对神经网络中获取到的特征进行非线性变换,将神经单元中的输入信号转换为输出信号。该激活函数的输出信号可以作为下一层卷积层的输入。激活函数可以是sigmoid函数。神经网络是将许多个上述单一的神经单元联结在一起形成的网络,即一个神经单元的输出可以是另一个神经单元的输入。每个神经单元的输入可以与前一层的局部接受域相连,来提取局部接受域的特征,局部接受域可以是由若干个神经单元组成的区域。
(2)深度神经网络
深度神经网络(deep neural network,DNN),也称多层神经网络,可以理解为具有多层隐含层的神经网络。按照不同层的位置对DNN进行划分,DNN内部的神经网络可以分为三类:输入层,隐含层,输出层。一般来说第一层是输入层,最后一层是输出层,中间的层数都是隐含层。层与层之间是全连接的,也就是说,第i层的任意一个神经元一定与第i+1层的任意一个神经元相连。
虽然DNN看起来很复杂,但是就每一层的工作来说,其实并不复杂,简单来说就是如下线性关系表达式:
\vec{y} = \alpha(W \vec{x} + \vec{b})
其中,
\vec{x}
是输入向量,
\vec{y}
是输出向量,
\vec{b}
是偏移向量,W是权重矩阵(也称系数),α()是激活函数。每一层仅仅是对输入向量
\vec{x}
经过如此简单的操作得到输出向量。由于DNN层数多,系数W和偏移向量
\vec{b}
的数量也比较多。这些参数在DNN中的定义如下所述:以系数W为例:假设在一个三层的DNN中,第二层的第4个神经元到第三层的第2个神经元的线性系数定义为
W^{3}_{24}
上标3代表系数W所在的层数,而下标对应的是输出的第三层索引2和输入的第二层索引4。
综上,第L-1层的第k个神经元到第L层的第j个神经元的系数定义为
W^{L}_{jk}
需要注意的是,输入层是没有W参数的。在深度神经网络中,更多的隐含层让网络更能够刻画现实世界中的复杂情形。理论上而言,参数越多的模型复杂度越高,“容量”也就越大,也就意味着它能完成更复杂的学习任务。训练深度神经网络的也就是学习权重矩阵 的过程,其最终目的是得到训练好的深度神经网络的所有层的权重矩阵(由很多层的向量W形成的权重矩阵)。
(3)卷积神经网络
卷积神经网络(convolutional neuron network,CNN)是一种带有卷积结构的深度神经网络。卷积神经网络包含了一个由卷积层和子采样层构成的特征抽取器,该特征抽取器可以看作是滤波器。卷积层是指卷积神经网络中对输入信号进行卷积处理的神经元层。在卷积神经网络的卷积层中,一个神经元可以只与部分邻层神经元连接。一个卷积层中,通常包含若干个特征平面,每个特征平面可以由一些矩形排列的神经单元组成。同一特征平面的神经单元共享权重,这里共享的权重就是卷积核。共享权重可以理解为提取特征的方式与位置无关。卷积核可以以随机大小的矩阵的形式化,在卷积神经网络的训练过程中卷积核可以通过学习得到合理的权重。另外,共享权重带来的直接好处是减少卷积神经网络各层之间的连接,同时又降低了过拟合的风险。
(4)损失函数
在训练深度神经网络的过程中,因为希望深度神经网络的输出尽可能的接近真正想要预测的值,所以可以通过比较当前网络的预测值和真正想要的目标值,再根据两者之间的差异情况来更新每一层神经网络的权重向量(当然,在第一次更新之前通常会有化的过程,即为深度神经网络中的各层预先配置参数),比如,如果网络的预测值高了,就调整权重向量让它预测低一些,不断地调整,直到深度神经网络能够预测出真正想要的目标值或与真正想要的目标值非常接近的值。因此,就需要预先定义“如何比较预测值和目标值之间的差异”,这便是损失函数(loss function)或目标函数(objective function),它们是用于衡量预测值和目标值的差异的重要方程。其中,以损失函数举例,损失函数的输出值(loss)越高表示差异越大,那么深度神经网络的训练就变成了尽可能缩小这个loss的过程。该损失函数通常可以包括误差平方均方、交叉熵、对数、指数等损失函数。例如,可以使用误差均方作为损失函数,定义为
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_{i} - \hat{y}_{i}\right)^{2}
具体可以根据实际应用场景选择具体的损失函数。
(5)反向传播算法
卷积神经网络可以采用误差反向传播(back propagation,BP)算法在训练过程中修正初始的网络模型中的参数的大小,使得模型的重建误差损失越来越小。具体地,前向传递输入信号直至输出会产生误差损失,通过反向传播误差损失信息来更新初始的模型中的参数,从而使误差损失收敛。反向传播算法是以误差损失为主导的反向传播运动,旨在得到最优的模型参数,例如,权重矩阵。
本申请实施方式中,在训练阶段,可以采用BP算法来对模型进行训练,得到训练后的模型。
(6)梯度:损失函数关于参数的导数向量。
(7)随机梯度:机器学习中样本数量很大,所以每次计算的损失函数都由随机采样得到的数据计算,相应的梯度称作随机梯度。
(8)YUV:YUV是一种颜色空间,Y表示明亮度(Luminance或Luma),也就是灰阶值;而“U”和“V”表示的则是色度(Chrominance或Chroma),用于指定像素的颜色。“U”和“V”是构成彩色的两个分量。采用YUV色彩空间的重要性是它的亮度信号Y和色度信号U、V是分离的。如果只有Y信号分量而没有U、V信号分量,那么这样表示的图像就是黑白的灰度图像。
(9)裸(raw)数据:Raw数据记录了相机传感器的原始信息,是未经处理、也未经压缩的格式,可以把RAW概念化为“原始图像编码数据”或更形象的称为“数字底片”。
本申请提供了一种神经网络训练方法以及数据处理方法,通过本申请提供的神经网络训练方法得到的神经网络,可以应用于本申请提供的数据处理方法中。本申请提供的数据处理方法可以用于对传感器采集到的原始raw数据进行处理,从而得到输出图像。
下面介绍本申请实施例提供的系统架构。
参见图2,本申请实施例提供了一种系统架构200。如系统架构200所示,数据采集设备260可以用于采集训练数据。在数据采集设备260采集到训练数据之后,将这些训练数据存入数据库230,训练设备220基于数据库230中维护的训练数据训练得到目标模型/规则201。
下面对训练设备220基于训练数据得到目标模型/规则201进行描述。示例性地,训练设备220对多帧样本图像进行处输出对应的预测标签,并计算预测标签和样本的原始标签之间的损失,基于该损失对分类网络进行更新,直到预测标签接近样本的原始标签或者预测标签和原始标签之间的差异小于阈值,从而完成目标模型/规则201的训练。具体描述详见后文中的训练方法。
本申请实施例中的目标模型/规则201具体可以为神经网络。需要说明的是,在实际的应用中,数据库230中维护的训练数据不一定都来自于数据采集设备260的采集,也有可能是从其他设备接收得到的。另外需要说明的是,训练设备220也不一定完全基于数据库230维护的训练数据进行目标模型/规则201的训练,也有可能从云端或其他地方获取训练数据进行模型训练,上述描述不应该作为对本申请实施例的限定。
根据训练设备220训练得到的目标模型/规则201可以应用于不同的系统或电子设备中,如应用于图2所示的执行设备220,所述执行设备220可以是终端,如手机终端,平板电脑,笔记本电脑,增强现实(augmented reality,AR)/虚拟现实(virtual reality,VR),车载终端,电视等,还可以是服务器或者云端等电多种电子设备中。在图2中,执行设备220配置有收发器212,该收发器可以包括输入/输出(input/output,I/O)接口或者其他无线或者有线的通信接口等,用于与外部设备进行数据交互,以I/O接口为例,用户可以通过客户设备240向I/O接口输入数据。
在执行设备220对输入数据进行预处理,或者在执行设备220的计算模块212执行计算等相关的处理过程中,执行设备220可以调用数据存储系统250中的数据、代码等以用于相应的处理,也可以将相应处理得到的数据、指令等存入数据存储系统250中。
最后,I/O接口212将处理结果返回给客户设备240,从而提供给用户。
值得说明的是,训练设备220可以针对不同的目标或称不同的任务,基于不同的训练 数据生成相应的目标模型/规则201,该相应的目标模型/规则201即可以用于实现上述目标或完成上述任务,从而为用户提供所需的结果。
在附图2中所示情况下,用户可以手动给定输入数据,该手动给定可以通过收发器212提供的界面进行操作。另一种情况下,客户设备240可以自动地向收发器212发送输入数据,如果要求客户设备240自动发送输入数据需要获得用户的授权,则用户可以在客户设备240中设置相应权限。用户可以在客户设备240查看执行设备220输出的结果,具体的呈现形式可以是显示、声音、动作等具体方式。客户设备240也可以作为数据采集端,采集如图所示输入收发器212的输入数据及输出收发器212的输出结果作为新的样本数据,并存入数据库230。当然,也可以不经过客户设备240进行采集,而是由收发器212直接将如图所示输入收发器212的输入数据及输出收发器212的输出结果,作为新的样本数据存入数据库230。
需要说明的是,附图2仅是本申请实施例提供的一种系统架构的示意图,图中所示设备、器件、模块等之间的位置关系不构成任何限制,例如,在图2中,数据存储系统250相对执行设备220是外部存储器,在其它情况下,也可以将数据存储系统250置于执行设备220中。
如图2所示,根据训练设备220训练得到目标模型/规则201,该目标模型/规则201在本申请实施例中可以是本申请以下提及的中的神经网络。
可以理解为,前述图2中所示出的训练设备,可以用于执行本申请提供的神经网络训练方法,得到训练后的神经网络。训练后的神经网络可以部署于执行设备中,用于执行本申请提供的数据处理方法,即该执行设备可以是本申请提供的电子设备。
本申请实施例中提供的电子设备具体可以包括手持设备、车载设备、可数据处理装置、计算设备等包括图像传感器或者与图像传感器连接的电子设备。还可以包括数码相机(digital camera)、蜂窝电话(cellular phone)、相机、智能手机(smart phone)、个人数字助理(personal digital assistant,PDA)电脑、平板型电脑、膝上型电脑(laptop computer)、机器类型通信(machine type communication,MTC)终端、销售终端(point of sales,POS)、车载电脑、头戴设备、数据处理装置(如手环、智能手表等)、安防设备、虚拟现实(virtual reality,VR)设备、增强现实(augmented reality,AR)设备以及其他具有成像功能的电子设备。
以数码相机为例,数码相机是数字式照相机的简称,是一种利用光电传感器把光学影像转化成数字信号的照相机。与传统相机依靠胶卷上的感光化学物质的变化来记录图像不同,数码相机的传感器是一种光感式的电荷耦合器件(charge-coupled device,CCD)或互补金属氧化物半导体(complementary metal oxide semiconductor,CMOS)。相比于传统相机,数码相机因直接使用光电转换的图像传感器,具有更为便利,快捷,可重复,更具有及时性等优势。随着CMOS加工工艺的发展,数码相机的功能也愈发强大,已几乎全面取代传统胶片式相机,在消费电子,人机交互,计算机视觉,自动驾驶等领域有着极其广泛的应用。
示例性地,图3示出了本申请提供的一种电子设备的示意图,如图所示,电子设备可 以包括镜头(lens)组110、图像传感器(sensor)120和电信号处理器130。电信号处理器130可以包括模数(A/D)转换器131和数字信号处理器132。其中模数转换器131即模拟信号转数字信号转换器,用于将模拟电信号转换为数字电信号。
应理解,图3中示出的电子设备并不限于包括以上器件,还可以包括更多或者更少的其他器件,例如电池、闪光灯、按键、传感器等,本申请实施例仅以安装有图像传感器120的电子设备为例进行说明,但电子设备上安装的元件并不限于此。
被摄物体反射的光信号通过镜头组110汇聚,成像在图像传感器120上。图像传感器120将光信号转换为模拟电信号。模拟电信号在电信号处理器130中通过模数(A/D)转换器131转换为数字电信号,并通过数字信号处理器132对数字电信号进行处理,例如通过一系列复杂的数学算法运算,对数据电信号进行优化,最终输出图像。电信号处理器130还可以包括模拟信号预处理器133,用于将图像传感器传输的模拟电信号进行预处理后输出至模数转换器131。
图像传感器120的性能影响最终输出的图像的质量。图像传感器120也可以称为感光芯片、感光元件等,包含有几十万到几百万的光电转换元件,受到光照射时,会产生电荷,通过模数转换器芯片转换成数字信号。通常,图像传感器输出的数字信号,可以称为裸(raw)数据,即未经图像处理的数据。
一些常用的图像增强方式中,图像信号处理器(image signal processor,ISP)成像系统包含多个功能模块,涉及大量参数,需要根据各个成像场景特性分别调优,调优工作量巨大,且调优效果受限于调优工作人员的经验。此外,一些常用的ISP多关注视频质量调优,但在现实成像应用中,许多场景更关注特定目标成像质量而非视频质量,如交通场景关注车牌区域成像质量,工业检测场景关注特定工件的成像质量。该特定目标通常存在较多先验信息。
具体例如,常用的ISP处理过程的关键处理步骤可以包括数据矫正、细节处理、颜色调整、亮度调整、图像增强等。每个步骤可以包含多个经典算法,各算法都有相应的调优参数,且各步骤之间的顺序可根据实际情况调整,因此传统的ISP处理过程中参数调优是个复杂的工作,需要消耗较大的人力成本。此外,传统ISP处理的场景适应性较差,针对不同的场景需要分别调优,调优工足量巨大。raw数据通常为12(也可以是14或16)bit的无压缩数据,经过ISP处理后,转为为8bit的图像数据,该转化过程存在一定信息损失。此外,若输入的原始RAW数据质量较差,ISP会因为多个处理过程错误累积,在图像中引入伪纹理,且不同的ISP参数引入的伪纹理差别较大。现有的基于特定目标的增强算法大都是在ISP处理之后的RGB或YUV图像上实现,由于传统的ISP处理已经在图像中引入了各异的伪纹理,在RGB或YUV图像上的目标增强算法的效果上限较低,且泛化性较弱,需要分别适配到不同ISP芯片成像风格。若仅在原始Raw数据域实现目标增强,需要配合后续ISP的调优,来获得最终的增强效果,整体流程复杂,且无法避免繁杂的ISP调优工作。
此外,一些常用的图像增强方式,通常是针对ISP处理后的图像进行增强,然而在ISP处理的过程中可能丢失部分信息,导致图像增强效果不好,且参数调优工作量大,导致增强效率低。
因此,本申请提供了一种数据处理方法,可以在不升级成像硬件的前台下,提升特定目标的成像质量同时减少调优工作量,减少开销。
在本申请提供的方法中,通过神经网络来实现对raw数据的增强,无需大量的参数调优,仅需将训练后的神经网络部署于ISP芯片中即可。下面分别对神经网络的应用和训练过程分别进行介绍。
一、应用过程
参阅图4,本申请提供的一种数据处理方法的流程示意图,如下所述。
401、获取第一帧数据。
其中,在步骤401之前,可以获取图像传感器采集到的原始数据,即raw数据,第一帧数据是raw数据中的其中一帧数据,即需要进行图像增强的数据。
可选地,可以接收用户输入数据,然后根据用户输入数据从raw数据中确定第一帧数据;或者,对raw数据中的每一帧进行目标检测,然后根据检测结果从原始数据中提取第一帧数据;或者,也可以从raw数据中随机选取一帧作为第一帧数据等。如从raw数据中提取目标更清晰的一帧作为第一帧数据,或者,从raw数据中提取目标纹理更复杂的一帧作为第一帧数据等,具体可以根据实际应用场景选择,当然也可以对raw数据中的每一帧进行增强,本申请示例性地,以第一帧数据为例进行示例性说明。
通常,第一帧数据中可以包括多个通道的信息,具体的通道数量可以根据图像传感器采集到的数据来确定。例如,图像传感器可以采集亮度、色度等通道的信息。
402、从第一帧数据中获取紧致框对应的数据得到第一数据。
其中,在从raw数据中选取了第一帧数据之后,即可基于该第一帧数据生成紧致框和宽松框,从第一帧数据中提取紧致框对应的数据,即可得到第一数据,该紧致框覆盖了第一帧数据中目标对象所在的区域,从而可以从第一帧数据中提取到包括了目标对象的信息的第一数据。
403、从第一帧数据中获取宽松框对应的数据得到第二数据。
其中,宽松款的尺寸大于且覆盖紧致框在第一帧数据中对应的范围,从第一帧数据中提取宽松框对应的数据得到第二数据。
具体地,可以对第一帧数据进行目标检测,识别出第一帧数据中的目标对象的位置,然后基于目标对象的位置来生成紧致框和宽松框。可以理解为,紧致框在第一帧数据中覆盖的范围包括了目标对象的信息,与目标对象贴合更紧密,宽松框在第一帧数据中覆盖的范围,除了包括目标对象的信息,还可以包括目标对象相邻的像素点的信息。
404、将第一数据和第二数据分别作为目标网络的输入得到输出图像。
其中,目标网络用于基于第一数据和第二数据对目标对象进行增强,从而得到增强后的图像,即输出图像。
可选地,目标网络可以包括多个部分,如可以包括第一网络和第二网络,第一网络可以用于从输入的输入中提取亮度信息并进行增强,第二网络可以用于从输入的数据中分别提取多个通道的信息并进行增强。可以将第一数据作为第一网络的输入,得到第一增强信息,将第二数据作为第二网络的输入,得到第二增强信息,融合第一增强信息和第二增强 信息,从而得到输出图像。因此,可以通过紧致框来对目标对象的纹理细节进行增强,从而提高目标对象的清晰度,使输出图像中的目标对象更清晰。
当然,第一网络除了可以提取亮度信息并进行增强之外,也可以对其他通道的信息进行提取并增强,本申请示例性地,以从输入的输入中提取亮度信息并进行增强为例进行示例性说明,并不作为限定。
因此,本申请实施方式中,可以通过紧致框来提取目标对象中亮度通道的信息并进行增强,通过宽松框来提取目标对象及其附近的多个通道的信息并进行增强,从而可以对目标对象的纹理进行增强,从而使输出图像包括的目标对象更清晰,得到增强了目标对象的输出图像。如在一些成像成精中,更关注特定目标的成像质量,本申请通过对目标对象进行图像增强,从而提高输出图像的成像效果。且相对于常用的ISP,其调参工作量大,且在处理过程中可能存在信息丢失,本申请通过神经网络来对原始的包括全量信息的raw数据进行处理,实现端到端的目标增强,减少调参工作量。
此外,在对目标网络进行训练时,可以结合识别网络以及训练集进行训练,该识别网络可以用于获取输入的图像中的语义信息,如可以通过识别网络来识别输入图像中的对象的信息,如对象的类别、大小或位置等信息,在对目标网络进行训练的过程中,可以以识别网络的输出结果来作为约束对目标网络进行更新,得到更新后的目标网络。因此,本申请实施方式中,在训练目标网络时,可以以识别网络的输出结果来作为约束,使目标网络的输出结果的识别结果更准确,从而可以提高目标网络的输出图像的准确度以及清晰度,提高目标网络的图像增强效果。
可以理解为,本申请可以将神经网络部署于ISP系统中,从而可以通过神经网络来代替常用的ISP处理系统,从而提高目标增强效率。
示例性地,本申请提供的另一种数据处理方法的流程可以如图5所示。
501、获取raw数据。
通常,该raw数据可以是由图像传感器采集得到,为了基于未压缩的raw数据来实现端到端的目标增强,可以在ISP系统中缓存一定量的raw数据。例如,在传感器采集到raw数据之后,即可传输至ISP系统中,进行后续处理。
502、提取目标帧。
在得到raw数据之后,可以对其中的每一帧都进行目标增强,也可以是从raw数据中选择一帧或者多帧进行目标增强。示例性地,本实施例以选择其中一帧作为目标帧(即前述的第一帧数据)进行目标增强为例进行示例性说明。
具体地,可以对raw数据进行目标检测,从raw数据中筛选出包括目标对象的一帧或者多帧作为目标帧。或者,可以由用户来选择将某一帧作为目标帧,并从raw数据中提取目标帧。
例如,可以使用目标检测算法,获取视频的每一帧中目标对象的大小、清晰度等信息,然后从视频中选择其中目标对象的大小相对更大或者清晰度更高的一帧作为目标帧,同时可以得到目标帧中目标对象的位置。
又例如,在一些场景中,如工业检测场景中,可以由用户来设定目标帧以及目标对象的位置,如用户设置特定帧号作为抓拍帧,同时可以通过摄像机与目标对象之间的相对位置关系,确定目标对象在画面中的具体位置。
此外,在得到目标帧之后,可以对目标帧进行目标检测,识别出目标帧中的目标对象的具体信息,或者通过用户手动设置来确定目标对象的信息,如目标对象的位置、尺寸或者形状等信息。然后即可基于目标对象的具体信息生成紧致框和宽松框,紧致框可以理解为与目标对象紧密贴合的框,宽松框除了包括紧致框覆盖的范围,还可以包括目标对象邻近一定范围内的区域。可以理解为,紧致框紧致包裹目标对象,与目标对象的轮廓更贴合,而宽松框大于且覆盖紧致框,包括了目标对象及其周围的一定范围。
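为便于理解,下面给出一段根据检测到的目标外接框生成紧致框与宽松框的示意性代码草图(Python),其中外扩比例expand_ratio为假设的可配置参数,也可以由用户设定:

```python
def make_boxes(det_box, frame_h, frame_w, expand_ratio=0.5):
    """det_box为检测得到的目标外接框(x1, y1, x2, y2),直接作为紧致框;
    宽松框在紧致框基础上按expand_ratio向四周外扩,并裁剪到画面范围内。"""
    x1, y1, x2, y2 = det_box
    w, h = x2 - x1, y2 - y1
    dx, dy = int(w * expand_ratio), int(h * expand_ratio)
    tight = (x1, y1, x2, y2)
    loose = (max(0, x1 - dx), max(0, y1 - dy),
             min(frame_w, x2 + dx), min(frame_h, y2 + dy))
    return tight, loose
```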
503、扣取紧致框作为Target_ISPNet输入。
其中,为便于理解,本实施例将第一网络称为Target_ISPNet,可以从目标帧中提取紧致框所覆盖的范围的信息,并将提取到的信息作为Target_ISPNet的输入,输出亮度增强信息。
例如,如图6所示,可以从目标帧中扣取紧致框601覆盖的信息,作为Target_ISPNet的输入,得到紧致框覆盖的范围的细节、对比度增强后的亮度通道的信息,如表示为Yt。
当然,Target_ISPNet也可以对其他通道如色度通道的信息进行增强,本申请示例性地,以Target_ISPNet对亮度通道的信息进行提取并增强为例进行示例性说明。
504、扣取宽松框作为Full_ISPNet输入。
其中,为便于理解,本实施例将第二网络称为Full_ISPNet,可以从目标帧中提取宽松框覆盖的信息,并将提取到的信息作为Full_ISPNet的输入,输出对各个通道都进行增强后的多通道增强信息。
例如,如图7所示,可以从目标帧中扣取宽松框602覆盖范围的信息,作为Full_ISPNet的输入,得到宽松框覆盖范围内的亮度通道增强信息Yf以及颜色通道的增强信息UfVf。
505、融合增强信息得到输出图像。
在得到亮度通道增强信息以及多个通道增强信息之后,即可融合亮度通道增强信息以及多个通道增强信息,得到增强后的输出图像。
将Target_ISPNet的亮度通道输出Yt,与Full_ISPNet的输出YfUfVf融合,获得最终的目标增强结果,该融合可以通过多种方式来实现,如加权融合、泊松融合等方式。
示例性地,以加权融合为例,通常紧致框和宽松框在紧致区域内的纹理一致,紧致区域即紧致框覆盖的范围,可以使用掩膜(mask)对Yt和YfUfVf进行融合。通常可以从紧致框对应的区域中提取亮度信息并进行增强,因此在融合的过程中,亮度通道可以融合Yf和Yt,UV则可以使用UfVf。当然,若Target_ISPNet也输出UV通道的增强信息,也可以将Target_ISPNet输出的UV通道的信息和Full_ISPNet输出的UV通道的信息融合,具体可以根据实际应用场景来调整。
如各个通道的融合方式可以表示为:
Yout=Yt*mask+Yf*(1-mask)
Uout=Uf
Vout=Vf
该掩膜中的像素值可以作为融合权重,如图8所示,如紧致框中心区域内接近1,紧致框外区域接近0,紧致框附近可以使用线性过渡。
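以加权融合为例,下面给出一段构造掩膜并按上述公式融合的示意性代码草图(Python/NumPy)。其中掩膜在紧致框内部为1、框外为0、紧致框附近按transition个像素做线性过渡;tight_box以宽松框坐标系给出,Yt假定已按该坐标系补零对齐,这些约定均为示例性假设:

```python
import numpy as np

def make_mask(loose_shape, tight_box, transition=8):
    """生成融合掩膜:紧致框中心区域接近1,紧致框外接近0,框附近线性过渡。"""
    h, w = loose_shape
    x1, y1, x2, y2 = tight_box
    ys, xs = np.mgrid[0:h, 0:w]
    # 各像素到紧致框边界的水平/垂直距离(框内为0)
    dx = np.maximum(np.maximum(x1 - xs, xs - (x2 - 1)), 0)
    dy = np.maximum(np.maximum(y1 - ys, ys - (y2 - 1)), 0)
    dist = np.maximum(dx, dy).astype(np.float32)
    return np.clip(1.0 - dist / float(transition), 0.0, 1.0)

def fuse(Yt, Yf, Uf, Vf, mask):
    """按文中公式融合:Yout=Yt*mask+Yf*(1-mask),Uout=Uf,Vout=Vf。"""
    Yout = Yt * mask + Yf * (1.0 - mask)
    return Yout, Uf, Vf
```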
为便于理解,以交通场景中的车牌图像增强为例进行示例性说明,整体流程可以参阅图9。
首先,抓拍到的车牌图像如图10所示,可以通过目标检测网络来识别图像中车牌所在的位置,然后根据车牌的位置生成紧致框和宽松框。如图11所示,紧致包围在车牌周围的检测框称为紧致框,将紧致框外扩一定范围即可得到宽松框,外扩的比例可以预先设定,也可以由用户来设定,如可以由用户来设定需要增强的车牌部分的大小。如在车牌抓拍图中,紧致框的大小占宽松框的30%。
通常,常用的ISP系统需要实现细节、对比度、亮度、颜色等信息的调整与增强。而在ISP处理过程中,细节增强、对比度增强、亮度调整与颜色调整任务差异较大,细节与对比度属于高频信息,而亮度和颜色属于低频信息,因此,细节、对比度增强的任务难度较高,需要消耗更多网络算力。此外,人眼通常只关注目标紧致框内的细节与对比度增强,因此,本申请提供的目标网络中可以包括至少两部分,即针对紧致框的增强网络Target_ISPNet以及针对宽松框的增强网络Full_ISPNet,从而通过Target_ISPNet对检测目标的亮度通道进行增强,实现紧致框区域的细节以及对比度的增强,通过Full_ISPNet来实现宽松框区域的亮度调整和颜色调整,从而从多个维度提高目标对象的清晰度,提高用户的观感体验。
为充分挖掘目标成像对象的先验信息,从系统中获取目标帧号及对应目标位置后,从目标帧中抠取紧致框与宽松框覆盖的范围分别作为Target_ISPNet和Full_ISPNet的输入。如图9中所示,Target_ISPNet获得细节、对比度增强后的紧致区域的亮度通道Yt,Full_ISPNet则获得宽松区域的亮度以及颜色调整后的Yf、Uf、Vf,融合Yt和Yf,得到融合后的紧致区域的亮度信息,然后将紧致区域的亮度信息贴回宽松框区域,得到宽松框区域的多通道信息。
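结合上述流程,下面给出一段端到端推理数据流的示意性代码草图(Python/PyTorch),其中target_ispnet、full_ispnet为已训练好的两个子网络的占位对象,crop、掩膜mask等均沿用前文示例中的假设,并非本申请限定的实现:

```python
import torch

def crop(raw, box):
    """从raw张量(..., H, W)中抠取box=(x1, y1, x2, y2)覆盖的区域。"""
    x1, y1, x2, y2 = box
    return raw[..., y1:y2, x1:x2]

def enhance_target(raw_frame, tight_box, loose_box, target_ispnet, full_ispnet, mask):
    """端到端目标增强的示意流程:
    1) 紧致框数据(第一数据)送入Target_ISPNet,得到亮度增强结果Yt;
    2) 宽松框数据(第二数据)送入Full_ISPNet,得到Yf、Uf、Vf;
    3) 将Yt贴回宽松框坐标系,再按掩膜加权融合亮度通道。"""
    x_t = crop(raw_frame, tight_box).unsqueeze(0)   # 第一数据
    x_f = crop(raw_frame, loose_box).unsqueeze(0)   # 第二数据
    with torch.no_grad():
        Yt = target_ispnet(x_t)                     # 紧致区域亮度增强结果
        Yf, Uf, Vf = full_ispnet(x_f)               # 宽松区域多通道增强结果
    # 将Yt贴回宽松框坐标系,紧致框以外的位置补0(掩膜在这些位置也为0)
    lx1, ly1, _, _ = loose_box
    tx1, ty1, tx2, ty2 = tight_box
    Yt_full = torch.zeros_like(Yf)
    Yt_full[..., ty1 - ly1:ty2 - ly1, tx1 - lx1:tx2 - lx1] = Yt
    Yout = Yt_full * mask + Yf * (1.0 - mask)       # 亮度通道加权融合
    return Yout, Uf, Vf
```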
本申请实施方式中,通过紧致框扣取了覆盖目标对象的区域的信息,并作为神经网络的输入得到紧致区域的增强结果,通过宽松框提取了宽松区域的信息并作为神经网络的输入,得到宽松区域的增强结果,随后融合紧致区域的增强结果与宽松区域的增强结果,得到目标对象的增强结果。因此,无需通过常用的ISP调优参数,减少了调参工作量,高效准确地得到目标增强结果。并且,可以通过神经网络来替代常用的ISP处理方式,无需对硬件进行升级,在无需提高成本的情况下,实现了对raw数据更好的增强。
二、训练过程
前述对本申请提供的数据处理方法进行了介绍,其中,将从目标帧中扣取的紧致框和宽松框对应的数据分别作为目标网络的输入,从而得到目标增强的结果,目标网络可以是经过训练的神经网络,参阅图12,本申请提供的一种神经网络训练方法,如下所述。
1201、获取训练集。
其中,该训练集中可以包括多个样本以及每个样本对应的标签,该训练集中可以包括图像传感器采集到的raw数据以及对应的真值图像。
可以理解为,该训练集中可以包括图像传感器采集到的raw数据(即样本)以及进行增强后的图像,即真值图像。
通常,训练数据与目标网络的训练方式相关,例如,若分别对目标网络的子网络,即Target_ISPNet和Full_ISPNet进行训练,则训练集中的真值图像可以分为多种,如紧致框对应的真值图像以及宽松框对应的真值图像,若对目标网络整体进行训练,则训练集中可以包括raw数据以及对应的经过增强后的图像。
示例性地,raw数据和对应的真值图像可以如图13所示,其中raw数据可以包括拍摄到的携带噪声的车牌raw数据,以及对应的经过降噪、细节增强、对比度增强等ISP处理后的真值图像。
1202、将训练集作为目标网络的输入,得到增强结果。
其中,该目标网络可以用于从输入的数据中提取紧致框对应的亮度通道的信息,以及从输入数据中提取宽松框对应的多个通道的信息,对亮度通道的信息和多个通道的信息进行融合得到增强结果。
可选地,目标网络包括第一网络和第二网络,第一网络用于从训练集中的数据中提取紧致框对应的亮度通道的信息,第二网络用于从输入数据中提取宽松框对应的多个通道的信息。
相应地,增强结果中可以包括第一网络输出的第一信息以及第二网络输出的第二信息。
具体地,目标网络可以参阅前述图4-图11所提及的目标网络,此处不再赘述。
需要说明的是,本申请实施方式中,针对目标网络的训练过程可以是迭代训练的过程,可以对目标网络进行多次迭代更新,即步骤1202-步骤1206可以多次执行,本实施例示例性地,仅对其中一次迭代过程进行示例性说明,并不作为限定。
1203、将训练集作为识别网络的输入,得到第一识别结果。
其中,该识别网络可以用于识别输入的图像中的目标对象的信息,得到第一识别结果,如目标对象的位置、类别、大小等信息。该识别网络具体可以包括目标检测网络、语义分割网络或者分类网络等,可以用于提取输入图像中的语义信息,然后可以基于提取到的语义信息进行检测、分割或者分类等任务,得到输入图像中的对象的具体信息。
1204、将增强结果作为识别网络的输入,得到第二识别结果。
其中,将目标网络的输出结果也作为识别网络的输入,得到第二识别结果,该第二识别结果中可以包括识别网络识别出的增强结果中的目标对象的信息,如目标对象的位置、类别、大小等信息。
此外,第二识别结果中可以包括第一信息对应的第三识别结果以及第二信息对应的第四识别结果。可以理解为,目标网络的输出结果中可以包括第一网络的输出结果和第二网络的输出结果,可以使用识别网络分别对第一网络的输出结果和第二网络的输出结果进行语义分割,分别得到第一信息中目标对象的信息以及第二信息中目标对象的信息。
1205、根据增强结果和真值标签之间的差值,以及第一识别结果和第二识别结果之间的差值,对目标网络进行更新,得到更新后的目标网络。
其中,可以计算增强结果和真值标签之间的损失值,以及第一识别结果和第二识别结果之间的损失值,将第一识别结果和第二识别结果之间的损失值作为约束,使用增强结果和真值标签之间的损失值对目标网络进行更新,得到更新后的目标网络。
具体地,目标网络中可以包括第一网络和第二网络,若针对目标网络整体进行更新,则可以计算目标网络的整体损失值,若对第一网络和第二网络分别进行更新,则分别针对第一网络的输出和第二网络的输出来计算损失值,从而分别对第一网络和第二网络进行更新。
更具体地,若对第一网络和第二网络分别进行更新,第二识别结果中可以包括第三识别结果和第四识别结果,则可以根据所述第一信息和所述真值标签之间的差值,以及所述第三识别结果和所述第一识别结果之间的差值,更新所述第一网络,得到更新后的第一网络;根据所述第二信息和所述真值标签之间的差值,以及所述第四识别结果和所述第一识别结果之间的差值,更新所述第二网络,得到更新后的第二网络,从而实现对目标网络的更新。
若对目标网络整体进行更新,则可以根据增强结果和真值标签之间的差值得到第一损失值;根据第一识别结果和第二识别结果之间的差值得到第二损失值;融合第一损失值和第二损失值,得到第三损失值;根据第三损失值对目标网络进行更新,得到更新后的目标网络。
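以对目标网络整体进行更新为例,下面给出一次迭代的示意性训练代码草图(Python/PyTorch),其中target_net、recog_net、优化器以及损失的具体形式(此处以L1损失、加权和融合为例)均为示例性假设;识别网络仅作为约束,其参数在该步骤中不更新:

```python
import torch
import torch.nn.functional as F

def train_step(target_net, recog_net, optimizer, raw_batch, gt_batch, alpha=1.0):
    """单次迭代:第一损失值约束增强结果与真值标签接近,第二损失值约束
    识别网络对增强结果与对真值图像的识别结果接近,融合后反向更新目标网络。"""
    enhanced = target_net(raw_batch)              # 增强结果
    loss1 = F.l1_loss(enhanced, gt_batch)         # 第一损失值
    with torch.no_grad():
        ref = recog_net(gt_batch)                 # 第一识别结果(真值图像)
    out = recog_net(enhanced)                     # 第二识别结果(增强结果)
    loss2 = F.l1_loss(out, ref)                   # 第二损失值
    loss3 = loss1 + alpha * loss2                 # 第三损失值:融合两种损失
    optimizer.zero_grad()
    loss3.backward()                              # 根据第三损失值反向更新目标网络
    optimizer.step()
    return float(loss3.detach())
```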
因此,本申请实施方式中,针对目标网络的更新提供了多种方式,可以以识别网络的输出结果为约束,来对目标网络进行更新,从而使目标网络的输出更准确。可以理解为,可以使用识别网络对真值图像与目标网络的输出图像都进行识别,并将识别结果作为约束对目标网络进行更新,使目标网络的输出图像的识别结果与真值图像的识别结果更接近,从输出图像中的目标对象的准确度的维度来使目标网络收敛,提高目标网络的输出图像被识别正确的概率。例如,若真值图像中识别出目标对象为"狗",训练目标网络的过程即为使针对目标网络的输出图像的识别结果也更接近"狗",从而使目标网络的输出结果更准确。
1206、判断是否满足收敛条件,若是,则执行步骤1207,若否,则继续执行步骤1202。
在得到更新后的目标网络之后,可以判断是否符合收敛条件,若符合收敛条件,则可以输出更新后的目标网络,即完成对目标网络的训练。若不符合收敛条件,则可以将更新后的目标网络作为新的目标网络,并继续对新的目标网络进行训练,即重复执行步骤1202,直到满足收敛条件。
其中,该收敛条件可以包括以下一项或者多项:对目标网络的训练次数达到预设次数,或者,目标网络的输出精度高于预设精度值,或者,目标网络的平均精度高于预设平均值,或者,目标网络的训练时长超过预设时长,或者,识别网络针对目标网络的输出图像的识别准确率高于预设准确率等。
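上述收敛条件的判断可以用如下示意性代码草图表达(Python),其中各阈值均为示例性假设的可配置参数,满足任一条件即可停止训练:

```python
def converged(num_iters, accuracy, mean_ap, train_hours, recog_acc,
              max_iters=100000, acc_th=0.95, map_th=0.90,
              max_hours=48.0, recog_acc_th=0.98):
    """依次对应:训练次数达到预设次数 / 输出精度高于预设精度值 /
    平均精度高于预设平均值 / 训练时长超过预设时长 /
    识别网络针对输出图像的识别准确率高于预设准确率。"""
    return (num_iters >= max_iters or accuracy >= acc_th or mean_ap >= map_th
            or train_hours >= max_hours or recog_acc >= recog_acc_th)
```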
1207、停止训练。
在确定目标网络满足收敛条件之后,即得到了符合期望的目标网络,则可以停止对目标网络的训练,输出当前次迭代更新后的目标网络。
因此,本申请实施方式中,在对目标网络进行训练的过程中,使用了识别网络针对真值图像以及输出图像的识别结果之间的损失值作为约束,使目标网络的输出图像的识别结果和真值图像的识别结果更接近,相当于使目标网络的输出图像中的目标对象与真值图像更接近,提高目标网络的输出图像的清晰度以及准确度。
示例性地,为便于理解,详细流程如图14所示,以一个详细的应用场景为例进行示例性说明。
首先,将训练集中的数据分别作为目标网络和识别网络的输入。
其中,可以将训练集中的每个样本中的raw数据作为目标网络的输入,得到输出图像。
分别将每个样本中的真值图像(如表示为Gt)作为识别网络的输入,得到第一识别结果。
以及将目标网络的输出图像作为识别网络的输入,得到第二识别结果。
然后分别基于输出图像、第一识别结果和第二识别结果计算损失值,并使用损失值更新目标网络。
根据损失值判断目标网络是否收敛,若是,则可以停止迭代,若否,则可以继续进行迭代。
在对目标网络的更新过程中,可以针对目标网络整体进行更新,此时在计算损失值的过程中,可以计算输出图像和真值图像之间的损失值,以及第一识别结果和第二识别结果之间的损失值,然后融合这两种损失值来对目标网络进行反向更新。
若对Target_ISPNet和Full_ISPNet分别进行更新,则可以分别计算各个网络的损失值,然后分别进行更新。
例如,以车牌分割为例,针对紧致框增强任务,利用高质量彩色图像Gt_Y做监督训练,其对应的损失函数(Loss)可以表示为:
L enhance=∑ P‖Yt(P)-Gt_Y(P)‖
其中P表示不同位置的像素,Yt表示Target_ISPNet网络输出。该损失函数约束输出图像与增强后的高清图像尽可能相似,实现紧致区域增强。
为了进一步提升车牌紧致区域增强效果,在训练过程中,增加车牌分割子任务。首先利用分割网络获得车牌分割结果Lable_gt,作为分割子任务的Gt。紧致区域的真值图像以及分割结果可以如图15所示。
通常,仅需在Target_ISPNet网络中增加一个小模块,即可获得分割输出Lable out,该模块仅需在训练时执行,不增加应用侧的开销,如分割子任务的损失可以表示为:
L semantic=∑ P‖Lable out(P)-Lable_gt(P)‖
针对Target_ISPNet的损失最终可以表示为:
L=L enhance+γL semantic
可以通过调整γ参数来调整增强任务与语义分割任务的权重。
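上述联合损失可以用如下示意性代码草图表达(Python/PyTorch),其中增强损失以逐像素L1、分割子任务损失以交叉熵为例,这些具体形式以及γ的取值均为示例性假设:

```python
import torch.nn.functional as F

def target_ispnet_loss(Yt, Gt_Y, seg_logits, label_gt, gamma=0.1):
    """L = L_enhance + γ*L_semantic:
    L_enhance约束Target_ISPNet输出Yt与高质量亮度真值Gt_Y逐像素接近;
    L_semantic为训练时附加的车牌分割子任务损失,应用侧不执行该分支。"""
    loss_enhance = F.l1_loss(Yt, Gt_Y)                     # 紧致区域增强损失
    loss_semantic = F.cross_entropy(seg_logits, label_gt)  # 分割子任务损失
    return loss_enhance + gamma * loss_semantic
```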
又例如,针对Full_ISPNet的训练过程,可以利用高质量彩色图像YUVgt做监督训练,其对应的损失函数(Loss)可以表示为:
L full=∑ P(‖Yf(P)-Ygt(P)‖+‖Uf(P)-Ugt(P)‖+‖Vf(P)-Vgt(P)‖)
其中P表示不同位置的像素,Yf Uf Vf表示Full_ISPNet网络的输出。该损失函数约束输出图像与训练数据中的高清图像尽可能相似,实现车牌宽松区域增强。
获取紧致框与宽松框的增强结果后,可以利用加权融合获得车牌增强的最终结果,其融合公式可以表示为:
Yout=Yt*mask+Yf*(1-mask)
Uout=Uf
Vout=Vf
示例性地,Yf Uf Vf以及mask可以如图16所示。
因此,考虑目标图像特性,进一步将目标分为紧致框输入与宽松框输入,减少网络开销。该示例中,若仅用单一网络实现车牌目标增强,为达到相同的增强效果,其网络开销大概需要200ms,而拆分为子网络后,Target_ISPNet的开销为30ms,Full_ISPNet的开销为10ms,整体开销为40ms,网络性能提升了5倍。
本申请实施方式中,在对目标网络更新的过程中,引入了分割任务的分割结果作为约束,从而可以约束目标网络的输出图像的分割结果与真值图像的分割结果更接近,从而可以使目标网络的输出图像更准确。并且不增加目标网络应用时的开销,使目标网络可以替代常用的ISP处理方式,高效地对raw数据进行处理得到输出图像,且得到的输出图像更准确。
例如,常用的ISP处理方式的成像效果与目标网络的成像效果对比可以如图17所示。显然,通过目标网络输出的车牌的成像效果更优,增强效果更好。
前述对本申请提供的方法流程进行了介绍,下面对本申请提供的装置进行介绍。
参阅图18,本申请提供一种数据处理装置的结构示意图,包括:
获取模块1801,用于获取第一帧数据,第一帧数据是图像传感器采集到的原始数据中的其中一帧;
紧致扣取模块1802,用于从第一帧数据中获取紧致框对应的数据,得到第一数据,紧致框在第一帧数据中覆盖的范围包括从第一帧数据中检测出的目标对象;
宽松扣取模块1803,用于从第一帧数据中获取宽松框对应的数据,得到第二数据,宽松框在第一帧数据中覆盖的范围包括且大于紧致框在第一帧数据中覆盖的范围;
输出模块1804,用于将第一数据和第二数据分别作为目标网络的输入,得到输出图像,目标网络用于提取输入的数据中的多个通道的信息,并根据多个通道的信息得到输出图像。
在一种可能的实施方式中,目标网络包括第一网络和第二网络;
输出模块1804,具体用于:将第一数据作为第一网络的输入,得到第一增强信息,第一网络用于提取输入的数据中亮度通道的信息;将第二数据作为第二网络的输入,得到第二增强信息,第二网络用于提取输入的数据的多个通道的信息;融合第一增强信息和第二增强信息,得到输出图像。
在一种可能的实施方式中,装置还包括:目标检测模块1805,用于对第一帧数据进行目标检测,得到第一帧数据中目标对象的位置信息;根据目标对象的位置信息生成紧致框和宽松框。
在一种可能的实施方式中,获取模块1801,具体用于:接收用户输入数据,并根据用户输入数据从原始数据中提取第一帧数据;或者,对原始数据中的每一帧进行目标检测,根据检测结果从原始数据中提取第一帧数据。
在一种可能的实施方式中,目标网络为结合识别网络以及训练集进行训练得到,其中,在对目标网络进行训练的过程中,以识别网络的输出结果作为约束对目标网络进行更新。
参阅图19,本申请提供的一种神经网络训练装置的结构示意图,包括:
获取模块1901,用于获取训练集,训练集中包括图像传感器采集到的原始数据以及对应的真值标签;
增强模块1902,用于将训练集作为目标网络的输入,得到增强结果,目标网络用于从输入的数据中提取紧致框对应的亮度通道的信息,以及从输入数据中提取宽松框对应的多个通道的信息,对亮度通道的信息和多个通道的信息进行融合得到增强结果,宽松框在输入的数据中覆盖的范围包括且大于紧致框在输入的数据中覆盖的范围;
语义分割模块1903,用于将训练集作为识别网络的输入,得到第一识别结果;
语义分割模块1903,还用于将增强结果作为识别网络的输入,得到第二识别结果;
更新模块1904,用于根据增强结果和真值标签之间的差值,以及第一识别结果和第二识别结果之间的差值,对目标网络进行更新,得到更新后的目标网络。
在一种可能的实施方式中,增强结果中包括第一网络输出的第一信息以及第二网络输出的第二信息,第二识别结果包括第一信息对应的第三识别结果以及第二信息对应的第四识别结果;更新模块1904,具体用于:根据第一信息和真值标签之间的差值,以及第三识别结果和第一识别结果之间的差值,更新第一网络,得到更新后的第一网络;根据第二信息和真值标签之间的差值,以及第四识别结果和第一识别结果之间的差值,更新第二网络,得到更新后的第二网络。
在一种可能的实施方式中,更新模块1904,具体用于:根据增强结果和真值标签之间的差值得到第一损失值;根据第一识别结果和第二识别结果之间的差值得到第二损失值;融合第一损失值和第二损失值,得到第三损失值;根据第三损失值对目标网络进行更新,得到更新后的目标网络。
请参阅图20,本申请提供的另一种数据处理装置的结构示意图,如下所述。
该数据处理装置可以包括处理器2001和存储器2002。该处理器2001和存储器2002通过线路互联。其中,存储器2002中存储有程序指令和数据。
存储器2002中存储了前述图4-图11中的步骤对应的程序指令以及数据。
处理器2001用于执行前述图4-图11中任一实施例所示的数据处理装置执行的方法步骤。
可选地,该数据处理装置还可以包括收发器2003,用于接收或者发送数据。
本申请实施例中还提供一种计算机可读存储介质,该计算机可读存储介质中存储有程序,当其在计算机上运行时,使得计算机执行如前述图4-图11所示实施例描述的方法中的步骤。
可选地,前述的图20中所示的数据处理装置为芯片,如ISP芯片。
本申请实施例还提供了一种数据处理装置,该数据处理装置也可以称为数字处理芯片或者芯片,芯片包括处理单元和通信接口,处理单元通过通信接口获取程序指令,程序指令被处理单元执行,处理单元用于执行前述图4-图11中任一实施例所示的数据处理装置执行的方法步骤。
请参阅图21,本申请提供的另一种神经网络训练装置的结构示意图,如下所述。
该神经网络训练装置可以包括处理器2101和存储器2102。该处理器2101和存储器2102通过线路互联。其中,存储器2102中存储有程序指令和数据。
存储器2102中存储了前述图12-图17中的步骤对应的程序指令以及数据。
处理器2101用于执行前述图12-图17中任一实施例所示的神经网络训练装置执行的方法步骤。
可选地,该神经网络训练装置还可以包括收发器2103,用于接收或者发送数据。
本申请实施例中还提供一种计算机可读存储介质,该计算机可读存储介质中存储有程序,当其在计算机上运行时,使得计算机执行如前述图12-图17所示实施例描述的方法中的步骤。
可选地,前述的图21中所示的神经网络训练装置为芯片。
本申请实施例还提供了一种神经网络训练装置,该神经网络训练装置也可以称为数字处理芯片或者芯片,芯片包括处理单元和通信接口,处理单元通过通信接口获取程序指令,程序指令被处理单元执行,处理单元用于执行前述图12-图17中任一实施例所示的神经网络训练装置执行的方法步骤。
本申请实施例还提供一种数字处理芯片。该数字处理芯片中集成了用于实现上述处理器2001、2101,或者处理器2001、2101的功能的电路和一个或者多个接口。当该数字处理芯片中集成了存储器时,该数字处理芯片可以完成前述实施例中的任一个或多个实施例的方法步骤。当该数字处理芯片中未集成存储器时,可以通过通信接口与外置的存储器连接。该数字处理芯片根据外置的存储器中存储的程序代码来实现上述实施例中数据处理装置执行的动作。
本申请实施例中还提供一种计算机程序产品,当其在计算机上运行时,使得计算机执行如前述图4-图17所示实施例描述的方法中的步骤。
本申请实施例提供的数据处理装置可以为芯片,芯片包括:处理单元和通信单元,所述处理单元例如可以是处理器,所述通信单元例如可以是输入/输出接口、管脚或电路等。该处理单元可执行存储单元存储的计算机执行指令,以使服务器内的芯片执行上述图4-图17所示实施例描述的神经网络训练方法。可选地,所述存储单元为所述芯片内的存储单元,如寄存器、缓存等,所述存储单元还可以是所述无线接入设备端内的位于所述芯片外部的存储单元,如只读存储器(read-only memory,ROM)或可存储静态信息和指令的其他类型的静态存储设备,随机存取存储器(random access memory,RAM)等。
具体地,前述的处理单元或者处理器可以是中央处理器(central processing unit,CPU)、神经网络处理器(neural-network processing unit,NPU)、图形处理器(graphics processing unit,GPU)、数字信号处理器(digital signal processor,DSP)、专用集成电路(application specific integrated circuit,ASIC)或现场可编程逻辑门阵列(field programmable gate array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。通用处理器可以是微处理器或者也可以是任何常规的处理器等。
示例性地,请参阅图22,图22为本申请实施例提供的芯片的一种结构示意图,所述芯片可以表现为神经网络处理器NPU 220,NPU 220作为协处理器挂载到主CPU(Host CPU)上,由Host CPU分配任务。NPU的核心部分为运算电路2203,通过控制器2204控制运算电路2203提取存储器中的矩阵数据并进行乘法运算。
在一些实现中,运算电路2203内部包括多个处理单元(process engine,PE)。在一些实现中,运算电路2203是二维脉动阵列。运算电路2203还可以是一维脉动阵列或者能够执行例如乘法和加法这样的数学运算的其它电子线路。在一些实现中,运算电路2203是通用的矩阵处理器。
举例来说,假设有输入矩阵A,权重矩阵B,输出矩阵C。运算电路从权重存储器2202中取矩阵B相应的数据,并缓存在运算电路中每一个PE上。运算电路从输入存储器2201中取矩阵A数据与矩阵B进行矩阵运算,得到的矩阵的部分结果或最终结果,保存在累加器(accumulator)2208中。
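为便于理解运算电路与累加器的配合方式,下面用NumPy给出一段分块矩阵乘法的示意性代码草图,部分结果在accumulator中逐块累加;该示例仅用于说明概念,并不对应NPU的真实微架构:

```python
import numpy as np

def tiled_matmul(A, B, tile=16):
    """按K维分块计算C=A@B:每次取权重矩阵B的一行块与输入矩阵A的一列块
    做乘加,得到的部分结果累加到accumulator中。"""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2, "矩阵维度不匹配"
    accumulator = np.zeros((M, N), dtype=np.float32)
    for k0 in range(0, K, tile):
        k1 = min(k0 + tile, K)
        accumulator += A[:, k0:k1].astype(np.float32) @ B[k0:k1, :].astype(np.float32)
    return accumulator
```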
统一存储器2206用于存放输入数据以及输出数据。权重数据直接通过存储单元访问控制器(direct memory access controller,DMAC)2205被搬运到权重存储器2202中。输入数据也通过DMAC被搬运到统一存储器2206中。
总线接口单元(bus interface unit,BIU)2210,用于AXI总线与DMAC和取指存储器(instruction fetch buffer,IFB)2209的交互。
总线接口单元2210还用于取指存储器2209从外部存储器获取指令,以及用于存储单元访问控制器2205从外部存储器获取输入矩阵A或者权重矩阵B的原数据。
DMAC主要用于将外部存储器DDR中的输入数据搬运到统一存储器2206,或将权重数据搬运到权重存储器2202中,或将输入数据搬运到输入存储器2201中。
向量计算单元2207包括多个运算处理单元,在需要的情况下,对运算电路的输出做进一步处理,如向量乘,向量加,指数运算,对数运算,大小比较等等。主要用于神经网络中非卷积/全连接层网络计算,如批归一化(batch normalization),像素级求和,对特征平面进行上采样等。
在一些实现中,向量计算单元2207能将经处理的输出的向量存储到统一存储器2206。例如,向量计算单元2207可以将线性函数和/或非线性函数应用到运算电路2203的输出,例如对卷积层提取的特征平面进行线性插值,再例如累加值的向量,用以生成激活值。在一些实现中,向量计算单元2207生成归一化的值、像素级求和的值,或二者均有。在一些实现中,处理过的输出的向量能够用作到运算电路2203的激活输入,例如用于在神经网络中的后续层中的使用。
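向量计算单元对运算电路输出做进一步处理的过程,可以用如下示意性代码草图表达(Python/NumPy),其中以ReLU激活、像素级求和与最近邻上采样为例,均为示例性假设:

```python
import numpy as np

def vector_postprocess(acc_out, upsample_factor=2):
    """对运算电路的输出acc_out(H, W特征平面)做示意性的后处理:
    先做非线性激活(ReLU)生成激活值,再做像素级求和与最近邻上采样。"""
    activated = np.maximum(acc_out, 0.0)                  # 非线性函数,生成激活值
    pixel_sum = float(activated.sum())                    # 像素级求和
    upsampled = np.repeat(np.repeat(activated, upsample_factor, axis=0),
                          upsample_factor, axis=1)        # 对特征平面进行上采样
    return activated, pixel_sum, upsampled
```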
控制器2204连接的取指存储器(instruction fetch buffer)2209,用于存储控制器2204使用的指令;
统一存储器2206,输入存储器2201,权重存储器2202以及取指存储器2209均为On-Chip存储器。外部存储器私有于该NPU硬件架构。
其中,神经网络中各层的运算可以由运算电路2203或向量计算单元2207执行。
其中,上述任一处提到的处理器,可以是一个通用中央处理器,微处理器,ASIC,或一个或多个用于控制上述图4-图16的方法的程序执行的集成电路。
另外需说明的是,以上所描述的装置实施例仅仅是示意性的,其中所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。另外,本申请提供的装置实施例附图中,模块之间的连接关系表示它们之间具有通信连接,具体可以实现为一条或多条通信总线或信号线。
通过以上的实施方式的描述,所属领域的技术人员可以清楚地了解到本申请可借助软件加必需的通用硬件的方式来实现,当然也可以通过专用硬件包括专用集成电路、专用CPU、专用存储器、专用元器件等来实现。一般情况下,凡由计算机程序完成的功能都可以很容易地用相应的硬件来实现,而且,用来实现同一功能的具体硬件结构也可以是多种多样的,例如模拟电路、数字电路或专用电路等。但是,对本申请而言更多情况下软件程序实现是更佳的实施方式。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在可读取的存储介质中,如计算机的软盘、U盘、移动硬盘、只读存储器(read only memory,ROM)、随机存取存储器(random access memory,RAM)、磁碟或者光盘等,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述的方法。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。
所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时,全部或部分地产生按照本申请实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线(DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存储的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质,(例如,软盘、硬盘、磁带)、光介质(例如, DVD)、或者半导体介质(例如固态硬盘(solid state disk,SSD))等。
本申请的说明书和权利要求书及上述附图中的术语“第一”、“第二”、“第三”、“第四”等(如果存在)是用于区别类似的对象,而不必用于描述特定的顺序或先后次序。应该理解这样使用的数据在适当情况下可以互换,以便这里描述的实施例能够以除了在这里图示或描述的内容以外的顺序实施。此外,术语“包括”和“具有”以及他们的任何变形,意图在于覆盖不排他的包含,例如,包含了一系列步骤或单元的过程、方法、系统、产品或设备不必限于清楚地列出的那些步骤或单元,而是可包括没有清楚地列出的或对于这些过程、方法、产品或设备固有的其它步骤或单元。
最后应说明的是:以上,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。

Claims (22)

  1. 一种数据处理方法,其特征在于,包括:
    获取第一帧数据,所述第一帧数据是图像传感器采集到的原始数据中的其中一帧;
    从所述第一帧数据中获取紧致框对应的数据,得到第一数据,所述紧致框在所述第一帧数据中覆盖的范围包括从所述第一帧数据中检测出的目标对象;
    从所述第一帧数据中获取宽松框对应的数据,得到第二数据,所述宽松框在所述第一帧数据中覆盖的范围包括且大于所述紧致框在所述第一帧数据中覆盖的范围;
    将所述第一数据和所述第二数据分别作为目标网络的输入,得到输出图像,所述目标网络用于提取输入的数据中的多个通道的信息,并根据所述多个通道的信息得到所述输出图像。
  2. 根据权利要求1所述的方法,其特征在于,所述目标网络包括第一网络和第二网络;
    所述将所述第一数据和所述第二数据分别作为目标网络的输入,得到输出图像,包括:
    将所述第一数据作为第一网络的输入,得到第一增强信息,所述第一网络用于提取输入的数据中亮度通道的信息;
    将所述第二数据作为第二网络的输入,得到第二增强信息,所述第二网络用于提取输入的数据的多个通道的信息;
    融合所述第一增强信息和第二增强信息,得到输出图像。
  3. 根据权利要求1或2所述的方法,其特征在于,所述方法还包括:
    对所述第一帧数据进行目标检测,得到所述第一帧数据中所述目标对象的位置信息;
    根据所述目标对象的位置信息生成所述紧致框和所述宽松框。
  4. 根据权利要求1-3中任一项所述的方法,其特征在于,所述获取第一帧数据,包括:
    接收用户输入数据,并根据所述用户输入数据从所述原始数据中提取所述第一帧数据;
    或者,
    对所述原始数据中的每一帧进行目标检测,根据检测结果从所述原始数据中提取所述第一帧数据。
  5. 根据权利要求1-4中任一项所述的方法,其特征在于,
    所述目标网络为结合识别网络以及训练集进行训练得到,所述识别网络用于获取输入的图像中的语义信息;
    其中,在对所述目标网络进行训练的过程中,以所述识别网络的输出结果作为约束对所述目标网络进行更新。
  6. 一种神经网络训练方法,其特征在于,包括:
    获取训练集,所述训练集中包括图像传感器采集到的原始数据以及对应的真值标签;
    将所述训练集作为目标网络的输入,得到增强结果,所述目标网络用于从输入的数据中提取紧致框对应的亮度通道的信息,以及从输入数据中提取宽松框对应的多个通道的信息,对所述亮度通道的信息和所述多个通道的信息进行融合得到所述增强结果,所述宽松框在输入的数据中覆盖的范围包括且大于所述紧致框在输入的数据中覆盖的范围;
    将所述训练集作为识别网络的输入,得到第一识别结果,识别网络用于获取输入的图像中的语义信息;
    将所述增强结果作为所述识别网络的输入,得到第二识别结果;
    根据所述增强结果和所述真值标签之间的差值,以及所述第一识别结果和所述第二识别结果之间的差值,对所述目标网络进行更新,得到更新后的目标网络。
  7. 根据权利要求6所述的方法,其特征在于,所述目标网络包括第一网络和第二网络,所述第一网络用于从所述训练集中的数据中提取紧致框对应的亮度通道的信息,所述第二网络用于从输入数据中提取宽松框对应的多个通道的信息。
  8. 根据权利要求7所述的方法,其特征在于,所述增强结果中包括所述第一网络输出的第一信息以及所述第二网络输出的第二信息,所述第二识别结果包括所述第一信息对应的第三识别结果以及所述第二信息对应的第四识别结果;
    所述根据所述增强结果和所述真值标签之间的差值,以及所述第一识别结果和所述第二识别结果之间的差值,对所述目标网络进行更新,得到更新后的目标网络,包括:
    根据所述第一信息和所述真值标签之间的差值,以及所述第三识别结果和所述第一识别结果之间的差值,更新所述第一网络,得到更新后的第一网络;
    根据所述第二信息和所述真值标签之间的差值,以及所述第四识别结果和所述第一识别结果之间的差值,更新所述第二网络,得到更新后的第二网络。
  9. 根据权利要求6或7所述的方法,其特征在于,所述根据所述增强结果和所述真值标签之间的差值,以及所述第一识别结果和所述第二识别结果之间的差值,对所述目标网络进行更新,得到更新后的目标网络,包括:
    根据所述增强结果和所述真值标签之间的差值得到第一损失值;
    根据所述第一识别结果和所述第二识别结果之间的差值得到第二损失值;
    融合所述第一损失值和所述第二损失值,得到第三损失值;
    根据所述第三损失值对所述目标网络进行更新,得到更新后的目标网络。
  10. 一种数据处理装置,其特征在于,包括:
    获取模块,用于获取第一帧数据,所述第一帧数据是图像传感器采集到的原始数据中的其中一帧;
    紧致扣取模块,用于从所述第一帧数据中获取紧致框对应的数据,得到第一数据,所述紧致框在所述第一帧数据中覆盖的范围包括从所述第一帧数据中检测出的目标对象;
    宽松扣取模块,用于从所述第一帧数据中获取宽松框对应的数据,得到第二数据,所述宽松框在所述第一帧数据中覆盖的范围包括且大于所述紧致框在所述第一帧数据中覆盖的范围;
    输出模块,用于将所述第一数据和所述第二数据分别作为目标网络的输入,得到输出图像,所述目标网络用于提取输入的数据中的多个通道的信息,并根据所述多个通道的信息得到所述输出图像。
  11. 根据权利要求10所述的装置,其特征在于,所述目标网络包括第一网络和第二网络;
    所述输出模块,具体用于:
    将所述第一数据作为第一网络的输入,得到第一增强信息,所述第一网络用于提取输入的数据中亮度通道的信息;
    将所述第二数据作为第二网络的输入,得到第二增强信息,所述第二网络用于提取输入的数据的多个通道的信息;
    融合所述第一增强信息和第二增强信息,得到输出图像。
  12. 根据权利要求10或11所述的装置,其特征在于,所述装置还包括:目标检测模块,用于:
    对所述第一帧数据进行目标检测,得到所述第一帧数据中所述目标对象的位置信息;
    根据所述目标对象的位置信息生成所述紧致框和所述宽松框。
  13. 根据权利要求10-12中任一项所述的装置,其特征在于,所述获取模块,具体用于:
    接收用户输入数据,并根据所述用户输入数据从所述原始数据中提取所述第一帧数据;
    或者,
    对所述原始数据中的每一帧进行目标检测,根据检测结果从所述原始数据中提取所述第一帧数据。
  14. 根据权利要求10-13中任一项所述的装置,其特征在于,
    所述目标网络为结合识别网络以及训练集进行训练得到,所述识别网络用于获取输入的图像中的语义信息,
    其中,在对所述目标网络进行训练的过程中,以所述识别网络的输出结果作为约束对所述目标网络进行更新。
  15. 一种神经网络训练装置,其特征在于,包括:
    获取模块,用于获取训练集,所述训练集中包括图像传感器采集到的原始数据以及对应的真值标签;
    增强模块,用于将所述训练集作为目标网络的输入,得到增强结果,所述目标网络用于从输入的数据中提取紧致框对应的亮度通道的信息,以及从输入数据中提取宽松框对应的多个通道的信息,对所述亮度通道的信息和所述多个通道的信息进行融合得到所述增强结果,所述宽松框在输入的数据中覆盖的范围包括且大于所述紧致框在输入的数据中覆盖的范围;
    语义分割模块,用于将所述训练集作为识别网络的输入,得到第一识别结果,所述识别网络用于获取输入的图像中的语义信息;
    所述语义分割模块,还用于将所述增强结果作为所述识别网络的输入,得到第二识别结果;
    更新模块,用于根据所述增强结果和所述真值标签之间的差值,以及所述第一识别结果和所述第二识别结果之间的差值,对所述目标网络进行更新,得到更新后的目标网络。
  16. 根据权利要求15所述的装置,其特征在于,所述目标网络包括第一网络和第二网络,所述第一网络用于从所述训练集中的数据中提取紧致框对应的亮度通道的信息,所述第二网络用于从输入数据中提取宽松框对应的多个通道的信息。
  17. 根据权利要求16所述的装置,其特征在于,所述增强结果中包括所述第一网络输出的第一信息以及所述第二网络输出的第二信息,所述第二识别结果包括所述第一信息对应的第三识别结果以及所述第二信息对应的第四识别结果;
    所述更新模块,具体用于:
    根据所述第一信息和所述真值标签之间的差值,以及所述第三识别结果和所述第一识别结果之间的差值,更新所述第一网络,得到更新后的第一网络;
    根据所述第二信息和所述真值标签之间的差值,以及所述第四识别结果和所述第一识别结果之间的差值,更新所述第二网络,得到更新后的第二网络。
  18. 根据权利要求15或16所述的装置,其特征在于,所述更新模块,具体用于:
    根据所述增强结果和所述真值标签之间的差值得到第一损失值;
    根据所述第一识别结果和所述第二识别结果之间的差值得到第二损失值;
    融合所述第一损失值和所述第二损失值,得到第三损失值;
    根据所述第三损失值对所述目标网络进行更新,得到更新后的目标网络。
  19. 一种数据处理装置,其特征在于,包括一个或多个处理器,所述一个或多个处理器和存储器耦合,所述存储器存储有程序,当所述存储器存储的程序指令被所述一个或多个处理器执行时实现权利要求1至5中任一项所述的方法的步骤。
  20. 一种神经网络训练装置,其特征在于,包括一个或多个处理器,所述一个或多个处理器和存储器耦合,所述存储器存储有程序,当所述存储器存储的程序指令被所述一个或多个处理器执行时实现权利要求6-9中任一项所述的方法的步骤。
  21. 一种计算机可读存储介质,其特征在于,包括程序,当其被处理单元所执行时, 执行如权利要求1至9中任一项所述的方法。
  22. 一种计算机程序产品,包括计算机程序/指令,其特征在于,所述计算机程序/指令被处理器执行时实现如权利要求1至9中任一项所述方法的步骤。
PCT/CN2022/091839 2021-08-30 2022-05-10 一种数据处理方法以及装置 WO2023029559A1 (zh)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202111001658.0 2021-08-30
CN202111001658 2021-08-30
CN202111458295.3A CN115731115A (zh) 2021-08-30 2021-12-01 一种数据处理方法以及装置
CN202111458295.3 2021-12-01

Publications (1)

Publication Number Publication Date
WO2023029559A1 true WO2023029559A1 (zh) 2023-03-09

Family

ID=85292313

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/091839 WO2023029559A1 (zh) 2021-08-30 2022-05-10 一种数据处理方法以及装置

Country Status (2)

Country Link
CN (1) CN115731115A (zh)
WO (1) WO2023029559A1 (zh)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108460328A (zh) * 2018-01-15 2018-08-28 浙江工业大学 一种基于多任务卷积神经网络的套牌车检测方法
US20200327680A1 (en) * 2019-04-12 2020-10-15 Beijing Moviebook Science and Technology Co., Ltd. Visual target tracking method and apparatus based on deep adversarial training
CN110807385A (zh) * 2019-10-24 2020-02-18 腾讯科技(深圳)有限公司 目标检测方法、装置、电子设备及存储介质
CN112183353A (zh) * 2020-09-28 2021-01-05 腾讯科技(深圳)有限公司 一种图像数据处理方法、装置和相关设备
CN113255421A (zh) * 2020-12-08 2021-08-13 四川云从天府人工智能科技有限公司 一种图像检测方法、系统、设备及介质
CN112766244A (zh) * 2021-04-07 2021-05-07 腾讯科技(深圳)有限公司 目标对象检测方法、装置、计算机设备和存储介质

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116055895A (zh) * 2023-03-29 2023-05-02 荣耀终端有限公司 图像处理方法及其相关设备
CN116055895B (zh) * 2023-03-29 2023-08-22 荣耀终端有限公司 图像处理方法及其装置、芯片系统和存储介质

Also Published As

Publication number Publication date
CN115731115A (zh) 2023-03-03

Similar Documents

Publication Publication Date Title
CN110084281B (zh) 图像生成方法、神经网络的压缩方法及相关装置、设备
WO2022083536A1 (zh) 一种神经网络构建方法以及装置
WO2020192483A1 (zh) 图像显示方法和设备
WO2021043112A1 (zh) 图像分类方法以及装置
WO2022116856A1 (zh) 一种模型结构、模型训练方法、图像增强方法及设备
WO2022042713A1 (zh) 一种用于计算设备的深度学习训练方法和装置
US20220375213A1 (en) Processing Apparatus and Method and Storage Medium
WO2021147325A1 (zh) 一种物体检测方法、装置以及存储介质
US20220335583A1 (en) Image processing method, apparatus, and system
CN110717851A (zh) 图像处理方法及装置、神经网络的训练方法、存储介质
WO2022001805A1 (zh) 一种神经网络蒸馏方法及装置
CN113705769A (zh) 一种神经网络训练方法以及装置
CN113066017B (zh) 一种图像增强方法、模型训练方法及设备
CN110222718B (zh) 图像处理的方法及装置
WO2022001372A1 (zh) 训练神经网络的方法、图像处理方法及装置
CN113011562A (zh) 一种模型训练方法及装置
WO2021018251A1 (zh) 图像分类方法及装置
CN112862828B (zh) 一种语义分割方法、模型训练方法及装置
WO2021175278A1 (zh) 一种模型更新方法以及相关装置
CN113191489B (zh) 二值神经网络模型的训练方法、图像处理方法和装置
CN115081588A (zh) 一种神经网络参数量化方法和装置
WO2024002211A1 (zh) 一种图像处理方法及相关装置
CN114359289A (zh) 一种图像处理方法及相关装置
WO2022179606A1 (zh) 一种图像处理方法及相关装置
CN113284055A (zh) 一种图像处理的方法以及装置

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2022862702

Country of ref document: EP

Effective date: 20240306

NENP Non-entry into the national phase

Ref country code: DE