WO2021147316A1

WO2021147316A1 - Object recognition method and device

Info

Publication number: WO2021147316A1
Application number: PCT/CN2020/111141
Authority: WO
Inventors: 高源�; 陈士胃; 郭一民
Original assignee: 华为技术有限公司
Priority date: 2020-01-21
Filing date: 2020-08-25
Publication date: 2021-07-29
Also published as: CN111242087A; CN111242087B

Abstract

The present application relates to the technical field of images, and disclosed therein are an object recognition method and device. The method comprises: acquiring a target raw image generated by an image sensor; carrying out super resolution on a target region in the target raw image and obtaining a super resolution color image, the target region comprising at least part of a region in the target raw image, and the resolution of the super resolution color image being greater than the resolution of an image of the target region; and recognizing a target object on the basis of the super resolution color image. Thus, the accuracy of target object recognition can be increased.

Description

Object recognition method and device

This application claims the priority of a Chinese patent application with an application number of 202010071857.8 and an invention title of "Object Recognition Method and Device" filed on January 21, 2020, the entire content of which is incorporated into this application by reference.

Technical field

This application relates to the field of image technology, and in particular to an object recognition method and device.

Background technique

With the development of image technology, image technology has been applied to various fields of people's production and life, such as the field of object recognition. For example, by performing face recognition on images collected by surveillance cameras, it can assist in finding criminals.

Generally, the distance between the surveillance camera and the subject is relatively long, and the resolution of the image collected by the surveillance camera is low. It is more difficult to perform face recognition on the image. Therefore, it is usually necessary to first perform the resolution on the color image of the image. Upgrade, and then perform face recognition on the image obtained by the resolution increase.

However, the face information in the image obtained by the resolution increase is often quite different from the face information in the image collected by the surveillance camera, which results in a lower accuracy of face recognition on the image obtained by the resolution increase.

Summary of the invention

The present application provides an object recognition method and device, which can solve the problem of low accuracy of face recognition on an image obtained by increasing the resolution. The technical solution is as follows:

In a first aspect, an object recognition method is provided. The method includes: after acquiring a target image generated by an image sensor, first super-divide the image of the target region in the target image to obtain a super-divided color image, and then The target object is recognized based on the super-division color image. Wherein, the target area includes at least a part of the area in the target raw image, and the resolution of the super-division color image is greater than the resolution of the image of the target area. In the object recognition method provided in the present application, the image of the target region in the target raw image is subjected to super-division processing to enlarge and enhance the detailed information in the image of the target region. Since the super-divided image is a raw image that has not been processed by ISP, the image of the target area after the super-division processing can contain more detailed information. Therefore, the super-division color image (the color image of the image of the target area after the super-division processing) obtained by this embodiment has more detailed information than the color image in the prior art. Therefore, the recognition of the target object based on the super-division color image can improve the accuracy of the recognition of the target object.

Optionally, the target image is a Bayer mode image. It should be noted that in this application, the target image is a Bayer mode image as an example. Of course, the target image can also be any mode other than the Bayer mode (such as red, green, and blue brightness (RGBW) mode). ), this application is not limited.

Optionally, the method further includes: performing first processing on the target image to obtain a color image of the target image; and determining a first region in the color image of the target image that contains the target object The above-mentioned recognition of the target object based on the super-division color image specifically includes: first determining the second region in the super-division color image based on the first region; and then based on the super-division color image The image of the second area is used to identify the target object. After the first area in the color image of the target image is determined, the first area can be mapped to the super-division color image in a certain manner, so as to obtain the second area in the super-division color image. In addition, since the second area is an area obtained by mapping the first area, the second area also includes the target object on the premise that the first area contains the target object. After determining the second region in the super-division color image, the object recognition device may intercept the image of the second region, and perform target object recognition on the image of the second region. Moreover, since the super-division color image contains more detailed information of the target object, the image of the second region in the super-division color image also contains more detailed information of the target object, based on the image of the second region The accuracy of the recognition of the target object is high.

Optionally, the first area and the second area are both rectangular, the upper left corner coordinates of the first area are (X _A1 , Y _A1 ), and the lower right corner coordinates are (X _A2 , Y _A2 ); The coordinates of the upper left corner of the second area are (X _B1 , Y _B1 ), and the coordinates of the lower right corner are (X _B2 , Y _B2 ); where X _B1 =X _A1 *K; Y _B1 =Y _A1 *K; X _B2 = X _B1 + (X _A2- X _A1 )*K; Y _B2 = Y _B1 + (Y _A2 -Y _A1 )*K, K represents the resolution ratio between the super-division color image and the image in the target area.

For example, there are multiple implementation manners for the foregoing process of obtaining a super-division color image, which is not limited in this application. The following two implementations will be taken as examples to explain the implementation of the above process of obtaining a super-division color image.

Optionally, in the first implementation of the process of obtaining a super-division color image, the target image is a Bayer mode image, and the image of the target area in the target image is super-divided to obtain The super-division color image includes: acquiring the target four-channel image, and inputting the target four-channel image into a first model to obtain the super-division color image output by the first model; wherein, the first model It is used to super-divide the four-channel image of the unsuper-divided Bayer-mode image, and output a color image of the Bayer-mode image whose resolution has been increased. The image of the target area includes a plurality of pixel groups arranged in an array, the pixel group includes two rows and two columns of pixels, and the target four-channel image includes: The combination map, the combination map of the pixels in the first row and the second column, the combination map of the pixels in the second row and the first column, and the combination map of the pixels in the second row and the second column.

Optionally, in the second implementation manner of the process of obtaining a super-division color image, the target image is a Bayer mode image, and the image of the target area in the target image is super-divided to obtain The super-division color image includes: performing super-division on the image of the target area to obtain a Bayer mode image; and then performing a second processing on the Bayer mode image to obtain the super-division color image.

Optionally, the super-division of the image of the target area to obtain the Bayer mode image includes: acquiring the four-channel image of the target, wherein the image of the target area includes a plurality of pixel groups arranged in an array, The pixel group includes two rows and two columns of pixels, and the target four-channel image includes: a combination map of pixels in the first row and column 1, a combination map of pixels in the first row and second column, and a second row and second column of pixels in the plurality of pixel groups. The combined image of the pixels in the first column of the row and the combined image of the pixels in the second column and the second row; input the target four-channel image into the second model to obtain the target four-channel output of the second model after the resolution is increased Image; converting the target four-channel image with the increased resolution into the Bayer mode image; wherein, the second model is used to super-divide the four-channel image of the unsuper-divided Bayer mode image, and the output is improved Four-channel image after resolution.

It can be seen that among the above two implementations of the process of obtaining a super-resolution color image, the first implementation method is to directly obtain a super-resolution color image based on the first model, and the first implementation method is faster to obtain a super-resolution color image. . The second implementation method is to first super-divide the image of the target area to obtain the Bayer mode image (for example, obtain a target four-channel image with increased resolution through the second model, and then convert the target four-channel image into a Bayer mode image), After that, the Bayer mode image is processed into a super-division color image. The object recognition device can use either of these two implementations to execute the process of obtaining the super-differentiated color image, or the object recognition device can be based on the user's choice, using the implementation selected by the user in the two implementations to execute The process of obtaining super-division color images.

Optionally, the first processing includes a first distortion correction processing, the second processing includes a second distortion correction processing; the parameters of the first distortion correction processing include: a first distortion curve; the second distortion correction The processed parameters include: a second distortion curve; the _{coordinates of any sampling point (X 0} , Y ₀ ) in the second distortion curve corresponding to the pixel in the hyperdivision target image is (X _Ai , Y _Ai ), the coordinates of the pixel corresponding to any sampling point (X ₀ , Y ₀ ) in the first distortion curve in the target image are (X _Bi , Y _Bi ); where X _Ai = (X _Bi -(W _f /2+0.5))*K+(W _f *K/2+0.5); Y _Ai ＝(Y _Bi -(H _f /2+0.5))*K+(H _f *K/2+0.5 ); W _f represents the width of the target image, H _f represents the height of the target image, and K represents the resolution ratio of the super-division color image to the image of the target area.

Since the resolution of the target image and the Bayer mode image are different, the parameters for processing the target image into a color image may not be suitable for processing the Bayer mode image into a color image. If the target image is directly processed into color The processing of the Bayer mode image with the image parameters may cause problems (such as color cast, image distortion, and contrast changes) in the processed super-division color image. In the present application, the parameters of the distortion correction processing for processing the target raw image into a color image can be corrected, thus avoiding these problems in the obtained super-division color image. In this application, based on the parameters of processing the target raw image into a color image, corresponding processing is performed on the Bayer mode image to obtain a hyperdivision color image. Of course, when corresponding processing is performed on the Bayer mode image based on the above parameters, the corresponding processing on the Bayer mode image may not be performed based on the parameters for processing the target raw image into a color image. This application does not limit this.

Further, in the foregoing, the target area of the target image is taken as the entire area of the target image as an example. Optionally, the target area of the target image may be an area (such as a partial area) that contains the target object in the target image. Optionally, before the super-division of the image of the target region in the target image to obtain a super-division color image, the method further includes: determining that the target image contains the target object. target area.

Optionally, the target raw image is the m+1th frame image in the raw image video, m≥1, and before the determination of the target region containing the target object in the target raw image, the The method further includes: performing third processing on the m-th frame image in the raw image video to obtain a color image of the m-th frame image; determining that the color image of the m-th frame image contains the first image of the target object Three regions; the determining the target region in the target image that contains the target object includes: determining the target region in the target image that corresponds to the third region.

Optionally, the target area and the third area are both rectangular, the coordinates of the upper left corner of the third area are (X _D1 , Y _D1 ), and the coordinates of the lower right corner are (X _D2 , Y _D2 ); The coordinates of the upper left corner of the target area corresponding to the third area are (X _C1 , Y _C1 ), and the coordinates of the lower right corner are (X _C2 , Y _C2 ); where X _D1 =(X _C1 -(X _C1 +X _C2 )/2)*L+(X _C1 +X _C2 )/2; Y _D1 =(Y _C1 -(Y _C1 +Y _C2 )/2)*L+(Y _C1 +Y _C2 )/2; X _D2 =( X _C2 -(X _C1 +X _C2 )/2)*L+(X _C1 +X _C2 )/2; Y _D2 =(Y _C2 -(Y _C1 +Y _C2 )/2)*L+(Y _C1 +Y _C2 )/2; L>1.

Optionally, before the super-division of the image of the target region in the target image to obtain a super-division color image, the method further includes: the target image includes a plurality of target regions, and the When there are two target areas that satisfy the replacement condition among the multiple target areas, replace the two target areas with candidate target areas to obtain the updated multiple target areas; wherein, the replacement condition includes: The two target areas overlap at least partially, and the sum of the areas of the two target areas is greater than the area of the candidate target area; the coordinates of the upper left corner of one of the two target areas are (X ₁₁ , Y ₁₁ ), and the coordinates of the lower right corner are (X ₁₂ , Y ₁₂ ); the coordinates of the upper left corner of the other of the two target areas are (X ₂₁ , Y ₂₁ ), and the coordinates of the lower right corner are (X ₂₂ , Y ₂₂ ); The coordinates of the upper left corner of the candidate target area are (X _M1 , Y _M1 ), and the coordinates of the lower right corner are (X _M2 , Y _M2 ); X _M1 is the minimum value of X ₁₁ and X ₂₁ _{; Y M1} is Y The minimum value of ₁₁ and Y ₂₁ _{; X M2} is the maximum value of X ₁₂ and X ₂₂ _{; Y M2} is the maximum value of Y ₁₂ and Y ₂₂ . Since multiple target regions satisfying the replacement conditions are replaced with candidate target regions, the number of target regions in the target raw image is reduced, and the process of identifying target objects based on the target regions is simplified.

In a second aspect, an object recognition device is provided. The object recognition device includes: modules for executing the object recognition method provided in the first aspect.

Optionally, the object recognition device includes: an acquisition module for acquiring a target image generated by an image sensor, the target image being a Bayer mode image; a super-division module for acquiring a four-channel image of the target, wherein, The image of the target area in the target raw image includes a plurality of pixel groups arranged in an array, the pixel group includes two rows and two columns of pixels, and the target four-channel image includes: A combination map of pixels in a column, a combination map of pixels in a first row and a second column, a combination map of pixels in a second row and a first column, and a combination map of pixels in a second row and second column; the target area includes the target raw image The super-division module is also used to input the target four-channel image into a first model to obtain a super-division color image output by the first model; wherein, the first model is used to The four-channel image of the super-divided Bayer mode image is super-divided, and a color image of the Bayer mode image with an increased resolution is output; the resolution of the super-divided color image is greater than the resolution of the image in the target area ; Recognition module for recognizing target objects based on the super-division color image.

Optionally, the object recognition device includes: an acquisition module for acquiring a target image generated by an image sensor, the target image being a Bayer pattern image; a supramolecular module for acquiring a four-channel image of the target, wherein, The image of the target area in the target raw image includes a plurality of pixel groups arranged in an array, the pixel group includes two rows and two columns of pixels, and the target four-channel image includes: A combination map of pixels in a column, a combination map of pixels in a first row and a second column, a combination map of pixels in a second row and a first column, and a combination map of pixels in a second row and second column; the target area includes the target raw image The supramolecular module is also used to input the target four-channel image into a second model to obtain the target four-channel image output by the second model after the resolution is increased; the resolution is improved The target four-channel image after the rate is converted into a Bayer mode image; wherein, the second model is used to super-divide the four-channel image of the unsuper-divided Bayer mode image, and output the four-channel image with increased resolution The processing sub-module is used to perform the second processing on the Bayer mode image to obtain a super-division color image, the resolution of the super-division color image is greater than the resolution of the image in the target area; the recognition module is used to The super-division color image performs target object recognition.

Optionally, the object recognition device includes: an acquisition module for acquiring a target image generated by an image sensor, the target image being a Bayer pattern image; a supramolecular module for acquiring a four-channel image of the target, wherein, The image of the target area in the target raw image includes a plurality of pixel groups arranged in an array, the pixel group includes two rows and two columns of pixels, and the target four-channel image includes: A combination map of pixels in a column, a combination map of pixels in a first row and a second column, a combination map of pixels in a second row and a first column, and a combination map of pixels in a second row and second column; the target area includes the target raw image The supramolecular module is also used to input the target four-channel image into a second model to obtain the target four-channel image output by the second model after the resolution is increased; the resolution is improved The target four-channel image after the rate is converted into a Bayer mode image; wherein, the second model is used to super-divide the four-channel image of the unsuper-divided Bayer mode image, and output the four-channel image with increased resolution The processing sub-module is used to perform the second processing on the Bayer mode image to obtain a super-division color image, the resolution of the super-division color image is greater than the resolution of the image in the target area; the first processing module is used Performing first processing on the target image to obtain a color image of the target image; a first determining module for determining the first region of the target object in the color image of the target image; identifying A module for determining a second area in the super-division color image based on the first area; performing recognition of the target object based on an image of the second area in the super-division color image;

Wherein, the first processing includes a first distortion correction processing, the second processing includes a second distortion correction processing; the parameters of the first distortion correction processing include: a first distortion curve; The parameters include: a second distortion curve; the _{coordinates of a pixel corresponding to any sampling point (X 0} , Y ₀ ) in the second distortion curve in the hyperdivision target image are (X _Ai , Y _Ai ), The coordinates of a pixel corresponding to any sampling point (X ₀ , Y ₀ _{) in the first distortion curve in the target image are (X Bi} , Y _Bi ); where X _Ai = (X _Bi -( W _f /2+0.5))*K+(W _f *K/2+0.5); Y _Ai ＝(Y _Bi -(H _f /2+0.5))*K+(H _f *K/2+0.5); W _f represents the width of the target image, H _f represents the height of the target image, and K represents the resolution ratio of the super-division color image to the image of the target area.

Optionally, the object recognition device includes: an acquisition module for acquiring a target image generated by an image sensor; a second determining module for determining a target area containing the target object in the target image; a replacement module for When the target raw image includes multiple target regions, and there are two target regions that satisfy the replacement condition among the multiple target regions, the two target regions are replaced with candidate target regions to obtain updated all target regions. The multiple target regions; a super-division module for super-division of the image of the target region in the target image to obtain a super-division color image, wherein the target region includes at least part of the region in the target image , The resolution of the super-division color image is greater than the resolution of the image of the target area; and the recognition module is configured to recognize the target object based on the super-division color image. Wherein, the replacement condition includes: the two target regions overlap at least partially, and the sum of the areas of the two target regions is greater than the area of the candidate target region; The coordinates of the upper left corner are (X ₁₁ , Y ₁₁ ), and the coordinates of the lower right corner are (X ₁₂ , Y ₁₂ ); the coordinates of the upper left corner of the other of the two target areas are (X ₂₁ , Y ₂₁ ), and The coordinates of the lower right corner are (X ₂₂ , Y ₂₂ ); the coordinates of the upper left corner of the candidate target area are (X _M1 , Y _M1 ), and the coordinates of the lower right corner are (X _M2 , Y _M2 ); X _M1 is X ₁₁ and X _{21 is} the minimum value; Y _M1 is the minimum value of Y ₁₁ and Y ₂₁ ; X _M2 is the maximum value of X ₁₂ and X ₂₂ _{; Y M2} is the maximum value of Y ₁₂ and Y ₂₂ .

In a third aspect, an object recognition device is provided. The object recognition device includes a processor and an interface. The processor is used to obtain a raw image from an image sensor through the interface, and the processor is used to run a program to make The object recognition device executes the object recognition method described in the first aspect.

In a fourth aspect, a computer storage medium is provided, and a computer program is stored in the storage medium, and the computer program is used to execute the object recognition method described in the first aspect.

In a fifth aspect, a computer program product is provided. When the computer program product runs on an object recognition device, the object recognition device executes the object recognition method as described in the first aspect.

For the beneficial effects of any one of the second aspect to the fifth aspect described above, reference may be made to the beneficial effects of the first aspect described above, which will not be repeated in this application.

Description of the drawings

FIG. 1 is a schematic structural diagram of an object recognition device provided by an embodiment of this application;

FIG. 2 is a flowchart of an object recognition method provided by an embodiment of the application;

FIG. 3 is a schematic diagram of a target image provided by an embodiment of the application;

4 is a schematic diagram of a color image of a target image provided by an embodiment of the application;

FIG. 5 is a schematic diagram of a super-division color image provided by an embodiment of the application;

FIG. 6 is a flowchart of a method for obtaining a super-division color image according to an embodiment of the application;

FIG. 7 is a schematic diagram of a target four-channel image provided by an embodiment of this application;

FIG. 8 is a schematic diagram of a first model provided by an embodiment of this application;

FIG. 9 is a schematic diagram of the pixel drawing layer splicing 4 first combination pictures into one second combination picture;

FIG. 10 is a flowchart of another method for obtaining a hyperdivision color image according to an embodiment of the application;

FIG. 11 is a schematic diagram of a second model provided by an embodiment of this application;

FIG. 12 is a schematic diagram of an image obtained after the resolution of the target four-channel image in FIG. 7 is increased according to an embodiment of the application;

FIG. 13 is a schematic diagram of a Bayer mode image provided by an embodiment of the application;

FIG. 14 is a supplementary flowchart of an object recognition method provided by an embodiment of the application;

FIG. 15 is a schematic diagram of an m-th frame image provided by an embodiment of this application;

FIG. 16 is a schematic diagram of an m+1-th frame image provided by an embodiment of this application;

FIG. 17 is a schematic diagram of a target area and candidate target areas in a target image provided by an embodiment of the application;

FIG. 18 is a block diagram of an object recognition device provided by an embodiment of the application.

Detailed ways

In order to make the principles, technical solutions, and advantages of the present application clearer, the implementation manners of the present application will be described in further detail below in conjunction with the accompanying drawings.

Based on image technology, the target object in the image can be recognized. For example, the target object may be any object such as a human face, hand, clothes, table, and bench.

However, when the size of the target object in the image is small, the image cannot contain the detailed information of the target object, and it is more difficult to recognize the target object based on the image, and the accuracy rate is low. Therefore, it is usually necessary to perform resolution enhancement processing on the image. For example, the image is often a color image (compared to a raw image, a color image is more suitable for human eyes to observe), and the color image can be directly resolved. The lifting process is used to increase the size of the color image and try to enhance the detailed information of the target object in the image. However, color images are often processed through a series of image signal processing (ISP) (such as image demosaicing, image compression) on raw images (also called RAW images) directly collected by shooting devices (such as video cameras or cameras). , Image denoising). In the process of this process, some detailed information of the target object in the raw image will be eliminated, resulting in less detailed information of the target object contained in the color image itself. Even if the resolution of the color image is increased, it is more difficult to restore the color image. Detailed information of the target object.

The embodiment of the object recognition method provided in this application enlarges and enhances the detailed information of the target object in the raw image by performing resolution enhancement processing (that is, "super-resolution" processing) on the raw image. Since the super-divided image is a raw image that has not been processed by ISP, the raw image with the increased resolution can contain more detailed information of the target object. After over-resolution, the color image of the raw image with the increased resolution is obtained, so that the color image obtained in this embodiment has more detailed information than the prior art. Therefore, the recognition of the target object based on the color image can improve the accuracy of the recognition of the target object. In the embodiments of the present application, if necessary, after the color image is obtained, the ISP operation may be further performed on the color image.

It should be noted that the above-mentioned raw image is an image (also referred to as an original optical image) that is directly collected by the imaging device through its own image sensor without image signal processing. The raw image is the most primitive image information inside the shooting device, so it retains the richest high-frequency details of the image that the shooting device can obtain. However, the high-frequency details of the color image obtained by image signal processing on the raw image have been weakened or even disappeared, resulting in the resulting color image being unable to retain the details of the target object.

The embodiment of the present application provides an object recognition method, which can be used in an object recognition device. For example, as shown in FIG. 1, the object recognition device includes: a processor 101 and an interface 106. The processor 101 is connected to the interface 106, and the interface 106 is connected to an image sensor 105. The image sensor 105 is used to generate a raw image. 101 is used to obtain a raw image from the image sensor 105 through the interface 106. The processor 101 is configured to run a program, so that the object recognition apparatus executes the object recognition method provided in the embodiment of the present application.

Optionally, please continue to refer to FIG. 1. The object recognition device may also include: a communication component 102, a memory 103, and at least one communication bus 104. The processor 101, the interface 106, the communication component 102, and the memory 103 can pass through the The communication bus 104 is connected. The program used by the processor 101 to execute may be the program 1031 in the memory 103. The memory 103 may include a high-speed random access memory (RAM: Random Access Memory), and may also include a non-volatile memory (non-volatile memory), such as at least one disk memory. The communication connection between the object identification device and at least one other network element is realized through the communication component 102 (which may be wired or wireless), and the Internet, a wide area network, a local network, or a metropolitan area network may be used. It should be noted that, in the embodiment of the present application, the processor and the memory are independent of each other as an example. Of course, the memory 103 may also be integrated in the processor, which is not limited in the embodiment of the present application.

For example, FIG. 2 is a flowchart of an object recognition method provided by an embodiment of the application. As shown in FIG. 2, the object recognition method may include:

Step 201: Acquire a target image of the Bayer model.

As shown in FIG. 1, the object recognition device can obtain the target raw image in the Bayer mode generated by the image sensor 105. Optionally, in FIG. 1, the object recognition device includes an image sensor as an example. Optionally, the object recognition device may not include an image sensor, but is connected to the image sensor, and can obtain the Bayer pattern generated by the image sensor. Target student image.

For example, as shown in FIG. 3, the target raw image may include a plurality of pixels arranged in an array, and each of the pixel groups of 2 rows and 2 columns in the plurality of pixels forms a pixel group, and four pixels in each pixel group The colors of each pixel are red, green 1, green 2, and blue. Among them, green 1 and green 2 both represent green, but green 1 and green 2 respectively correspond to two pixels in the pixel group. As shown in Figure 3, the color of the pixel in the first row and the first column of the pixel group is red, the color of the pixel in the first row and second column is green 1, and the color of the pixel in the second row and first column is green 2. , The color of the pixel in the second row and second column is blue.

Of course, the distribution positions of the four pixels can also be different from the distribution positions shown in FIG. 3. For example, the color of the pixel in the first row and the first column is green 1, the color of the pixel in the first row and second column is red, the color of the pixel in the second row and first column is blue, and the color of the pixel in the second row and second column is blue. The color of the pixel is green 2; or, the color of the pixel in the first row and the first column is green 1, the color of the pixel in the first row and the second column is blue, the color of the pixel in the second row and the first column is red, and the color of the pixel in the second row and first column is red. The color of the pixel in the second column of the row is green 2. The embodiment of the application does not limit this. In addition, the embodiments of the present application do not limit the number of pixels in the target image.

It should be noted that in the embodiments of the present application, the target image is a Bayer mode image as an example. Of course, the target image can also be any mode other than the Bayer mode (such as red, green, and blue brightness (RGBW) ) Mode), which is not limited in the embodiment of the present application.

Step 202: Perform first processing on the target raw image to obtain a color image of the target raw image.

In step 202, the object recognition device needs to process the target raw image into a color image. Among them, the color image can be an image in any color mode, such as a red-green-blue (RGB) format, a luminance and chrominance (YUV) format, and so on.

For example, the first processing may include at least one of automatic white balance processing, color correction processing, gamma correction processing, and distortion correction processing. Of course, the first processing may also include processing other than these four types of processing. Other processing. In the embodiment of the present application, the first processing including automatic white balance processing, color correction processing, gamma correction processing, and distortion correction processing is taken as an example, and the embodiment of the present application does not limit the sequence of these types of processing.

Color images are, for example, images in the Joint Photographic Experts Group (JPEG) format, images in the High Efficiency Image File Format (helf), etc. The JPEG format is also called the JPG format.

Step 203: Determine the first region in the color image of the target image that contains the target object.

After obtaining the color image of the target raw image, the object recognition device can recognize the target object on the color image to obtain the first region containing the target object in the color image. Of course, the color image may include one or more first regions, which is not limited in the embodiment of the present application. In step 203, the object recognition device needs to recognize each first region in the color image.

Step 204: Perform super-division on the image of the target region in the target image to obtain a super-division color image, where the target region includes at least a part of the region in the target image, and the resolution of the super-division color image is greater than that of the target region in the target image The resolution of the image.

When the object recognition device super-divisions the image of the target area in the target image to obtain the super-division color image, it can first increase the resolution of the image (also a kind of raw image) of the target area of the target image. The image of the target area after the resolution is increased is obtained, and then the image of the target area after the resolution is increased is processed to obtain the super-division color image. Since in step 204, the resolution of the image of the target area in the target raw image is directly increased, the image of the target area after the resolution has been increased may contain more detailed information of the target object. The super-division color image obtained later can also contain more detailed information of the target object.

Step 205: Determine a second area in the super-division color image based on the first area.

After the first area in the color image of the target image is determined in step 203, the first area can be mapped to the super-division color image in a certain manner, so as to obtain the second area in the super-division color image. In addition, since the second area is an area obtained by mapping the first area, the second area also includes the target object on the premise that the first area contains the target object. It should be noted that if multiple first regions are determined in step 203, each of the multiple first regions needs to be mapped to the super-division color image in step 205, so as to obtain multiple Each first area in the first area corresponds to a second area.

For example, for each first area and a second area corresponding to the first area, both the first area and the second area may be rectangular. Among them, if the coordinates of the upper left corner of the first area are (X _A1 , Y _A1 ), and the coordinates of the lower right corner are (X _A2 , Y _A2 ), then the coordinates of the upper left corner of the second area corresponding to the first area are (X _B1 , Y _B1 ), and the coordinates of the lower right corner are (X _B2 , Y _B2 ); among them, X _B1 ＝X _A1 *K; Y _B1 ＝Y _A1 *K; X _B2 ＝X _B1 +(X _A2 -X _A1 )* K; Y _B2 = Y _B1 + (Y _A2 -Y _A1 )*K; K represents the resolution ratio of the super-division color image to the target image (can be called the resolution improvement rate), K>1.

For example, the color image of the target image is shown in Figure 4, and the super-division color image is shown in Figure 5. Assuming K=2, as shown in FIG. 4, the coordinates of the upper left corner of the first area are (3, 3), and the coordinates of the lower right corner are (6, 6). As shown in Figure 5, the coordinates of the upper left corner of the second area are (X _A1 *K, Y _A1 *K)=(3*2,3*2)=(6,6), the lower right corner of the second area The coordinates are (X _B1 +(X _A2 -X _A1 )*K, Y _B1 +(Y _A2 -Y _A1 )*K)=(6+(6-3)*2, 6+(6-3)*2 )=(12,12). It can be seen that the length and width of the first region are both 3, the length and width of the second region are both 6, and the second region is twice as large as the first region in terms of length and width.

Step 206: Recognize the target object based on the second area.

For example: license plate recognition and face recognition based on the second area. Wherein, the recognition of the target object based on the second area may refer to the recognition of the target object within the range of the second area.

After determining the second region in the super-division color image, the object recognition device may intercept the image of the second region, and perform target object recognition on the image of the second region. Moreover, since the super-division color image contains more detailed information of the target object, the image of the second region in the super-division color image also contains more detailed information of the target object, based on the image of the second region The accuracy of the recognition of the target object is high.

In addition, since the second area is the area containing the target object determined by the object recognition device, the target object can be recognized only based on the second area, instead of the area other than the second area in the super-division color image. The recognition of the target object simplifies the complexity of the recognition of the target object.

For example, there are multiple implementation manners for the foregoing step 204, which is not limited in the embodiment of the present application. The following two implementations will be taken as examples to explain the implementation of the above step 204. In addition, in the following two implementation manners, the target area of the target image in step 204 is the entire area of the target image as an example.

In the first implementation manner of step 204, as shown in FIG. 6, step 204 may include:

Step 2041a: Obtain a target four-channel image, where the image of the target area in the target raw image includes a plurality of pixel groups arranged in an array, the pixel group includes two rows and two columns of pixels, and the target four-channel image includes: the plurality of pixels The combination map of the pixels in the first row and the first column, the combination map of the pixels in the first row and the second column, the combination map of the pixels in the second row and the first column, and the combination map of the pixels in the second row and the second column in the group.

Exemplarily, as described in step 201, the image of the target area in the target raw image may include multiple pixel groups, and each pixel group includes four pixels of red, green 1, green 2, and blue. In step 2041a, the object recognition device may convert the image of the target area in the target raw image into a target four-channel image. The target four-channel image includes four combined images, and each of the four combined images includes the multiple Pixels at the same position in a pixel group.

For example, for the target image shown in FIG. 3, if the target area is all areas in the target image, the target four-channel image obtained by converting the target area in the target image may include the target four-channel image as shown in FIG. 7 The four combination diagrams shown are the combination diagram of red pixels, the combination diagram of green pixels, the combination diagram of green pixels, and the combination diagram of blue pixels. It can be seen that among the multiple pixels in Figure 3, if the number of rows and columns of the pixel are both odd, then the pixel belongs to the combined image of the red pixels in the target four-channel image; if the number of rows of the pixel is odd and the number of columns is even , The pixel belongs to the combination map of green 1-color pixels in the target four-channel image; if the number of rows of the pixel is even and the number of columns is odd, then the pixel belongs to the combination map of green 2-color pixels in the target four-channel image; if the pixel If the number of rows and the number of columns is even, the pixel belongs to the combined image of blue pixels in the target four-channel image.

Step 2042a: Input the target four-channel image into the first model to obtain the super-division color image output by the first model.

The first model is used to super-divide the four-channel image of the Bayer-mode image that is not super-divided, and output the color image of the Bayer-mode image after the resolution is increased. Therefore, after the four-channel image (called the target four-channel image) of the target area (a Bayer mode image, and the target area is also the entire target image) is input into the first model, the first model can The target four-channel image is processed, and then a super-division color image is output.

Optionally, the first model may be a neural network model.

For example, FIG. 8 is a schematic diagram of a first model provided by an embodiment of the application. As shown in FIG. 8, the first model may include: a first module, 16 second modules connected in series, and k first modules connected in series. Three modules, fourth module, fifth module, sixth module, seventh module and eighth module.

(1) The first module may include: a convolutional layer and a leaky linear rectification layer. This convolutional layer is used to perform convolution processing on all images input to the first model (such as the above-mentioned target four-channel image) by using 64 convolution kernels respectively, and output 64-channel images. Among these 64 convolution kernels The size of each convolution kernel is 3*3. The leaky linear rectification layer is used to activate the image output by the convolutional layer (such as the above-mentioned sixty-four channel image) based on the leaky linear rectification function to obtain the activation feature map of the image.

Among them, the leakage linear rectification function can be:

x _i,j is the pixel value (such as pixel intensity) of the pixel in the i-th row and j-th column in the image of the input leakage linear rectification layer, y _i,j is the pixel in the i-th row and j-th column in the activation feature map The pixel value of, a is the weakening parameter, and 1≤a≤2. It can be seen that after the activation process, the pixel value of the pixel whose pixel value is greater than or equal to zero is retained, while the pixel value of the pixel whose pixel value is less than zero is weakened. i≥1, j≥1.

(2) The second module may include: a first convolution layer, a linear rectification layer, a second convolution layer, and an addition layer. Among them, the first convolution layer can be used to: use 64 kinds of convolution kernels (3*3 size) to process the image input to the second module, and output 64-channel images; the linear rectification layer is used for linear rectification based The function activates the 64-channel image output by the first convolutional layer, and outputs the 64-channel image after the activation processing; the second convolutional layer is used to respectively use 64 convolution kernels (3*3 size) Process the 64-channel image output by the leaky linear rectification layer and output a 64-channel image; the addition layer is used to compare the 64-channel image output by the second convolutional layer with the image input to the second module Add and output the feature map of the 64-channel image.

Among them, the linear rectification function is as follows:

x _i,j is the pixel value of the pixel in the i-th row and j-th column in the image output by the first convolutional layer, and y _i,j is the pixel value of the pixel in the i-th row and jth column in the activation feature map of the image. It can be seen that after the activation process, the pixel value of the pixel whose pixel value is greater than or equal to zero is retained, and the pixel value of the pixel whose pixel value is less than zero is reduced to zero. i≥1, j≥1.

(3) The third module may include: a convolutional layer, a first leaky linear rectification layer, a pixel drawing layer, and a second leaky linear rectification layer. Among them, the convolution layer is used to process the image input to the third module with 64 types of convolution kernels (3*3 size), and output 64-channel images; the first leaky linear rectification layer is used for leak-based linear rectification The function activates the 64-channel image output by the convolutional layer, and outputs the 64-channel image after activation processing; the pixel draw layer is used to divide the 64-channel image output by the first drain linear rectification layer into 4 Copies (each including 16-channel images), and stitch these 4 images into a 16-channel image; the second leaky linear rectification layer is used to activate the 16-channel image output by the pixel drawing layer based on the leaky linear rectification function Process and output the sixteen-channel image after activation processing. Wherein, the leakage linear rectification function can refer to the leakage linear rectification function in the first module, which is not described in detail in the embodiment of the present application.

The size of the combined image (called the first combined image) in the 64-channel image (including 64 combined images) output by the first leakage linear rectification layer can be x*y, then the 16-channel output of the pixel drawing layer The size of the channel image (referred to as the second combined image) in the image (including 16 combined images) can be 2x*2y. The pixel draw layer can combine 4 first combination pictures (each first combination picture includes 16 first combination pictures), each of the m-th first combination pictures in the first combination picture, form a set of first combinations In this way, 16 groups of first combined diagrams can be obtained, and each group of first combined diagrams includes 4 first combined diagrams. After that, the pixel drawing layer can splice the 4 first combination diagrams in each group of the first combination diagram into a second combination diagram, thereby obtaining 16 second combination diagrams.

For example, FIG. 9 is a schematic diagram of the pixel drawing layer splicing four first combination diagrams in a group of first combination diagrams into one second combination diagram. As shown in Figure 9, suppose that the four first combined images are called images A, B, C, and D, the pixels in image A are called 1, the pixels in image B are called 2, and the pixels in image C are It is called 3, and the pixel in image D is called 4. Then the second combination map formed by splicing the four first combination maps may include a plurality of pixel groups arranged in an array, and each pixel group includes two rows and two columns of pixels, where the pixels in the first row and the first column Is pixel 1 from image A, the pixel in row 1 and column 2 is pixel 2 from image B, the pixel in row 2 and column 1 is pixel 3 from image C, and the pixel in row 2 and column 2 is pixel from Pixel 4 of image D.

Further, each third module can magnify the input image by 2 times. Since the first model includes k third modules, the k third modules can magnify the input image by 2k times in total. The resolution ratio of the super-division color image and the image of the target area in the target raw image is K=2k. In the embodiment of the present application, the number of third modules in the first model can be set reasonably according to the size of K.

(4) The fourth module includes a convolutional layer, and the convolutional layer is used to process the image input to the fourth module by using 64 convolution kernels (3*3 size), and output a 64-channel image.

(5) The fifth module includes a convolutional layer, and the convolutional layer is used to process the image input to the fourth module with 4 types of convolution kernels (3*3 size), and output a four-channel image. The resolution of the four-channel image output by the fifth module is greater than the resolution of the four-channel image input to the first model.

(6) The sixth module includes an amplification layer and an addition layer. Among them, the magnification layer is used to interpolate and magnify the four-channel image input to the first model by means of bilinear interpolation, and the magnification factor is also K. The addition layer is used to add the output result of the amplification layer and the output result of the fifth module to obtain the above-mentioned image of the target area after the resolution is increased.

(7) The seventh module includes: a convolutional layer and a leaky linear rectification layer. This convolutional layer is used to convolve the output of the sixth module with 64 types of convolution kernels, and output 64-channel images. Among them, the size of each of the 64 types of convolution kernels is equal to 3*3. The leaky linear rectification layer is used to activate the image output by the convolutional layer based on the leaky linear rectification function to obtain the activation feature map of the image.

Wherein, the leakage linear rectification function can refer to the leakage linear rectification function in the first module, which is not described in detail in the embodiment of the present application.

(8) The eighth module includes: convolutional layer. The convolutional layer is used to convolve the output result of the seventh module with 3 types of convolution kernels, and output the above-mentioned super-division color image (with 3 channels). Among them, each of the 3 types of convolution kernels The size of the product core is 3*3.

Optionally, in the foregoing embodiment, the size of the convolution kernel used by each convolution layer in the first model is 3*3 as an example. Optionally, the size of each convolution kernel may not be 3. *3, such as 4*4, etc., this embodiment of the application does not limit this.

In the first implementation of step 204, after acquiring the target four-channel image, the object recognition device directly uses the first model to process the target four-channel image to obtain a super-differential color image, so that the super-differential color image can be obtained. The image efficiency is higher.

Optionally, before using the first model, the object recognition device may train the initial model based on the first training data to obtain the first model. Of course, the process of training the initial model to obtain the first model may not be executed by the object recognition device, which is not limited in the embodiment of the present application.

For example, when obtaining the first training data for training the first model, the raw image of the Bayer mode (which can be any raw image) can be obtained first, and then the obtained image can be obtained according to the binning interpolation method. The raw image is interpolated to obtain a small-sized degraded image of the raw image (the degraded image can be regarded as a raw image with reduced resolution). When acquiring the first training data, the acquired raw image may also be processed to obtain a color image of the raw image. The degraded image of the raw image and the color image of the raw image can be used as the first training data to train the initial model. For example, the degraded image is used as input during training, and the output result of the initial model is compared with the color image, and then the initial model is adjusted according to the comparison result. The initial model can be trained as the first model by repeating this process many times.

Among them, the pixel value of the pixel in the degraded image satisfies the following formula:

Among them, R _i,j is the pixel value of the red pixel at the (i,j) coordinate in the _{degraded image, and GR i+1,j} is the pixel of the green pixel at the (i+1,j) coordinate in the degraded image Value, GB _i,j+1 is the pixel value of the green pixel on the (i,j+1) coordinates in the _{degraded image, and B i+1,j+1} is the (i+1,j+1) in the degraded image ) The pixel value of the blue pixel on the coordinates. f _{(i-1)×K+1+2×r, (j-1)×K+1+2×c} represents the raw image corresponding to the degraded image (the degraded image is obtained by interpolating the raw image) in (( i-1)×K+1+2×r, (j-1)×K+1+2×c) the pixel value of the pixel on the coordinate. f _{(i-1)×K+1+2×r, (j-1)×K+2+2×c} means ((i-1)×K+1+2×r in the raw image corresponding to the degraded image , The pixel value of the pixel on the (j-1)×K+2+2×c) coordinates. f _{(i-1)×K+2+2×r, (j-1)×K+1+2×c} means ((i-1)×K+2+2×r in the raw image corresponding to the degraded image , The pixel value of the pixel on the (j-1)×K+1+2×c) coordinates. f _{(i-1)×K+2+2×r, (j-1)×K+2+2×c} represents the raw image corresponding to the degraded image ((i-1)×K+2+2×r , The pixel value of the pixel on the (j-1)×K+2+2×c) coordinates. K is the degradation factor, which is equal to the resolution ratio K of the super-division color image and the target region of the target image mentioned in step 205.

There is also a model for improving the resolution of a color image in the related art, and the training data required by the model needs to be obtained based on the degraded image of the color image. In this application, the first training data used to train the first model is obtained based on the degraded image of the raw image of the Bayer mode. In addition, the process of acquiring the degraded image of the color image is more complicated, while the process of acquiring the degraded image of the raw image is simple. Therefore, the efficiency of acquiring the first training data for training the first model in this application is relatively high. Accordingly, The accuracy of the first model obtained by training is also higher.

In the second implementation manner of step 204, as shown in FIG. 10, step 204 may include:

Step 2041b. Obtain a target four-channel image, where the image of the target area in the target raw image includes a plurality of pixel groups arranged in an array, the pixel group includes two rows and two columns of pixels, and the target four-channel image includes: the plurality of pixels The combination map of the pixels in the first row and the first column, the combination map of the pixels in the first row and the second column, the combination map of the pixels in the second row and the first column, and the combination map of the pixels in the second row and the second column in the group.

For step 2041b, reference may be made to the above step 2041a, which is not described in detail in the embodiment of the present application.

Step 2042b: Input the target four-channel image into the second model to obtain the target four-channel image output by the second model after the resolution has been increased.

The second model is used to super-divide the four-channel image of the unsuper-divided Bayer mode image, and output the four-channel image after the resolution is increased. Therefore, the four-channel image (referred to as the target four-channel image) of the target area image (a Bayer mode image, the embodiment of the present application takes the target area as the entire target image as an example) in the target image After the second model, the second model can process the target four-channel image, and then output the target four-channel image with increased resolution.

Optionally, the second model may be a neural network model.

For example, FIG. 11 is a schematic diagram of a second model provided by an embodiment of the application. As shown in FIG. 11, the second model may include: a first module, 16 second modules connected in series, and k first modules connected in series. Three modules, fourth module, fifth module and sixth module. For the explanation of these modules, reference may be made to the explanation of these modules in the first model shown in FIG. 8, which is not repeated in the embodiment of the present application.

Step 2043b: Convert the target four-channel image with the increased resolution into a Bayer mode image.

After obtaining the target four-channel image with the increased resolution output by the second model, the object recognition device may obtain the four-channel image of the target region in the target image (referred to as the target four-channel image) in step 2041b. In the opposite way, the target four-channel image with the increased resolution is converted into a Bayer mode image.

For example, the Bayer mode image may include a plurality of pixel groups, and each pixel group includes four kinds of pixels of red, green 1, green 2, and blue. In step 2043b, the object recognition device may convert the target four-channel image with the increased resolution into a Bayer mode image, and the pixels at the same position in the multiple pixel groups all come from the same combined image in the target four-channel image.

For example, suppose the image obtained after the resolution of the target four-channel image in Figure 7 is increased is as shown in Figure 12 (including the combination map of red pixels, the combination map of green 1-color pixels, the combination map of green 2-color pixels, and the blue Combination map of color pixels). The Bayer mode image obtained by the conversion of the target four-channel image after the resolution is increased may be as shown in FIG. 13. In the Bayer mode image in Figure 13, if the number of rows and columns of the pixel are both odd, then the pixel comes from the combination of red pixels in Figure 12; if the number of rows and the number of columns is even, the pixel is The pixel comes from the combination map of green 1-color pixels in Figure 12; if the number of rows of the pixel is even and the number of columns is odd, then the pixel comes from the combination map of green 2-color pixels in Figure 12; if the number of rows and columns of pixels If the numbers are even numbers, the pixel comes from the combination of blue pixels in FIG. 12.

Step 2044b: Perform a second process on the Bayer mode image to obtain a super-division color image.

Optionally, in step 2044b, the object recognition apparatus may perform second processing on the Bayer mode image (that is, the image of the target area after the resolution is increased) obtained in step 2043b based on the parameters of the first processing in step 202, Obtain a hyperdivision color image.

Optionally, when performing corresponding processing on the Bayer mode image based on the parameters for processing the target raw image into a color image, if the parameter is a parameter of automatic white balance processing, color correction processing or gamma correction processing, it can be directly used This parameter processes the Bayer mode image. If this parameter is a distortion correction processing parameter, because the Bayer mode image and the target raw image are different in size and resolution, it is necessary to correct the distortion correction processing parameters, and use the corrected distortion correction processing parameters Bayer mode image processing.

For example, the first processing includes the first distortion correction processing, and the second processing includes the second distortion correction processing; both the first distortion correction processing and the second distortion correction processing may be correction processing based on the distortion curve (such as Zhang’s distortion correction). handle). It is assumed that the parameters of the first distortion correction processing include: the first distortion curve; the parameters of the second distortion correction processing include: the second distortion curve; any sampling point (X ₀ , Y ₀ ) in the second distortion curve is in the Bayer mode image The coordinates of the corresponding pixel in is (X _Ai , Y _Ai ), and the coordinates of any sampling point (X ₀ , Y ₀ ) in the first distortion curve in the target image is (X _Bi , Y _Bi ) ; Among them, X _Ai ＝(X _Bi -(W _f /2+0.5))*K+(W _f *K/2+0.5); Y _Ai ＝(Y _Bi -(H _f /2+0.5))*K+ (H _f *K/2+0.5); W _f represents the width of the target image, and H _f represents the height of the target image.

Since the resolution of the target image and the Bayer mode image are different, the parameters for processing the target image into a color image may not be suitable for processing the Bayer mode image into a color image. If the target image is directly processed into color The processing of the Bayer mode image with the image parameters may cause problems (such as color cast, image distortion, and contrast changes) in the processed super-division color image. In the embodiments of the present application, the parameters of the distortion correction processing for processing the target raw image into a color image can be corrected, thus avoiding these problems in the obtained super-division color image.

In the embodiments of the present application, based on the parameters for processing the target raw image into a color image, corresponding processing is performed on the Bayer mode image to obtain a super-division color image. Of course, when corresponding processing is performed on the Bayer mode image based on the above parameters, the corresponding processing on the Bayer mode image may not be performed based on the parameters for processing the target raw image into a color image. The embodiment of the application does not limit this.

It can be seen that among the above two implementation manners of step 204, the first implementation manner is to directly obtain the super-division color image based on the first model, and the first implementation manner obtains the super-division color image faster. The second implementation method is to first super-divide the image of the target area to obtain the Bayer mode image (for example, obtain a target four-channel image with increased resolution through the second model, and then convert the target four-channel image into a Bayer mode image), After that, the Bayer mode image is processed into a super-division color image. The object recognition apparatus may use any one of these two implementation modes to perform step 204, or the object recognition apparatus may perform step 204 based on the user's selection by adopting the implementation mode selected by the user in the two implementation modes.

Optionally, before using the second model, the object recognition device may train the initial model based on the second training data to obtain the second model. Of course, the process of training the initial model to obtain the second model may not be executed by the object recognition device, which is not limited in the embodiment of the present application.

For example, when acquiring the second training data used for training to obtain the second model, the raw image of the Bayer mode (which can be any raw image) can be obtained first, and then the obtained image can be obtained according to the binning interpolation method. The raw image is interpolated to obtain a small-sized degraded image of the raw image (the degraded image can be regarded as a raw image with reduced resolution). For these processes, reference may be made to the process of obtaining the first training data in the foregoing embodiment, which is not described in detail in the embodiment of the present application. The obtained degraded image and raw image can be used as the second training data to train the initial model. When training the initial model, you can input the degraded image into the initial model, and compare the output result of the initial model with the raw image corresponding to the degraded image, and adjust the initial model according to the comparison result; repeat the process many times Then the initial model can be trained as the second model.

Further, in the foregoing embodiment, the target area of the target image in step 204 is the entire area of the target image as an example. Optionally, the target area of the target image may be an area (such as a partial area) that contains the target object in the target image.

For example, the target raw image may be the m+1th frame image in the raw image video (in the embodiment of the application, the raw image video is the Bayer mode video), m≥1, at this time, in Fig. 2 Based on the illustrated object recognition method, as shown in FIG. 14, before step 204, the object recognition method may further include:

Step 301: Perform a third process on the m-th frame image in the video to obtain a color image of the m-th frame image.

Step 302: Determine a third region containing the target object in the color image of the m-th frame of image.

Step 303: Determine a target area corresponding to the third area in the (m+1)th frame of image.

In step 303, the object recognition device may determine the target area corresponding to the third area in the (m+1)th frame of image (that is, the target raw image). The content contained in the third area is roughly similar to the corresponding target area. The similarity of the features of these two areas is greater than the similarity threshold (such as 80%, 90%, etc.). The target area in the target image is defined by the target area. The corresponding third region in the m-th frame of image changes.

Optionally, both the target area and the third area are rectangular, the coordinates of the upper left corner of the third area are (X _D1 , Y _D1 ), and the coordinates of the lower right corner are (X _D2 , Y _D2 ); the target area corresponding to the third area The coordinates of the upper left corner of is (X _C1 , Y _C1 ), and the coordinates of the lower right corner are (X _C2 , Y _C2 ); where X _D1 =(X _C1 -(X _C1 +X _C2 )/2)*L+(X _C1 +X _C2 )/2; Y _D1 =(Y _C1 -(Y _C1 +Y _C2 )/2)*L+(Y _C1 +Y _C2 )/2; X _D2 =(X _C2 -(X _C1 +X _C2 ) /2)*L+(X _C1 +X _C2 )/2; Y _D2 =(Y _C2 -(Y _C1 +Y _C2 )/2)*L+(Y _C1 +Y _C2 )/2; L>1. L can be set by the user, such as 1.5≤L≤3.

For example, assuming L=3, the m-th frame image is shown in FIG. 15 and the m+1-th frame image is shown in FIG. 16. If the coordinates of the upper left corner of a third area in the mth frame image are (3, 3), and the coordinates of the lower right corner are (6, 6), then the third area corresponds to the target in the image frame m+1 The coordinates of the upper left corner of the area are ((X _C1 -(X _C1 +X _C2 )/2)*L+(X _C1 +X _C2 )/2, (Y _C1 -(Y _C1 +Y _C2 )/2)*L+( Y _C1 +Y _C2 )/2)=((3-(3+6)/2)*3+(3+6)/2, (3-(3+6)/2)*3+(3+ 6)/2)=(0,0), and the coordinates of the lower right corner are ((6-(3+6)/2)*3+(3+6)/2, (6-(3+6)/2 )*3+(3+6)/2)=(9,9).

Step 304: When the (m+1)th frame of image contains multiple target areas, and there are two target areas that satisfy the replacement condition in the multiple target areas, replace the two target areas with candidate target areas to obtain an updated Multiple target areas.

The replacement condition includes: the two target regions at least partially overlap, and the sum of the areas of the two target regions is greater than the area of the candidate target region. Among them, the coordinates of the upper left corner of one of the two target areas are (X ₁₁ , Y ₁₁ ), and the coordinates of the lower right corner are (X ₁₂ , Y ₁₂ ); the coordinates of the upper left corner of the other of the two target areas are (X ₂₁ , Y ₂₁ ), and the lower right corner coordinates are (X ₂₂ , Y ₂₂ ); the upper left corner coordinates of the candidate target area are (X _M1 , Y _M1 ), and the lower right corner coordinates are (X _M2 , Y _M2 ) ; X _M1 is the minimum value of X ₁₁ and X ₂₁ _{; Y M1} is the minimum value of Y ₁₁ and Y ₂₁ ; X _M2 is the maximum value of X ₁₂ and X ₂₂ _{; Y M2} is the maximum value of Y ₁₂ and Y ₂₂ .

For example, suppose that the target area in the target raw image is as shown in FIG. 17, and includes: target area 1 and target area 2. Among them, the coordinates of the upper left corner of the target area 1 are (3, 6), and the coordinates of the lower right corner are (6, 3), the coordinates of the upper left corner of the target area 2 are (0, 9), and the coordinates of the lower right corner of the target area 2 are ( 9, 0); The coordinates of the upper left corner of the candidate target area X (not belonging to the multiple target areas determined in step 301) may be (0, 6), and the coordinates of the lower right corner may be (9, 3). It can be seen that the target area 1 and the target area 2 at least partially overlap, and the sum (90) of the area (9) of the target area 1 and the area (81) of the target area 2 is greater than the area (27) of the candidate target area X. Therefore, the target area 1 and the target area 2 meet the above replacement conditions, and the target area 1 and the target area 2 of the multiple target areas can be replaced with the candidate target area X, thereby achieving the multiple targets determined in step 301 Regional update.

Since multiple target regions satisfying the replacement conditions are replaced with candidate target regions, the number of target regions in the target raw image is reduced, and the process of identifying target objects based on the target regions is simplified.

Optionally, the object recognition apparatus may sequentially use each of the multiple target areas determined in step 301 as the reference area, and execute the update process of the multiple target areas.

Wherein, the update process may include: the object recognition device sequentially determines whether the reference area and each of the multiple target areas except the area satisfy the replacement condition. Once a certain reference area and a certain other area meet the replacement conditions, the object recognition device can replace the reference area and the other area with candidate target areas corresponding to the two areas.

Alternatively, the update process may include: the object recognition device may also find a group of other areas in all other areas, and each area in the group of other areas meets the replacement condition with the reference area; after that, the object recognition device returns The area difference corresponding to each area in the group of other areas can be determined (the difference between the sum of the area of the area and the reference area and the area of the candidate target areas corresponding to these two areas), and compare the reference area with the other areas in the group. The corresponding area with the largest area difference in the area is replaced with the candidate target area corresponding to the two areas.

After updating the multiple target areas, the object recognition device can use any one of the updated multiple target areas as the target area in step 204. For example, the object recognition device can intercept each target area in the target raw image to obtain an image of each target area, and use the method shown in FIG. 2 to process the image of each target area and recognize the target object.

Further, if after the recognition of the target object based on the m+1 frame image, it is necessary to continue the recognition of the target object based on the m+2 frame image, the recognition process can refer to the recognition based on the m+1 frame image. The process of identifying the target object. In addition, the target area in the image of the m+1th frame in the above step 204 can be used as the third area in the image of the previous frame of the image of the m+2th frame, and there is no need to identify the target object based on the image of the m+2th frame. During the process, re-determine the third area in the previous frame of the m+2th frame of image.

In the foregoing embodiment, after multiple target areas are determined in step 303, the multiple target areas are also updated in step 304 as an example. Optionally, step 304 may not be performed, but the multiple target areas are determined in step 303. Step 204 is directly executed after the target area is set, which is not limited in the embodiment of the present application.

The order of the steps in the method embodiments provided in the embodiments of this application can be adjusted appropriately, and the steps can be increased or decreased accordingly according to the situation. Any person skilled in the art can easily think of changes within the technical scope disclosed in this application. The methods should all be covered in the scope of protection of this application, so I won’t repeat them here.

FIG. 18 is a block diagram of an object recognition device provided by an embodiment of this application, which can run the aforementioned object recognition method. As shown in Figure 18, the object recognition device includes:

The obtaining module 1801 is configured to obtain the target raw image generated by the image sensor; the operation performed by the obtaining module 1801 can refer to step 201 in the embodiment shown in FIG. 2, and details are not described in this embodiment of the present application.

The super-division module 1802 is used to super-division the image of the target region in the target image to obtain a super-division color image, wherein the target region includes at least a part of the target image, and the resolution of the super-division color image is greater than that of the target region The resolution of the image; the operation performed by the super-division module 1802 can refer to step 204 in the embodiment shown in FIG. 2, and details are not described in this embodiment of the present application.

The recognition module 1803 is used for recognizing the target object based on the hyperdivision color image. For operations performed by the identification module 1803, reference may be made to step 205 and step 206 in the embodiment shown in FIG. 2, and details are not described in the embodiment of the present application.

Optionally, the target image is a Bayer mode image.

Optionally, the object recognition device further includes:

The first processing module (not shown in FIG. 18) is used to perform the first processing on the target raw image to obtain a color image of the target raw image; for the operations performed by the first processing module, refer to the embodiment shown in FIG. 2 Step 202 in the embodiment of the present application will not be repeated here.

The first determining module (not shown in FIG. 18) is used to determine the first area of the target object in the color image of the target image; the operation performed by the first determining module can refer to the embodiment shown in FIG. 2 Step 203 of the embodiment of the present application will not be repeated here.

Wherein, the recognition module 1803 is configured to: determine the second area in the super-division color image based on the first area; and perform the recognition of the target object based on the image of the second area in the super-division color image.

Optionally, the target raw image is a Bayer mode image, and the super-division module 1802 is used to: obtain a target four-channel image, where the image of the target area includes a plurality of pixel groups arranged in an array, and the pixel group includes two rows and two columns of pixels , The target four-channel image includes: a combination map of pixels in the first row and column 1, a combination map of pixels in the first row and second column, a combination map of pixels in the second row and first column, and a combination map of the pixels in the second row and second column. A combined image of columns of pixels; input the target four-channel image into the first model to obtain the super-division color image output by the first model; where the first model is used to super-division the four-channel image of the unsuper-division Bayer mode image , Output the color image of the Bayer mode image after the resolution has been increased. For this process, reference may be made to the embodiment shown in FIG. 6, and details are not described herein in the embodiment of the present application.

Optionally, the target image is a Bayer mode image, and the super-division module 1902 includes: a supramolecular module (not shown in FIG. 18) for super-division of the image of the target area to obtain a Bayer mode image; processing sub-module (Not shown in FIG. 18), used to perform the second processing on the Bayer mode image to obtain a super-division color image.

Optionally, the supramolecular module is used to obtain a target four-channel image, where the image of the target area includes a plurality of pixel groups arranged in an array, the pixel group includes two rows and two columns of pixels, and the target four-channel image includes: a plurality of pixels The combination map of the pixels in the first row and the first column, the combination map of the pixels in the first row and the second column, the combination map of the pixels in the second row and the first column, and the combination map of the pixels in the second row and the second column in the group; target four channels The image is input to the second model, and the target four-channel image with the increased resolution output by the second model is obtained; the target four-channel image with the increased resolution is converted into a Bayer mode image; among them, the second model is used for the unsupervised The four-channel image of the Bayer mode image is super-divided, and the four-channel image with increased resolution is output. For this process, reference may be made to the embodiment shown in FIG. 10, and details are not described herein in the embodiment of the present application.

Optionally, the first process includes a first distortion correction process, and the second process includes a second distortion correction process; the parameters of the first distortion correction process include: a first distortion curve; the parameters of the second distortion correction process include: a second distortion Curve; the coordinates of any sampling point (X ₀ , Y ₀ ) in the second distortion curve corresponding to the pixel in the hyperdivision target image is (X _Ai , Y _Ai ), any sampling point in the first distortion curve (X ₀ , Y ₀ ) The coordinates of the corresponding pixel in the target image are (X _Bi , Y _Bi ); where X _Ai = (X _Bi -(W _f /2+0.5))*K+(W _f * K/2+0.5); Y _Ai = (Y _Bi -(H _f /2+0.5))*K+(H _f *K/2+0.5); W _f represents the width of the target image, H _f represents the target image The height of the image, K represents the resolution ratio of the super-division color image to the image in the target area.

Optionally, the object recognition device further includes: a second determination module (not shown in FIG. 18), configured to determine a target area in the target image containing the target object. For this process, reference may be made to step 303 in the embodiment shown in FIG. 14, and details are not described in the embodiment of the present application.

Optionally, the target raw image is the m+1th frame image in the raw image video, m≥1, and the object recognition device further includes:

The third processing module (not shown in FIG. 18) is used to perform third processing on the m-th frame image in the raw image video to obtain a color image of the m-th frame image; this process can refer to the embodiment shown in FIG. 14 Step 301 in the embodiment of the application will not be repeated here.

The third determining module (not shown in FIG. 18) is used to determine the third region containing the target object in the color image of the m-th frame of image; this process can refer to step 302 in the embodiment shown in FIG. 14. This application The embodiments are not described in detail here.

The second determining module is used for determining the target area corresponding to the third area in the target raw image.

Optionally, the object recognition device further includes:

Replacement module (not shown in Figure 18), used to replace the two target regions as candidate targets when the target image contains multiple target regions, and there are two target regions that meet the replacement conditions in the multiple target regions Region, the updated multiple target regions; among which, the replacement conditions include: two target regions at least partially overlap, and the sum of the two target regions is greater than the area of the candidate target region; the target region of one of the two target regions The coordinates of the upper left corner are (X ₁₁ , Y ₁₁ ), and the coordinates of the lower right corner are (X ₁₂ , Y ₁₂ ); the coordinates of the upper left corner of the other of the two target areas are (X ₂₁ , Y ₂₁ ), and the lower right corner The coordinates are (X ₂₂ , Y ₂₂ ); the upper left corner coordinates of the candidate target area are (X _M1 , Y _M1 ), and the lower right corner coordinates are (X _M2 , Y _M2 ); X _M1 is the smallest of X ₁₁ and X ₂₁ Y _M1 is the minimum value of Y ₁₁ and Y ₂₁ _{; X M2} is the maximum value of X ₁₂ and X ₂₂ _{; Y M2} is the maximum value of Y ₁₂ and Y ₂₂ . For this process, reference may be made to step 304 in the embodiment shown in FIG. 14, and details are not described herein in the embodiment of the present application.

To sum up, in the object recognition device provided by the embodiments of the present application, the super-division module can perform super-division processing on the image of the target area in the target raw image to enlarge and enhance the detailed information in the image of the target area. Since the super-divided image is a raw image that has not been processed by ISP, the image of the target area after the super-division processing can contain more detailed information. Therefore, the super-division color image (the color image of the image of the target area after the super-division processing) obtained by this embodiment has more detailed information than the color image in the prior art. Therefore, the recognition of the target object based on the super-division color image can improve the accuracy of the recognition of the target object.

The embodiment of the present application provides a computer storage medium in which a computer program is stored, and the computer program is used to execute any object recognition method provided in the embodiment of the present application.

The embodiments of the present application provide a computer program product containing instructions. When the computer program product runs on an object recognition device, the object recognition device executes any object recognition method provided in the embodiments of the present application.

In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented using software, it may be implemented in the form of a computer program product in whole or in part, and the computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on the computer, the processes or functions described in the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a computer network, or other programmable devices. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from a website, computer, server, or data. The center transmits to another website, computer, server, or data center through wired (such as coaxial cable, optical fiber, digital subscriber line) or wireless (such as infrared, wireless, microwave, etc.). The computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or a data center integrated with one or more available media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, and a magnetic tape), an optical medium, or a semiconductor medium (for example, a solid state hard disk).

In this application, the terms "first" and "second", etc. are only used for descriptive purposes, and cannot be understood as indicating or implying relative importance. The term "at least one" refers to one or more, and "multiple" refers to two or more, unless expressly defined otherwise.

Different types of embodiments such as method embodiments and device embodiments provided in the embodiments of the present application can be referred to each other, which is not limited in the embodiments of the present application.

In the corresponding embodiments provided in the present application, it should be understood that the disclosed device and the like can be implemented in other structural manners. For example, the device embodiments described above are merely illustrative, for example, the division of units is only a logical function division, and there may be other divisions in actual implementation, for example, multiple units or components can be combined or integrated. To another system, or some features can be ignored, or not implemented. In addition, the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical or other forms.

The units described as separate components may or may not be physically separate, and the components described as units may or may not be physical units, and may be located in one place or distributed to multiple object recognition devices (such as terminals). Equipment). Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

The above are only specific implementations of this application, but the protection scope of this application is not limited to this. Anyone familiar with the technical field can easily think of various equivalents within the technical scope disclosed in this application. Modifications or replacements, these modifications or replacements shall be covered within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims

An object recognition method, characterized in that the method includes:

Acquire the target image generated by the image sensor;

Perform super-division on the image of the target region in the target image to obtain a super-division color image, wherein the target region includes at least part of the region in the target image, and the resolution of the super-division color image is greater than that of all the regions. The resolution of the image of the target area;

The target object is recognized based on the super-division color image.
The method according to claim 1, wherein the target raw image is a Bayer mode image.
The method according to claim 1 or 2, wherein the method further comprises:

Performing first processing on the target image to obtain a color image of the target image;

Determining the first area of the target object in the color image of the target image;

Wherein, the recognition of the target object based on the super-division color image specifically includes:

Determining a second area in the super-division color image based on the first area;

Based on the image of the second region in the super-division color image, the target object is recognized.
The method according to any one of claims 1 to 3, wherein the target raw image is a Bayer mode image, and the image of the target area in the target raw image is super-divided to obtain a super-differentiated color Images, including:

Acquire a target four-channel image, wherein the image of the target area includes a plurality of pixel groups arranged in an array, the pixel group includes two rows and two columns of pixels, and the target four-channel image includes: The combination map of the pixels in the first row and the first column, the combination map of the pixels in the first row and the second column, the combination map of the pixels in the second row and the first column, and the combination map of the pixels in the second row and the second column;

Input the target four-channel image into a first model to obtain the super-division color image output by the first model;

Wherein, the first model is used to super-divide the four-channel image of the Bayer-mode image that is not super-divided, and output the color image of the Bayer-mode image after the resolution is increased.
The method according to claim 3, wherein the target image is a Bayer mode image, and the super-division of the image of the target area in the target image to obtain a super-division color image comprises:

Super-divide the image of the target area to obtain a Bayer mode image;

The second processing is performed on the Bayer mode image to obtain the super-division color image.
The method according to claim 5, wherein the super-division of the image of the target area to obtain the Bayer mode image comprises:

Acquire a target four-channel image, wherein the image of the target area includes a plurality of pixel groups arranged in an array, the pixel group includes two rows and two columns of pixels, and the target four-channel image includes: The combination map of the pixels in the first row and the first column, the combination map of the pixels in the first row and the second column, the combination map of the pixels in the second row and the first column, and the combination map of the pixels in the second row and the second column;

Inputting the target four-channel image into a second model to obtain the target four-channel image with an increased resolution output by the second model;

Converting the target four-channel image after the resolution has been increased into the Bayer mode image;

Wherein, the second model is used to super-divide the four-channel image of the unsuper-divided Bayer mode image, and output the four-channel image with increased resolution.
The method according to claim 6, wherein the first processing includes a first distortion correction processing, and the second processing includes a second distortion correction processing;

The parameters of the first distortion correction processing include: a first distortion curve; the parameters of the second distortion correction processing include: a second distortion curve; any sampling point (X 0 , Y 0) in the second distortion curve ) The coordinates of the corresponding pixel in the hyperdivision target image are (X Ai , Y Ai ), and any sampling point (X 0 , Y 0 ) in the first distortion curve is in the target image The coordinates of the corresponding pixel are (X Bi , Y Bi );

Among them, X Ai =(X Bi -(W f /2+0.5))*K+(W f *K/2+0.5); Y Ai =(Y Bi -(H f /2+0.5))*K+( H f *K/2+0.5); W f represents the width of the target image, H f represents the height of the target image, and K represents the resolution of the super-division color image and the image of the target area ratio.
The method according to any one of claims 1 to 7, characterized in that, before the super-division of the image of the target area in the target raw image to obtain a super-division color image, the method further comprises:

Determining the target area containing the target object in the target raw image.
The method according to claim 8, wherein the target raw image is the m+1th frame image in the raw image video, m≥1, and the target object is included in the determined target raw image Before the target area, the method further includes:

Performing a third process on the m-th frame image in the raw image video to obtain a color image of the m-th frame image;

Determining that the color image of the m-th frame image contains the third region of the target object;

The determining the target area in the target image containing the target object includes:

Determining the target area corresponding to the third area in the target image.
The method according to claim 8 or 9, characterized in that, before the super-division of the image of the target region in the target raw image to obtain a super-division color image, the method further comprises:

When the target raw image includes multiple target regions, and there are two target regions that satisfy the replacement condition among the multiple target regions, replace the two target regions with candidate target regions to obtain the updated Multiple target areas;

Wherein, the replacement condition includes: the two target regions at least partially overlap, and the sum of the areas of the two target regions is greater than the area of the candidate target region;

The coordinates of the upper left corner of one of the two target areas are (X 11 , Y 11 ), and the coordinates of the lower right corner are (X 12 , Y 12 ); the upper left corner of the other of the two target areas The coordinates are (X 21 , Y 21 ), and the lower right corner coordinates are (X 22 , Y 22 ); the upper left corner coordinates of the candidate target area are (X M1 , Y M1 ), and the lower right corner coordinates are (X M2 , Y M2 ); X M1 is the minimum value of X 11 and X 21 ; Y M1 is the minimum value of Y 11 and Y 21 ; X M2 is the maximum value of X 12 and X 22 ; Y M2 is the minimum value of Y 12 and Y 22 Maximum value.
An object recognition device, characterized in that the object recognition device includes:

The acquisition module is used to acquire the target image generated by the image sensor;

The super-division module is used to super-division the image of the target region in the target image to obtain a super-division color image, wherein the target region includes at least a part of the region in the target image, and the super-division color The resolution of the image is greater than the resolution of the image in the target area;

The recognition module is used to recognize the target object based on the super-division color image.
The object recognition device according to claim 11, wherein the target raw image is a Bayer pattern image.
The object recognition device according to claim 11 or 12, wherein the object recognition device further comprises:

The first processing module is configured to perform first processing on the target image to obtain a color image of the target image;

A first determining module, configured to determine a first region in the color image of the target image that contains the target object;

Wherein, the identification module is used for:

Determining a second area in the super-division color image based on the first area;

Based on the image of the second region in the super-division color image, the target object is recognized.
The object recognition device according to claim 13, wherein the target raw image is a Bayer pattern image, and the super-division module comprises:

The supramolecular module is used to super-divide the image of the target area to obtain a Bayer mode image;

The processing sub-module is configured to perform a second processing on the Bayer mode image to obtain the super-division color image.
The object recognition device according to any one of claims 11 to 14, wherein the object recognition device further comprises:

The second determining module is configured to determine the target area containing the target object in the target raw image.
The object recognition device according to claim 15, wherein the target raw image is the m+1th frame image in the raw image video, m≥1, and the object recognition device further comprises:

The third processing module is configured to perform third processing on the m-th frame image in the raw image video to obtain a color image of the m-th frame image;

A third determining module, configured to determine a third region containing the target object in the color image of the m-th frame of image;

The second determining module is configured to determine the target area corresponding to the third area in the target image.
An object recognition device, characterized in that the object recognition device includes a processor and an interface, the processor is used to obtain a raw image from an image sensor through the interface, and the processor is used to run a program so that the The object recognition device executes the object recognition method according to any one of claims 1 to 10.
A computer storage medium, characterized in that a computer program is stored in the storage medium, and the computer program is used to execute the object recognition method according to any one of claims 1 to 10.