CN115223173A - Object identification method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN115223173A
Authority
CN
China
Prior art keywords
image, region, identified, area, pixel
Legal status
Pending
Application number
CN202211140612.1A
Other languages
Chinese (zh)
Inventor
唐可信
叶立平
Current Assignee
Shenzhen Akusense Technology Co Ltd
Original Assignee
Shenzhen Akusense Technology Co Ltd
Application filed by Shenzhen Akusense Technology Co Ltd
Priority to CN202211140612.1A
Publication of CN115223173A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition
    • G06V30/14: Image acquisition
    • G06V30/148: Segmentation of character regions
    • G06V30/153: Segmentation of character regions using recognition of characters or words
    • G06V30/18: Extraction of features or characteristics of the image
    • G06V30/19: Recognition using electronic means
    • G06V30/191: Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19173: Classification techniques
    • G06V30/1918: Fusion techniques, i.e. combining data from various sources, e.g. sensor fusion

Abstract

The application provides an object identification method and apparatus, an electronic device, and a storage medium. The method collects an image to be identified; identifies the image type of the image to be identified; segments a foreground image from the image to be identified according to an image segmentation strategy corresponding to the image type; estimates an estimated region of the character object in the image to be identified based on the image features corresponding to it; acquires pixel proximity information corresponding to each pixel point in the image to be identified; and identifies the character object in the foreground image according to the estimated region and the pixel proximity information. The method can improve the accuracy of character object identification.

Description

Object identification method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of video data processing, and in particular, to an object identification method and apparatus, an electronic device, and a storage medium.
Background
With the development of Optical Character Recognition (OCR) technology, many researchers have begun to study character recognition in images. Since digital images contain a large amount of useful information, extracting characters from an image is significant for identifying the character information embedded in complex images.
Images, which are generally obtained by photographing equipment, are classified into natural-scene images and plain-text images. Owing to the complexity of natural scenes, the text background in a natural scene is quite complex: text in the image is usually superposed on a complicated background, and different shooting angles leave the characters occluded, wrinkled or deformed, so the character objects identified at present suffer from low accuracy.
Disclosure of Invention
The embodiment of the application provides an object identification method and device, which can improve the accuracy of character object identification.
An embodiment of the present application provides an object identification method, which includes:
collecting an image to be identified;
identifying the image type of the image to be identified;
according to an image segmentation strategy corresponding to the image type, segmenting a foreground image from the image to be identified;
estimating an estimated region of the character object in the image to be recognized based on the image characteristics corresponding to the image to be recognized;
acquiring pixel proximity information corresponding to each pixel point in the image to be identified;
and identifying the character object in the foreground image according to the estimated area and the pixel proximity information.
Optionally, in the object identification method according to the present application, the identifying the character object in the foreground image according to the estimated region and the pixel proximity information includes:
constructing a reference region corresponding to the character object in the foreground image based on the pixel proximity information corresponding to each pixel point;
recombining the estimated region according to the positions of the reference region and the foreground image in the image to be identified;
and identifying the character object in the foreground image based on the recombined region, the estimated region and the reference region.
Optionally, in the object identification method according to the present application, the recombining the estimated region according to the positions of the reference region and the foreground image in the image to be identified includes:
determining a mapping region corresponding to the estimated region in the foreground image according to the position of the foreground image in the image to be identified;
respectively detecting the distance from each boundary of each reference region to the mapping region;
arranging the boundaries of the reference regions according to the detection results;
recombining the reference regions based on the relative position relationship between the mapping region and the reference regions and the sorting result;
where the identifying the character object in the foreground image based on the recombined region, the estimated region and the reference region includes: identifying the character object in the foreground image based on the recombined region, the mapping region and the reference region.
Optionally, in the object identification method according to the present application, the identifying the character object in the foreground image based on the recombined region, the mapping region and the reference region includes:
calculating the intersection-over-union between the recombined region and the mapping region; and
calculating the intersection-over-union between the reference region and the mapping region;
and determining the region whose intersection-over-union is greater than a preset value as a first candidate region, and identifying the character object in the first candidate region.
Optionally, in the object identification method described in the present application, the method further includes:
determining the region whose intersection-over-union is less than or equal to the preset value as a second candidate region;
detecting whether the intersection-over-union of each boundary in the second candidate region is greater than that of the corresponding second candidate region;
determining a boundary whose intersection-over-union is greater than that of the corresponding second candidate region as a target boundary;
performing non-maximum suppression processing on the first candidate region according to the target boundary to obtain a third candidate region;
where the identifying the character object in the first candidate region includes: identifying the character object in the third candidate region.
Optionally, in the object identification method according to the present application, the constructing a reference region corresponding to a character object in the foreground image based on the pixel proximity information corresponding to each pixel point includes:
determining a target pixel point corresponding to the character object;
taking a target pixel point as a center, acquiring a pixel point within a preset range according to pixel proximity information corresponding to the target pixel point, and obtaining a reference pixel point corresponding to the target pixel point;
and constructing a reference region corresponding to the character object in the foreground image according to the target pixel points and the reference pixel points corresponding to the target pixel points.
Optionally, in the object identification method according to the present application, the segmenting a foreground image from the image to be identified according to an image segmentation policy corresponding to the image type includes:
detecting the image type;
when the image type is detected to be the environment image type, extracting the characteristic image of the image to be identified, and carrying out classification processing on the characteristic image to obtain the classification probability that each characteristic point in the characteristic image belongs to the foreground; constructing an error adjustment image of the characteristic image according to the classification probability; segmenting a foreground image from the image to be identified based on the error adjustment image and the characteristic image;
and when the image type is detected to be the electronic image type, carrying out binarization processing on the image to be identified, and segmenting the foreground image from the image to be identified based on the binarized pixel points whose pixel values in the binarized image equal a preset pixel value.
An embodiment of the present application further provides an object recognition apparatus, which includes:
the acquisition module is used for acquiring an image to be identified;
the first identification module is used for identifying the image type of the image to be identified;
the segmentation module is used for segmenting a foreground image from the image to be identified according to an image segmentation strategy corresponding to the image type;
the estimation module is used for estimating an estimation area of the character object in the image to be recognized based on the image characteristics corresponding to the image to be recognized;
the obtaining module is used for obtaining pixel proximity information corresponding to each pixel point in the image to be identified;
and the second identification module is used for identifying the character object in the foreground image according to the estimated area and the pixel proximity information.
An embodiment of the present application further provides an electronic device, which includes a photoelectric sensor, a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the program to perform the object recognition method.
Embodiments of the present application also provide a storage medium having stored therein processor-executable instructions, which are loaded by one or more processors to perform the above object recognition method.
The application provides an object recognition method and apparatus, an electronic device, and a storage medium. After the image to be recognized is collected, a foreground image is segmented from it according to the image segmentation strategy corresponding to its image type; an estimated region of the character object in the image is then estimated based on the corresponding image features; pixel proximity information corresponding to each pixel point is obtained; and finally the character object in the foreground image is recognized according to the estimated region and the pixel proximity information, which improves the accuracy of character object recognition.
Drawings
FIG. 1 is a schematic view of a scene of an object recognition method according to the present application;
FIG. 2 is a flow chart of an object identification method of the present application;
FIG. 3 is a schematic diagram of region reorganization in the object recognition method of the present application;
FIG. 4 is another schematic flow chart of the object recognition method of the present application;
FIG. 5 is a schematic structural diagram of a deep learning network according to the present application;
FIG. 6 is a schematic diagram of a third sub-network in the deep learning network of the present application;
FIG. 7 is a schematic structural diagram of an object recognition apparatus of the present application;
FIG. 8 is a schematic structural diagram of a second identification module of an embodiment of an object identification apparatus according to the present application;
FIG. 9 is a schematic view of the working-environment structure of an electronic device in which the object recognition apparatus of the present application is located.
Detailed Description
Referring to the drawings, wherein like reference numbers refer to like elements, the principles of the present application are illustrated as being implemented in a suitable computing environment. The following description is based on illustrated embodiments of the application and should not be taken as limiting the application with respect to other embodiments that are not detailed herein.
In the description that follows, specific embodiments of the present application are described with reference to steps and symbols of operations performed by one or more computers, unless indicated otherwise. As such, it will be understood that the steps and operations, which are at times referred to as being computer-executed, include the manipulation by the processing unit of the computer of electrical signals representing data in a structured form. This manipulation transforms the data or maintains it at locations in the memory system of the computer, which reconfigures or otherwise alters the operation of the computer in a manner well understood by those skilled in the art. The data structures in which the data is maintained are physical locations of the memory that have particular properties defined by the format of the data. However, while the principle of the application is described in the foregoing text, it is not meant to be limiting; those of ordinary skill in the art will appreciate that various of the steps and operations described below may also be implemented in hardware.
The object identification method and apparatus of the present application may be arranged in any electronic device and are used for: collecting an image to be identified; identifying the image type of the image to be identified; segmenting a foreground image from the image to be identified according to an image segmentation strategy corresponding to the image type; estimating an estimated region of the character object in the image to be identified based on the image features corresponding to the image to be identified; acquiring pixel proximity information corresponding to each pixel point in the image to be identified; and identifying the character object in the foreground image according to the estimated region and the pixel proximity information. Such electronic devices include, but are not limited to, personal computers, server computers, multiprocessor systems, consumer electronics, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The object recognition apparatus is preferably a data processing terminal or a server used for image recognition. By recognizing the character objects in the foreground image based on the estimated region and the pixel proximity information, it alleviates the low recognition accuracy caused by complicated text backgrounds and by characters being occluded, wrinkled or deformed at different shooting angles, and can therefore improve the accuracy of character object recognition.
For example, referring to FIG. 1, the object recognition apparatus is integrated in a vehicle-mounted camera 10 connected to a processor 20. When the vehicle-mounted camera 10 receives a character recognition instruction, it collects an image within its field of view (i.e., the image to be identified) and identifies the image type of that image. It then segments a foreground image from the image to be identified according to the image segmentation strategy corresponding to the image type, estimates an estimated region of the character object in the image based on the corresponding image features, obtains the pixel proximity information corresponding to each pixel point, and finally recognizes the character object in the foreground image according to the estimated region and the pixel proximity information.
According to the object recognition scheme, the image segmentation strategy is flexibly output according to the image type, and the character object recognition is carried out by using the parameters of different dimensions of the image to be recognized, so that the accuracy of character object recognition can be improved.
Referring to FIG. 2, FIG. 2 is a flowchart illustrating an embodiment of the object recognition method of the present application. The method of this embodiment may be implemented by the electronic device described above and includes:
step 101, collecting an image to be identified.
And 102, identifying the image type of the image to be identified.
And 103, segmenting a foreground image from the image to be identified according to an image segmentation strategy corresponding to the image type.
And 104, estimating an estimated region of the character object in the image to be recognized based on the image characteristics corresponding to the image to be recognized.
And 105, acquiring pixel proximity information corresponding to each pixel point in the image to be identified.
And 106, identifying the character object in the foreground image according to the estimated area and the pixel proximity information.
The object recognition method of the present embodiment is explained in detail below.
In step 101, the image to be identified contains the character object to be detected. The image to be identified may be pre-stored locally, pulled by accessing a network interface, or captured in real time by a camera, depending on the actual situation.
In step 102, the image type may include an environment image type and an electronic image type: an image to be identified of the environment image type may be a photographed image, while an image to be identified of the electronic image type may be a scanned image, an image generated by an electronic device, and the like.
In step 103, different image segmentation strategies are output for different image types, which improves the flexibility of character object recognition in this scheme. It can be understood that, compared with an image of the electronic image type, a character object in an image of the environment image type is usually superposed on a complex background, and during acquisition the characters may be occluded, wrinkled or deformed at different acquisition angles; the environment-type image can therefore be processed using its image features to segment the foreground image. In addition, since the character object is usually clear in an image of the electronic image type, binarization may be used to process images of that type. That is, optionally, in some embodiments, the step "segmenting the foreground image from the image to be identified according to an image segmentation strategy corresponding to the image type" may specifically include:
(11) Detecting the image type;
(12) When the image type is detected to be the environment image type, extracting a characteristic image of the image to be identified, and carrying out classification processing on the characteristic image to obtain classification probability of each characteristic point in the characteristic image belonging to the foreground; constructing an error adjustment image of the characteristic image according to the classification probability; segmenting a foreground image from the image to be identified based on the error adjustment image and the characteristic image;
(13) When the image type is detected to be the electronic image type, carrying out binarization processing on the image to be identified, and segmenting the foreground image from the image to be identified based on the binarized pixel points whose pixel values in the binarized image equal a preset pixel value, as sketched below.
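The patent gives no implementation for this branch; below is a minimal sketch under stated assumptions (OpenCV as the toolkit, Otsu thresholding standing in for the unnamed binarization method, and 0, i.e. dark text, as the preset pixel value):

```python
import cv2
import numpy as np

def segment_electronic_foreground(image_bgr, preset_value=0):
    """Binarize an electronic-type image and keep the pixels whose
    binarized value equals the preset pixel value (assumed 0, i.e.
    dark text on a light background)."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    # Otsu's method picks the threshold automatically; the patent only
    # says "binarization processing" without naming a method.
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    mask = (binary == preset_value).astype(np.uint8)
    # The foreground image keeps the original pixels at masked positions.
    foreground = cv2.bitwise_and(image_bgr, image_bgr, mask=mask)
    return foreground, mask
```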
For an image to be identified of the environment image type, a feature image may be extracted as follows: the image to be identified is first downsampled and encoded to obtain a downsampled image; the downsampled image is then upsampled and encoded to obtain an upsampled image; the upsampled image is convolved, and the result is normalized to obtain the probability that each pixel belongs to the foreground and the probability that it belongs to the background; the maximum of these probabilities may then be determined as the classification probability of each pixel.
In the process of identifying the background and the foreground in an image, identification errors are likely to occur: pixel points belonging to the foreground may be wrongly identified as background, or pixel points belonging to the background may be wrongly identified as foreground. To mitigate such mis-identification, pixel points can be compensated. Specifically, the pixel points to be compensated, whose classification probability is smaller than a preset value, are determined; the difference between the preset value and the classification probability is taken as the compensation value of each such pixel point, and an error-adjustment image is constructed from the compensation values and the upsampled image. The feature of each pixel in the downsampled image is then multiplied by the compensation value of the corresponding pixel in the error-adjustment image to obtain a compensated feature, which is added to the feature of the pixel at the same position in the upsampled image to obtain a fused feature. Finally, the pixels whose foreground classification probability in the fused feature image is greater than the foreground probability threshold are determined to be foreground pixels, and connectivity processing is performed on the foreground pixels to obtain the foreground image.
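As a concrete illustration of this compensation-and-fusion step, here is a minimal numpy sketch; the aligned feature-map shapes and the preset probability value of 0.5 are assumptions, not values fixed by the patent:

```python
import numpy as np

def fuse_with_error_adjustment(down_feat, up_feat, fg_prob, preset=0.5):
    """down_feat, up_feat: (H, W, C) down-/up-sampled feature maps,
    assumed already aligned to the same resolution; fg_prob: (H, W)
    per-pixel foreground classification probability."""
    # Pixels whose classification probability falls below the preset
    # value are treated as possibly mis-identified; their compensation
    # value is the shortfall relative to the preset value.
    compensation = np.where(fg_prob < preset, preset - fg_prob, 0.0)
    error_adjust = compensation[..., np.newaxis]   # error-adjustment image
    compensated = down_feat * error_adjust         # compensated features
    fused = compensated + up_feat                  # same-position addition
    return fused, error_adjust
```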
In step 104, feature extraction may be performed on the image to be identified using a feature pyramid network to obtain the corresponding image features. Pixel-by-pixel detection may then be performed on the image according to these features, and a scale predicted for each pixel point; the scale may be a height and/or a width, as determined by the actual situation. In an environment-image detection scene, the positions of the outermost pixel points may be determined, the minimum area of the detection region may be derived from the maximum distance between pixel points on the same horizontal or vertical line, and the minimum predicted area may then be multiplied by a preset coefficient to obtain the estimated region.
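A simplified sketch of this estimation, assuming the character pixel positions have already been detected and using 1.2 as the unspecified preset coefficient:

```python
import numpy as np

def estimate_region(char_pixels, coefficient=1.2):
    """char_pixels: (N, 2) array of (row, col) positions detected as
    belonging to a character object. Returns an expanded region
    (top, left, bottom, right)."""
    rows, cols = char_pixels[:, 0], char_pixels[:, 1]
    # The outermost pixels bound the minimum detection area.
    top, bottom = rows.min(), rows.max()
    left, right = cols.min(), cols.max()
    # Expand the minimum area by the preset coefficient about its centre.
    cy, cx = (top + bottom) / 2.0, (left + right) / 2.0
    half_h = (bottom - top + 1) * coefficient / 2.0
    half_w = (right - left + 1) * coefficient / 2.0
    return (cy - half_h, cx - half_w, cy + half_h, cx + half_w)
```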
In step 105, the pixel proximity information may include the set of pixels adjacent to a pixel point, which reflects the spatial relationship between pixels. The pixel proximity information may be 4-neighborhood information, diagonal-neighborhood information, or 8-neighborhood information, where the 8-neighborhood information can be regarded as the fusion of the 4-neighborhood information and the diagonal-neighborhood information. When a pixel point lies on the image boundary, some of its neighborhood points fall outside the image.
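For illustration, a small helper (not from the patent) that enumerates these neighborhoods and drops neighbors falling outside the image, as noted above for boundary pixels:

```python
def pixel_neighborhood(point, image_shape, mode="8"):
    """Return the 4-, diagonal- or 8-neighborhood of point=(row, col)
    inside an image of shape (height, width)."""
    r, c = point
    four = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    diagonal = [(r - 1, c - 1), (r - 1, c + 1),
                (r + 1, c - 1), (r + 1, c + 1)]
    # The 8-neighborhood fuses the 4- and diagonal neighborhoods.
    candidates = {"4": four, "diagonal": diagonal, "8": four + diagonal}[mode]
    h, w = image_shape
    return [(i, j) for i, j in candidates if 0 <= i < h and 0 <= j < w]
```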
In step 106, a reference area corresponding to the character object may be constructed by using the pixel proximity information, and then the character object in the foreground image is identified according to the estimated area and the reference area, that is, optionally, in some embodiments, the step "identifying the character object in the foreground image according to the estimated area and the pixel proximity information" may specifically include:
(21) Constructing a reference region corresponding to the character object in the foreground image based on the pixel proximity information corresponding to each pixel point;
(22) Recombining the pre-estimated area according to the reference area and the position of the foreground image in the image to be identified;
(23) And identifying the character object in the foreground image based on the recombined region, the estimated region and the reference region.
For example, pixel points within a preset range may be obtained based on the pixel proximity information of a pixel point. Specifically, a target pixel point corresponding to the character object is determined first; a reference pixel point corresponding to the target pixel point is then determined from the target pixel point's proximity information; and finally a reference region corresponding to the character object is constructed in the foreground image based on the target pixel point and its reference pixel points. That is, optionally, in some embodiments, the step "constructing a reference region corresponding to the character object in the foreground image based on the pixel proximity information corresponding to each pixel point" may specifically include:
(31) Determining a target pixel point corresponding to the character object;
(32) Taking the target pixel point as a center, acquiring pixel points within a preset range according to pixel proximity information corresponding to the target pixel point, and obtaining a reference pixel point corresponding to the target pixel point;
(33) And constructing a reference area corresponding to the character object in the foreground image according to the target pixel points and the reference pixel points corresponding to the target pixel points.
For example, based on the 8-neighborhood information of a target pixel point, the pixel points within a preset range (upper, lower, left, right, upper-left, upper-right, lower-left and lower-right) are obtained as the reference pixel points corresponding to that target pixel point, and a reference region corresponding to the character object is then constructed in the foreground image from the target pixel points and their reference pixel points, as sketched below.
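A sketch of steps (31) to (33); the patent leaves the exact region geometry open, so taking the bounding box of the target pixels together with their 8-neighborhood reference pixels is an assumption:

```python
import numpy as np

OFFSETS_8 = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
             (0, 1), (1, -1), (1, 0), (1, 1)]

def build_reference_region(target_pixels, image_shape):
    """target_pixels: iterable of (row, col) points belonging to the
    character object. Returns (top, left, bottom, right)."""
    h, w = image_shape
    points = [tuple(p) for p in target_pixels]
    for r, c in list(points):
        for dr, dc in OFFSETS_8:
            i, j = r + dr, c + dc
            if 0 <= i < h and 0 <= j < w:   # clip at image boundaries
                points.append((i, j))       # reference pixel
    pts = np.asarray(points)
    return (int(pts[:, 0].min()), int(pts[:, 1].min()),
            int(pts[:, 0].max()), int(pts[:, 1].max()))
```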
After the reference regions are obtained, the distance between each boundary of a reference region and the estimated region may be detected, the estimated region may be recombined according to the detection results, and the character object in the foreground image may then be identified based on the recombined region, the estimated region and the reference region. That is, optionally, in some embodiments, the step "recombining the estimated region according to the positions of the reference region and the foreground image in the image to be identified" may specifically include:
(41) Determining a mapping area corresponding to the pre-estimated area in the foreground image according to the position of the foreground image in the image to be identified;
(42) Respectively detecting the distance from each boundary of each reference area to the mapping area;
(43) Arranging the boundaries of the reference areas according to the detection result;
(44) And recombining the reference regions based on the relative position relationship between the mapping regions and the reference regions and the sequencing result.
Because the estimated region is an estimation result in the image to be identified, while the foreground image is a partial image of the image to be identified, a mapping relationship between the foreground image and the image to be identified may be established from the position of each pixel point of the foreground image within the image to be identified, and the estimated region may be mapped into the foreground image according to this relationship, yielding a mapping region in the foreground image. For convenience of description, the reference region is described below as a rectangular region. First, the distance from each boundary of each reference region to the mapping region may be detected, and the boundaries of the reference regions are arranged in order of increasing distance. Each reference region is then recombined according to the relative position information between each of its boundaries and the mapping region: boundaries with the same rank are combined in order, giving the recombined reference regions. For example, referring to FIG. 3, which shows reference regions S0, S1 and S2, the detection results are: the distances from the right boundaries of S2, S0 and S1 to the mapping region are 1, 2 and 3 respectively, and the distances from the left boundaries of S1, S0 and S2 to the mapping region are 1, 2 and 3 respectively; boundaries with the same rank can then be combined to obtain the recombined reference regions. It should be noted that in this embodiment "left" and "right" are based on the orientation or positional relationship shown in the drawings; they are used only for convenience and simplicity of description, do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation, and thus cannot be understood as limiting the present application.
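The following sketch mirrors the FIG. 3 example. It handles only the left and right boundaries (top and bottom would be treated the same way) and encodes one plausible reading of the recombination rule:

```python
def reorganize_regions(mapping, references):
    """mapping and each element of references: rectangles
    (top, left, bottom, right). Ranks each reference region's left and
    right boundaries by their distance to the mapping region, then
    combines same-ranked boundaries into recombined regions."""
    _, m_left, _, m_right = mapping
    # Rank 1 = smallest distance to the mapping region's boundary.
    lefts = sorted((abs(ref[1] - m_left), ref[1]) for ref in references)
    rights = sorted((abs(ref[3] - m_right), ref[3]) for ref in references)
    top = min(ref[0] for ref in references)
    bottom = max(ref[2] for ref in references)
    # Combine the k-th ranked left boundary with the k-th ranked right one.
    return [(top, left, bottom, right)
            for (_, left), (_, right) in zip(lefts, rights)]
```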
After the recombined regions are obtained, the character objects in the foreground image may be identified based on the recombined regions, the mapping region and the reference regions. For example, the intersection-over-union between each reference region and the mapping region, and between each recombined region and the mapping region, may be calculated, and the character objects identified from the calculated values. That is, optionally, in some embodiments, the step "identifying the character object in the foreground image based on the recombined region, the mapping region and the reference region" may specifically include:
(51) Calculating the intersection-over-union between the recombined region and the mapping region; and
(52) Calculating the intersection-over-union between the reference region and the mapping region;
(53) Determining the region whose intersection-over-union is greater than the preset value as a first candidate region, and identifying the character object in the first candidate region.
Here the concept of Intersection-over-Union (IoU) is introduced. In target detection it measures the overlap between a generated candidate box (here, a reference region or recombined reference region) and the original marked box (the mapping region), i.e., the ratio of the area of their intersection to the area of their union. The optimal situation is complete overlap, i.e., a ratio of 1.
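In code, the intersection-over-union of two axis-aligned rectangles can be computed as follows (a standard formulation, not specific to this patent):

```python
def iou(a, b):
    """a, b: rectangles (top, left, bottom, right). Returns their
    intersection-over-union in [0, 1]; 1 means complete overlap."""
    top, left = max(a[0], b[0]), max(a[1], b[1])
    bottom, right = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, bottom - top) * max(0.0, right - left)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```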
The recombined region is constructed from the relative position information between the mapping region and each boundary; when the boundaries of a recombined region come from different reference regions, its intersection-over-union may be lower than that of the original bounding box. A region meeting the condition therefore needs to be selected from the reference regions and the recombined reference regions according to the intersection-over-union.
For example, if the preset value is 0.5, the intersection-over-union between reference region A and the mapping region is 0.3, that between the recombined region A' and the mapping region is 0.7, that between reference region B and the mapping region is 0.8, and that between the recombined region B' and the mapping region is 0.2, then the recombined region A' and the reference region B are determined as first candidate regions.
Further, among the regions whose intersection-over-union is less than or equal to the preset value, some boundaries may individually have an intersection-over-union greater than that of their region. To further improve the accuracy of target detection, non-maximum suppression processing may be performed on the first candidate regions according to such boundaries. That is, optionally, in some embodiments, the object identification method of the present application may further include:
(61) Determining the region whose intersection-over-union is less than or equal to the preset value as a second candidate region;
(62) Detecting whether the intersection-over-union of each boundary in the second candidate region is greater than that of the corresponding second candidate region;
(63) Determining a boundary whose intersection-over-union is greater than that of the corresponding second candidate region as a target boundary;
(64) Performing non-maximum suppression processing on the first candidate region according to the target boundary to obtain a third candidate region.
optionally, therefore, in some embodiments, identifying the character object in the first candidate region comprises: character objects in the third candidate region are identified.
This completes the object recognition process of the present embodiment.
According to the object recognition method of this embodiment, after the image to be recognized is collected, a foreground image is segmented from it according to the image segmentation strategy corresponding to its image type; an estimated region of the character object is estimated based on the corresponding image features; pixel proximity information corresponding to each pixel point is obtained; and finally the character object in the foreground image is recognized according to the estimated region and the pixel proximity information, which improves the accuracy of character object recognition.
An embodiment of the present application further provides an object identification method in which the object identification apparatus is integrated in a terminal; please refer to FIG. 4. The specific process is as follows:
step 201, the terminal collects an image to be identified.
Step 202, the terminal identifies the image type of the image to be identified as the environment image type.
And step 203, the terminal detects the character object in the image to be recognized through the photoelectric sensor.
And 204, extracting the characteristic image of the image to be identified by the terminal, and classifying the characteristic image to obtain the classification probability of each characteristic point in the characteristic image belonging to the foreground.
And step 205, the terminal constructs an error adjustment image of the characteristic image according to the classification probability.
And step 206, the terminal segments the foreground image from the image to be identified based on the error adjustment image and the characteristic image.
And step 207, estimating an estimated region of the character object in the image to be recognized by the terminal based on the image characteristics corresponding to the image to be recognized.
And step 208, the terminal acquires pixel proximity information corresponding to each pixel point in the image to be identified.
And step 209, the terminal identifies the character object in the foreground image according to the estimated area and the pixel proximity information.
In step 203, the terminal performs optical character detection on the character object in the image to be identified through the photoelectric sensor. When the terminal detects that the image to be identified contains optical characters, it performs the following steps, i.e., the optical character recognition flow; when no optical characters are detected in the image, it outputs prompt information, such as "character object not detected", and the process ends.
In step 207, the terminal may perform feature extraction on the image to be identified using a preset deep learning network to obtain the corresponding image features, and then perform pixel-by-pixel detection according to those features. The deep learning network can be pre-trained. The terminal collects an image sample containing a target character and marks the target region where the target character is located on the sample. It then classifies each pixel point in the sample according to the sample's image features to obtain the classification result of each pixel point belonging to the target character; constructs, from the image features, a construction region of the target character corresponding to each pixel point; and recombines the construction regions according to the relative positions of each boundary of the target region and the construction regions. Finally, the terminal trains the deep learning network according to the predicted regions and the target region, thereby obtaining the trained deep learning network.
Optionally, please refer to FIG. 5, which illustrates a deep learning network provided in the present application. It includes a first sub-network, a second sub-network and a third sub-network: the first sub-network may be used for feature extraction, the second sub-network for up-sampling or down-sampling the features extracted by the first sub-network, and the third sub-network for regression analysis. The third sub-network specifically includes a recombination module and a fusion module, as shown in FIG. 6.
A recombination module: this module computes the intersection-over-union between the construction region corresponding to each pixel point and the target region, decomposes each construction region, sorts the boundaries according to their distances to the target region, and recombines the construction regions based on the position information and ranking of each boundary, obtaining the recombined construction regions. It then computes the intersection-over-union between each recombined construction region and the target region, so that every boundary of a construction region carries two intersection-over-union values, C1' and C1. Because the intersection-over-union of the recombined construction region may be lower than that of the original construction region, whichever of the two is larger can be selected for training, and the corresponding loss function can be expressed as:
$$L = \frac{1}{N}\sum_{i}\Big[f\big(\mathrm{IoU}(A_i, D_i) > \tau\big)\, L_{\mathrm{IoU}}(A_i, D_i) + f\big(\mathrm{IoU}(A'_i, D_i) > \tau\big)\, L_{\mathrm{IoU}}(A'_i, D_i)\Big]$$

where $L_{\mathrm{IoU}}$ is the intersection-over-union-based regression loss, computed both between each construction region $A_i$ and its target region $D_i$ and between the recombined construction region $A'_i$ and $D_i$; $N$ is the number of construction regions and/or recombined construction regions in each batch whose intersection-over-union exceeds the preset threshold $\tau$; and $f$ is the indicator function.
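As a numeric sketch of the reconstructed loss above: taking L_IoU(A, D) = -ln IoU(A, D), a common IoU-based regression form that the patent does not explicitly name, the batch loss can be computed as:

```python
import numpy as np

def iou_loss_batch(construct_ious, recombined_ious, tau=0.5):
    """construct_ious[i] = IoU(A_i, D_i); recombined_ious[i] =
    IoU(A'_i, D_i). The indicator f keeps only the terms above the
    threshold tau, and N counts the kept terms."""
    a = np.asarray(construct_ious, dtype=float)
    b = np.asarray(recombined_ious, dtype=float)
    kept = np.concatenate([a[a > tau], b[b > tau]])
    if kept.size == 0:
        return 0.0
    return float(np.mean(-np.log(kept)))  # average of L_IoU over the N terms
```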
A fusion module: this module brings the boundary prediction scores obtained during training into the traditional non-maximum suppression process. Compared with traditional non-maximum suppression, it takes into account higher-quality edges found in lower-scoring regions, making character localization more accurate.
Therefore, the terminal can flexibly output the image segmentation strategy according to the image type, and character object recognition is carried out by using the parameters of different dimensions of the image to be recognized, so that the accuracy of character object recognition can be improved.
Referring to FIG. 7, FIG. 7 is a schematic structural diagram of an embodiment of the object recognition apparatus of the present application; the apparatus of this embodiment may implement the object recognition method described above. The object recognition apparatus 30 of this embodiment includes an acquisition module 301, a first recognition module 302, a segmentation module 303, an estimation module 304, an obtaining module 305, and a second recognition module 306, as follows:
the acquisition module 301 is configured to acquire an image to be identified.
The first identification module 302 is used for identifying the image type of the image to be identified.
And the segmentation module 303 is configured to segment a foreground image from the image to be identified according to the image segmentation policy corresponding to the image type.
The estimation module 304 is configured to estimate an estimated area of the character object in the image to be recognized based on the image feature corresponding to the image to be recognized.
An obtaining module 305, configured to obtain pixel proximity information corresponding to each pixel point in the image to be identified;
the second identifying module 306 is configured to identify a character object in the foreground image according to the estimated area and the pixel proximity information.
Optionally, in some embodiments, the segmentation module 303 may be specifically configured to: detect the image type; when the image type is detected to be the environment image type, extract the feature image of the image to be identified and classify it to obtain the classification probability that each feature point in the feature image belongs to the foreground, construct an error-adjustment image of the feature image according to the classification probability, and segment the foreground image from the image to be identified based on the error-adjustment image and the feature image; and when the image type is detected to be the electronic image type, binarize the image to be identified and segment the foreground image from it based on the binarized pixel points whose pixel values in the binarized image equal a preset pixel value.
Optionally, in some embodiments, please refer to FIG. 8, which is a schematic structural diagram of the second identification module of an embodiment of the object identification apparatus of the present application; the second identification module 306 may specifically include a determining unit 3061, an obtaining unit 3062, and a constructing unit 3063.
the determination unit 3061 is configured to determine a target pixel point corresponding to the character object; the obtaining unit 3062 is configured to obtain pixel points within a preset range according to pixel proximity information corresponding to the target pixel point by taking the target pixel point as a center, and obtain reference pixel points corresponding to the target pixel point; the construction unit 3063 is configured to construct a reference region corresponding to the character object in the foreground image according to the target pixel point and the reference pixel point corresponding to the target pixel point.
This completes the process of recognizing the character object by the object recognition device 30 of the present embodiment.
The specific working principle of the object recognition apparatus of this embodiment is the same as or similar to the description in the embodiment of the object recognition method, and please refer to the detailed description in the embodiment of the object recognition method.
The object recognition apparatus of this embodiment collects an image to be recognized; segments a foreground image from it according to the image segmentation strategy corresponding to its image type; estimates an estimated region of the character object based on the corresponding image features; acquires pixel proximity information corresponding to each pixel point; and recognizes the character object in the foreground image according to the estimated region and the pixel proximity information, thereby improving the accuracy of character object recognition.
As used in this application, the terms "component," "module," "system," "interface," "process," and the like are generally intended to refer to a computer-related entity: hardware, a combination of hardware and software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a controller and the controller can be a component. One or more components can reside within a process and/or thread of execution and a component may be localized on one computer and/or distributed between two or more computers.
FIG. 9 and the following discussion provide a brief, general description of the operating environment of an electronic device in which the object recognition apparatus described herein may be implemented. The operating environment of FIG. 9 is only one example of a suitable operating environment and is not intended to suggest any limitation as to the scope of use or functionality of the operating environment. Example electronic devices 1012 include, but are not limited to, wearable devices, head-mounted devices, medical health platforms, personal computers, server computers, hand-held or laptop devices, mobile devices (such as mobile phones, Personal Digital Assistants (PDAs), media players, and the like), multiprocessor systems, consumer electronics, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
Although not required, embodiments are described in the general context of "computer readable instructions" being executed by one or more electronic devices. Computer readable instructions may be distributed via computer readable media (discussed below). Computer readable instructions may be implemented as program modules, such as functions, objects, Application Programming Interfaces (APIs), and data structures, that perform particular tasks or implement particular abstract data types. Typically, the functionality of the computer readable instructions may be combined or distributed as desired in various environments.
FIG. 9 illustrates an example of an electronic device 1012 that includes one or more embodiments of the object recognition apparatus of the present application. In one configuration, electronic device 1012 includes at least one processing unit 1016 and memory 1018. Depending on the exact configuration and type of electronic device, memory 1018 may be volatile (such as RAM), non-volatile (such as ROM, flash memory, etc.), or some combination of the two. This configuration is illustrated in FIG. 9 by dashed line 1014.
In other embodiments, electronic device 1012 may include additional features and/or functionality. For example, device 1012 may also include additional storage (e.g., removable and/or non-removable) including, but not limited to, magnetic storage, optical storage, and the like. Such additional storage is illustrated in FIG. 9 by storage 1020. In one embodiment, computer readable instructions to implement one or more embodiments provided herein may be in storage 1020. Storage 1020 may also store other computer readable instructions to implement an operating system, an application program, and the like. Computer readable instructions may be loaded in memory 1018 for execution by processing unit 1016, for example.
The term "computer readable media" as used herein includes computer storage media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions or other data. Memory 1018 and storage 1020 are examples of computer storage media. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital Versatile Disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can accessed by electronic device 1012. Any such computer storage media may be part of electronic device 1012.
Electronic device 1012 may also include communication connection(s) 1026 that allow electronic device 1012 to communicate with other devices. Communication connection(s) 1026 may include, but is not limited to, a modem, a Network Interface Card (NIC), an integrated network interface, a radio frequency transmitter/receiver, an infrared port, a USB connection, or other interfaces for connecting electronic device 1012 to other electronic devices. The communication connection 1026 may include a wired connection or a wireless connection. Communication connection(s) 1026 may transmit and/or receive communication media.
The term "computer readable media" may include communication media. Communication media typically embodies computer readable instructions or other data in a "modulated data signal" such as a carrier wave or other transport mechanism and includes any information delivery media. The term "modulated data signal" may include signals that: one or more of the signal characteristics may be set or changed in such a manner as to encode information in the signal.
Electronic device 1012 may include input device(s) 1024 such as keyboard, mouse, pen, voice input device, touch input device, infrared camera, video input device, and/or any other input device. Output device(s) 1022 such as one or more displays, speakers, printers, and/or any other output device may also be included in device 1012. Input device 1024 and output device 1022 may be connected to electronic device 1012 via a wired connection, wireless connection, or any combination thereof. In one embodiment, an input device or an output device from another electronic device may be used as input device 1024 or output device 1022 for electronic device 1012.
Components of electronic device 1012 may be connected by various interconnects, such as a bus. Such interconnects may include Peripheral Component Interconnect (PCI), such as PCI Express, Universal Serial Bus (USB), FireWire (IEEE 1394), optical bus structures, and so forth. In another embodiment, components of electronic device 1012 may be interconnected by a network. For example, memory 1018 may be comprised of multiple physical memory units located in different physical locations and interconnected by a network.
Those skilled in the art will realize that storage devices utilized to store computer readable instructions may be distributed across a network. For example, electronic device 1030 accessible via network 1028 may store computer readable instructions to implement one or more embodiments provided herein. Electronic device 1012 may access electronic device 1030 and download a part or all of the computer readable instructions for execution. Alternatively, electronic device 1012 may download pieces of the computer readable instructions, as needed, or some instructions may be executed at electronic device 1012 and some at electronic device 1030.
Various operations of embodiments are provided herein. In one embodiment, one or more of the operations may constitute computer readable instructions stored on one or more computer readable media, which when executed by an electronic device will cause the device to perform the operations described. The order in which some or all of the operations are described should not be construed as to imply that these operations are necessarily order-dependent. Those skilled in the art will appreciate alternative orderings having the benefit of this description. Moreover, it should be understood that not all operations are necessarily present in each embodiment provided herein.
Also, although the disclosure has been shown and described with respect to one or more implementations, equivalent alterations and modifications will occur to others skilled in the art based upon a reading and understanding of this specification and the annexed drawings. The disclosure includes all such modifications and alterations and is limited only by the scope of the appended claims. In particular regard to the various functions performed by the above described components (e.g., elements, resources, etc.), the terms used to describe such components are intended to correspond, unless otherwise indicated, to any component which performs the specified function of the described component (e.g., one that is functionally equivalent), even though not structurally equivalent to the disclosed structure which performs the function in the herein illustrated exemplary implementations of the disclosure. In addition, while a particular feature of the disclosure may have been disclosed with respect to only one of several implementations, such feature may be combined with one or more other features of the other implementations as may be desired and advantageous for a given or particular application. Furthermore, to the extent that the terms "includes," "has," "contains," or variants thereof are used in either the detailed description or the claims, such terms are intended to be inclusive in a manner similar to the term "comprising."
Each functional unit in the embodiments of the present application may be integrated into one processing module, each unit may exist alone physically, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. The integrated module, if implemented in the form of a software functional module and sold or used as a separate product, may also be stored in a computer-readable storage medium. The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disk, or the like. Each apparatus or system described above may perform the method in the corresponding method embodiment.
In summary, although the present application has been disclosed by way of the above embodiments, the numbers preceding the embodiments are used for convenience of description only and do not limit their order. Furthermore, the above embodiments are not intended to limit the present application; those skilled in the art can make various changes and modifications without departing from the spirit and scope of the present application, so the scope of the present application is defined by the appended claims.

Claims (10)

1. An object recognition method, comprising:
collecting an image to be identified;
identifying the image type of the image to be identified;
segmenting a foreground image from the image to be identified according to an image segmentation strategy corresponding to the image type;
based on the image characteristics corresponding to the image to be identified, estimating an estimated region of the character object in the image to be identified;
acquiring pixel proximity information corresponding to each pixel point in the image to be identified;
and identifying the character object in the foreground image according to the estimated region and the pixel proximity information.
2. The method of claim 1, wherein the identifying the character object in the foreground image according to the estimated region and the pixel proximity information comprises:
constructing a reference region corresponding to the character object in the foreground image based on the pixel proximity information corresponding to each pixel point;
recombining the estimated region according to the positions of the reference region and the foreground image in the image to be identified;
and identifying the character object in the foreground image based on the recombined region, the estimated region and the reference region.
3. The method according to claim 2, wherein the recombining the estimated region according to the positions of the reference region and the foreground image in the image to be identified comprises:
determining a mapping region corresponding to the estimated region in the foreground image according to the position of the foreground image in the image to be identified;
respectively detecting the distance from each boundary of each reference region to the mapping region;
sorting the boundaries of the reference regions according to the detection results;
recombining the reference region based on a relative position relationship between the mapping region and the reference region and the sorting result;
the identifying the character object in the foreground image based on the recombined region, the estimated region and the reference region comprises: identifying the character object in the foreground image based on the recombined region, the mapping region and the reference region.
4. The method of claim 3, wherein the identifying the character object in the foreground image based on the recombined region, the mapping region and the reference region comprises:
calculating an intersection-over-union ratio between the recombined region and the mapping region; and
calculating an intersection-over-union ratio between the reference region and the mapping region;
and determining a region whose intersection-over-union ratio is larger than a preset value as a first candidate region, and identifying the character object in the first candidate region.
5. The method of claim 4, further comprising:
determining a region whose intersection-over-union ratio is larger than a preset value as a second candidate region;
detecting whether the intersection-over-union ratio of each boundary of the second candidate region is larger than the intersection-over-union ratio of the corresponding second candidate region;
determining a boundary whose intersection-over-union ratio is larger than that of the corresponding second candidate region as a target boundary;
performing non-maximum suppression processing on the first candidate region according to the target boundary to obtain a third candidate region;
the identifying the character object in the first candidate region comprises: identifying the character object in the third candidate region.
6. The method according to claim 2, wherein the constructing a reference region corresponding to the character object in the foreground image based on the pixel proximity information corresponding to each pixel point comprises:
determining a target pixel point corresponding to the character object;
taking the target pixel point as a center, acquiring pixel points within a preset range according to the pixel proximity information corresponding to the target pixel point, to obtain reference pixel points corresponding to the target pixel point;
and constructing a reference region corresponding to the character object in the foreground image according to the target pixel point and the reference pixel points corresponding to the target pixel point.
7. The method according to any one of claims 1 to 6, wherein the segmenting a foreground image from the image to be identified according to the image segmentation strategy corresponding to the image type comprises:
detecting the image type;
when the image type is detected to be an environment image type, extracting a feature image of the image to be identified, and classifying the feature image to obtain a classification probability that each feature point in the feature image belongs to the foreground; constructing an error adjustment image of the feature image according to the classification probability; and segmenting the foreground image from the image to be identified based on the error adjustment image and the feature image;
and when the image type is detected to be an electronic image type, binarizing the image to be identified, and segmenting the foreground image from the image to be identified based on binarized pixel points whose pixel values in the binarized image equal a preset pixel value.
8. An object recognition apparatus, comprising:
the collection module is used for collecting an image to be identified;
the first identification module is used for identifying the image type of the image to be identified;
the segmentation module is used for segmenting a foreground image from the image to be identified according to an image segmentation strategy corresponding to the image type;
the estimation module is used for estimating an estimated region of the character object in the image to be identified based on the image characteristics corresponding to the image to be identified;
the acquisition module is used for acquiring pixel proximity information corresponding to each pixel point in the image to be identified;
and the second identification module is used for identifying the character object in the foreground image according to the estimated region and the pixel proximity information.
9. An electronic device comprising a photosensor, a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the steps of the object recognition method according to any one of claims 1 to 7 are implemented when the program is executed by the processor.
10. A storage medium having stored therein processor-executable instructions, which are loaded by one or more processors to perform the object recognition method of any one of claims 1 to 7.
CN202211140612.1A 2022-09-20 2022-09-20 Object identification method and device, electronic equipment and storage medium Pending CN115223173A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211140612.1A CN115223173A (en) 2022-09-20 2022-09-20 Object identification method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211140612.1A CN115223173A (en) 2022-09-20 2022-09-20 Object identification method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115223173A 2022-10-21

Family

ID=83616806

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211140612.1A Pending CN115223173A (en) 2022-09-20 2022-09-20 Object identification method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115223173A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0229886A (en) * 1988-07-20 1990-01-31 Ricoh Co Ltd Method for extracting feature variable
CN111723627A (en) * 2019-03-22 2020-09-29 北京搜狗科技发展有限公司 Image processing method and device and electronic equipment
CN111126140A (en) * 2019-11-19 2020-05-08 腾讯科技(深圳)有限公司 Text recognition method and device, electronic equipment and storage medium
CN111126394A (en) * 2019-12-25 2020-05-08 上海肇观电子科技有限公司 Character recognition method, reading aid, circuit and medium
CN111369582A (en) * 2020-03-06 2020-07-03 腾讯科技(深圳)有限公司 Image segmentation method, background replacement method, device, equipment and storage medium
CN111401376A (en) * 2020-03-12 2020-07-10 腾讯科技(深圳)有限公司 Target detection method, target detection device, electronic equipment and storage medium
EP4057236A1 (en) * 2021-03-11 2022-09-14 Beijing Xiaomi Mobile Software Co., Ltd. Method and apparatus for character recognition, electronic device, and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHAO Fan et al.: "A Direct and Efficient Approximate Localization Method for Chinese Characters in Natural Scenes", Computer Engineering and Applications *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117456542A (en) * 2023-12-26 2024-01-26 苏州镁伽科技有限公司 Image matching method, device, electronic equipment and storage medium
CN117456542B (en) * 2023-12-26 2024-04-26 苏州镁伽科技有限公司 Image matching method, device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN110046529B (en) Two-dimensional code identification method, device and equipment
CN109035304B (en) Target tracking method, medium, computing device and apparatus
US8019164B2 (en) Apparatus, method and program product for matching with a template
EP1693782B1 (en) Method for facial features detection
KR101617681B1 (en) Text detection using multi-layer connected components with histograms
US9076056B2 (en) Text detection in natural images
JP5713790B2 (en) Image processing apparatus, image processing method, and program
KR101896357B1 (en) Method, device and program for detecting an object
CN111488826A (en) Text recognition method and device, electronic equipment and storage medium
CN111369581A (en) Image processing method, device, equipment and storage medium
CN109409288B (en) Image processing method, image processing device, electronic equipment and storage medium
CN111626163B (en) Human face living body detection method and device and computer equipment
US20100092026A1 (en) Method, apparatus and computer program product for providing pattern detection with unknown noise levels
US20200302135A1 (en) Method and apparatus for localization of one-dimensional barcodes
CN111160169A (en) Face detection method, device, equipment and computer readable storage medium
CN111415373A (en) Target tracking and segmenting method, system and medium based on twin convolutional network
CN112434612A (en) Smoking detection method and device, electronic equipment and computer readable storage medium
US20160125253A1 (en) Method and apparatus for image matching
CN115375917A (en) Target edge feature extraction method, device, terminal and storage medium
KR20190059083A (en) Apparatus and method for recognition marine situation based image division
CN111368632A (en) Signature identification method and device
CN110210467A (en) A kind of formula localization method, image processing apparatus, the storage medium of text image
KR101842535B1 (en) Method for the optical detection of symbols
CN114067339A (en) Image recognition method and device, electronic equipment and computer readable storage medium
CN113129298A (en) Definition recognition method of text image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination