CN116385570A - Image processing method, device, equipment and storage medium - Google Patents

Image processing method, device, equipment and storage medium Download PDF

Info

Publication number
CN116385570A
CN116385570A
Authority
CN
China
Prior art keywords
image
sample
detection frame
feature
processed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310440147.1A
Other languages
Chinese (zh)
Inventor
潘蓝根
余静
李冠彬
胡奕豪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC filed Critical Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202310440147.1A priority Critical patent/CN116385570A/en
Publication of CN116385570A publication Critical patent/CN116385570A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00Image coding
    • G06T9/001Model-based coding, e.g. wire frame
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00Image coding
    • G06T9/002Image coding using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761Proximity, similarity or dissimilarity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The present disclosure provides an image processing method, apparatus, device, and storage medium, which can be applied to the fields of artificial intelligence and big data. The method comprises the following steps: performing target detection on an image to be processed to obtain a detection frame image corresponding to an object to be identified in the image to be processed; encoding the detection frame image based on a feature image bag-of-words model to obtain a detection frame image code, wherein the feature image bag-of-words model is constructed from sample feature element images extracted from a sample image; matching the detection frame image code against a sample image code corresponding to the sample image to obtain a matching result; and determining, according to the matching result, the object to be identified in the image to be processed as a target object.

Description

Image processing method, device, equipment and storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence and big data, and more particularly, to an image processing method, apparatus, device, medium, and program product.
Background
With the rapid development of technology, image-based target object recognition is widely used in application scenarios such as electronic commerce and automated robot operation. For example, on an online shopping platform or in automated robot execution, it is usually necessary to analyze an acquired image to identify object attribute information, such as the category of a target object, in that image.
The inventors have found that image detection in the related art suffers from low efficiency, high computational cost, and low detection accuracy, making it difficult to meet practical requirements.
Disclosure of Invention
In view of the above, the present disclosure provides image processing methods, apparatuses, devices, media, and program products.
According to a first aspect of the present disclosure, there is provided an image processing method including:
performing target detection on an image to be processed to obtain a detection frame image corresponding to an object to be identified in the image to be processed;
encoding the detection frame image based on a feature image bag-of-words model to obtain a detection frame image code, wherein the feature image bag-of-words model is constructed from sample feature element images extracted from a sample image;
matching the detection frame image code against a sample image code corresponding to the sample image to obtain a matching result; and
determining, according to the matching result, the object to be identified in the image to be processed as a target object.
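The four operations of the first aspect can be sketched end to end as follows. This is a minimal illustrative pipeline, not the claimed implementation: the hard-coded detection box, the three-entry codebook, the unit descriptor, and the scoring rule are all toy assumptions.

```python
# Illustrative sketch of the claimed pipeline; every component is a toy stand-in.

def detect_object(image):
    """Toy 'target detection': return the region inside a hard-coded box."""
    x0, y0, x1, y1 = 1, 1, 3, 3          # hypothetical detection frame
    return [row[x0:x1] for row in image[y0:y1]]

def encode_with_bag_of_words(patch, codebook):
    """Toy bag-of-words encoding: histogram of nearest codebook values."""
    flat = [v for row in patch for v in row]
    code = [0] * len(codebook)
    for v in flat:
        nearest = min(range(len(codebook)), key=lambda i: abs(codebook[i] - v))
        code[nearest] += 1
    return code

def match(code_a, code_b):
    """Toy match score: negative squared Euclidean distance (higher is better)."""
    return -sum((a - b) ** 2 for a, b in zip(code_a, code_b))

codebook = [0, 5, 10]                     # assumed 3-word visual codebook
image = [[0, 0, 0, 0], [0, 5, 5, 0], [0, 5, 10, 0], [0, 0, 0, 0]]
sample_code = [1, 2, 1]                   # assumed precomputed sample image code

patch = detect_object(image)
query_code = encode_with_bag_of_words(patch, codebook)
result = match(query_code, sample_code)
print(query_code, result)
```

A real system would replace each stand-in with a trained detector, a learned codebook, and the matching rule described below.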
According to an embodiment of the present disclosure, the image processing method further includes:
detecting sample feature elements of the sample image to obtain a sample feature element detection frame;
determining the sample feature element image according to the sample feature element detection frame and the sample image; and
constructing the feature image bag-of-words model according to the sample feature element image.
According to an embodiment of the present disclosure, there are a plurality of sample feature element images;
wherein constructing the feature image bag-of-words model according to the sample feature element images comprises:
processing the plurality of sample feature element images based on a clustering algorithm to obtain cluster center element image units;
determining feature image word codes according to the cluster center element image units; and
constructing the feature image bag-of-words model according to the feature image word codes.
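One plausible reading of the clustering step, consistent with classic bag-of-visual-words pipelines but not confirmed by the disclosure, is k-means over descriptors of the sample feature element image units, with the final cluster centers serving as the visual words. The sketch below clusters hypothetical 1-D unit statistics; the data, initialization rule, and iteration count are assumptions.

```python
def kmeans(points, k, iters=20):
    """Minimal k-means on 1-D values; final centers act as 'visual words'."""
    # Deterministic initialisation: evenly spaced seed points (an assumption).
    centers = [points[i * len(points) // k] for i in range(k)]
    for _ in range(iters):
        # Assign each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: abs(p - centers[i]))
            clusters[idx].append(p)
        # Move each center to the mean of its cluster (keep it if empty).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)

# Hypothetical descriptors of sample feature element image units:
units = [0.1, 0.2, 0.15, 5.0, 5.1, 4.9, 9.8, 10.1, 10.0]
codebook = kmeans(units, k=3)
print(codebook)
```

In practice the descriptors would be multi-dimensional (e.g. flattened pixel blocks), but the assign/update loop is identical.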
According to an embodiment of the present disclosure, processing the plurality of sample feature element images based on the clustering algorithm to obtain cluster center element image units comprises:
dividing each sample feature element image based on a preset rule to obtain a plurality of sample feature element image units; and
processing the plurality of sample feature element image units based on the clustering algorithm to obtain the cluster center element image units.
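The preset division rule could, for example, split each sample feature element image into non-overlapping fixed-size blocks; the unit size below is an assumed parameter, not one stated in the disclosure.

```python
def split_into_units(image, unit_h, unit_w):
    """Divide a 2-D image (list of rows) into non-overlapping unit blocks."""
    h, w = len(image), len(image[0])
    units = []
    for top in range(0, h, unit_h):
        for left in range(0, w, unit_w):
            block = [row[left:left + unit_w]
                     for row in image[top:top + unit_h]]
            units.append(block)
    return units

image = [[ 1,  2,  3,  4],
         [ 5,  6,  7,  8],
         [ 9, 10, 11, 12],
         [13, 14, 15, 16]]
units = split_into_units(image, 2, 2)    # assumed 2x2 unit size
print(len(units), units[0])
```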
According to an embodiment of the present disclosure, processing the plurality of sample feature element images based on the clustering algorithm to obtain cluster center element image units comprises:
determining each sample feature element image as a sample feature element image unit; and
processing the plurality of sample feature element image units based on the clustering algorithm to obtain the cluster center element image units.
According to an embodiment of the present disclosure, the image processing method further includes:
encoding the sample image according to the feature image word codes corresponding to the feature image bag-of-words model to obtain the sample image code;
wherein matching the detection frame image code against the sample image code corresponding to the sample image to obtain a matching result comprises:
determining a preset distance between the detection frame image code and the sample image code; and
determining the matching result according to the preset distance.
According to an embodiment of the present disclosure, the preset distance includes at least one of:
a Hamming distance and a Euclidean distance.
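Both preset distances can be computed directly on the code vectors; the query and sample codes below are hypothetical.

```python
def hamming_distance(code_a, code_b):
    """Number of positions at which two equal-length codes differ."""
    return sum(a != b for a, b in zip(code_a, code_b))

def euclidean_distance(code_a, code_b):
    """Straight-line distance between two code vectors."""
    return sum((a - b) ** 2 for a, b in zip(code_a, code_b)) ** 0.5

query  = [1, 0, 2, 1]
sample = [1, 1, 2, 3]
print(hamming_distance(query, sample))    # codes differ at positions 1 and 3
print(euclidean_distance(query, sample))
```

A smaller distance indicates a better match; the matching result can then be decided by thresholding or by taking the nearest sample code.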
According to an embodiment of the present disclosure, performing target detection on the image to be processed to obtain a detection frame image corresponding to an object to be identified in the image to be processed comprises:
inputting the image to be processed into a target detection model and outputting a target detection frame corresponding to the object to be identified; and
determining, from the image to be processed, the detection frame image corresponding to the target detection frame.
According to an embodiment of the present disclosure, the target detection model includes an image feature extraction layer and a detection frame output layer;
wherein inputting the image to be processed into the target detection model and outputting the target detection frame corresponding to the object to be identified comprises:
inputting the image to be processed into the image feature extraction layer and outputting image features to be processed, wherein the image feature extraction layer is constructed based on a feature pyramid network algorithm; and
inputting the image features to be processed into the detection frame output layer and outputting the target detection frame.
A second aspect of the present disclosure provides an image processing apparatus including:
a detection frame image obtaining module, configured to perform target detection on an image to be processed to obtain a detection frame image corresponding to an object to be identified in the image to be processed;
a detection frame image code obtaining module, configured to encode the detection frame image based on a feature image bag-of-words model to obtain a detection frame image code, wherein the feature image bag-of-words model is constructed from sample feature element images extracted from a sample image;
a matching result obtaining module, configured to match the detection frame image code against a sample image code corresponding to the sample image to obtain a matching result; and
a target object identification module, configured to determine, according to the matching result, the object to be identified in the image to be processed as a target object.
A third aspect of the present disclosure provides an electronic device, comprising: one or more processors; and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method described above.
A fourth aspect of the present disclosure also provides a computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to perform the above-described method.
A fifth aspect of the present disclosure also provides a computer program product comprising a computer program which, when executed by a processor, implements the above method.
According to the image processing method, apparatus, device, medium, and program product provided by the present disclosure, a detection frame image corresponding to the object to be identified is obtained, and the detection frame image is encoded using a feature image bag-of-words model constructed from sample feature element images, so that the detection frame image code is expressed in terms of the image word codes associated with the sample feature element images. The detection frame image code can therefore characterize the similarity between the detection frame image and the sample feature element images more accurately. Consequently, the matching result obtained by matching the detection frame image code against the sample image code more accurately characterizes the degree of similarity between the object to be identified in the detection frame image and the object to be matched in the sample image, so that the object to be matched that corresponds to the object to be identified can be accurately retrieved according to the matching result. Since the target object is represented by this matched object, the accuracy of target object identification is improved.
Drawings
The foregoing and other objects, features and advantages of the disclosure will be more apparent from the following description of embodiments of the disclosure with reference to the accompanying drawings, in which:
FIG. 1 schematically illustrates an application scenario diagram of an image processing method, apparatus according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow chart of an image processing method according to an embodiment of the disclosure;
FIG. 3 schematically illustrates a flow chart of an image processing method according to another embodiment of the present disclosure;
FIG. 4 schematically illustrates a flow chart for constructing a feature image bag-of-word model from a sample feature element image, in accordance with an embodiment of the present disclosure;
FIG. 5 schematically illustrates an application scenario diagram of an image processing method according to an embodiment of the present disclosure;
FIG. 6 schematically illustrates a schematic diagram of detection frame image encoding and sample image encoding according to an embodiment of the present disclosure;
fig. 7 schematically shows a block diagram of the structure of an image processing apparatus according to an embodiment of the present disclosure;
fig. 8 schematically illustrates a block diagram of an electronic device adapted to implement an image processing method according to an embodiment of the disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is only exemplary and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the present disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and/or the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It should be noted that the terms used herein should be construed to have meanings consistent with the context of the present specification and should not be construed in an idealized or overly formal manner.
Where an expression like "at least one of A, B, and C" is used, it should generally be interpreted in accordance with the meaning commonly understood by those skilled in the art (e.g., "a system having at least one of A, B, and C" shall include, but not be limited to, a system having A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together).
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure, and application of relevant data (including, but not limited to, users' personal information) all comply with the relevant laws and regulations, necessary security measures are taken, and public order and good customs are not violated.
Image retrieval is a technique that, given a target image, searches a matching retrieval library of candidate images for an image that matches the current target image. The technique is widely applicable to scenarios such as searching for a designated target and product search in online e-commerce.
In addition, image retrieval can assist a robot in judging whether an acquired image contains a known target object. An image retrieval system can help a robot handle many daily tasks, such as similar-image judgment and target object search. Such a system is commonly referred to as content-based image retrieval (Content-Based Image Retrieval, CBIR).
The inventors have found that, unlike text-based image retrieval, the factors affecting image retrieval efficiency are the feature extraction and feature matching processes. Typical image retrieval methods match encoded features of the query image against those of target images to retrieve images of a similar category to the image to be processed. Because an image is affected by factors such as illumination and scale, a single global feature has difficulty describing it, and retrieval efficiency and precision are consequently low, making such methods difficult to deploy widely on embedded terminals.
Embodiments of the present disclosure provide an image processing method, apparatus, device, medium, and program product. The method comprises: performing target detection on an image to be processed to obtain a detection frame image corresponding to an object to be identified in the image to be processed; encoding the detection frame image based on a feature image bag-of-words model to obtain a detection frame image code, wherein the feature image bag-of-words model is constructed from sample feature element images extracted from a sample image; matching the detection frame image code against a sample image code corresponding to the sample image to obtain a matching result; and determining, according to the matching result, the object to be identified in the image to be processed as a target object.
According to the embodiments of the present disclosure, a detection frame image corresponding to the object to be identified is obtained, and the detection frame image is encoded using a feature image bag-of-words model constructed from sample feature element images, so that the detection frame image code is expressed in terms of the image word codes associated with the sample feature element images and can characterize the similarity between the detection frame image and the sample feature element images more accurately. Consequently, the matching result obtained by matching the detection frame image code against the sample image code more accurately characterizes the degree of similarity between the object to be identified in the detection frame image and the object to be matched in the sample image, so that the matching object can be accurately retrieved according to the matching result. Since the target object is represented by this matched object, the accuracy of target object identification is improved.
Fig. 1 schematically illustrates an application scenario diagram of an image processing method, apparatus according to an embodiment of the present disclosure.
As shown in fig. 1, an application scenario 100 according to this embodiment may include a first terminal device 101, a second terminal device 102, a third terminal device 103, a network 104, and a server 105. The network 104 is a medium used to provide a communication link between the first terminal device 101, the second terminal device 102, the third terminal device 103, and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may interact with the server 105 through the network 104 using at least one of the first terminal device 101, the second terminal device 102, the third terminal device 103, to receive or send messages, etc. Various communication client applications, such as a shopping class application, a web browser application, a search class application, an instant messaging tool, a mailbox client, social platform software, etc. (by way of example only) may be installed on the first terminal device 101, the second terminal device 102, and the third terminal device 103.
The first terminal device 101, the second terminal device 102, the third terminal device 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (by way of example only) providing support for websites browsed by the user using the first terminal device 101, the second terminal device 102, and the third terminal device 103. The background management server may analyze and process the received data such as the user request, and feed back the processing result (e.g., the web page, information, or data obtained or generated according to the user request) to the terminal device.
It should be noted that, the image processing method provided by the embodiment of the present disclosure may be generally executed by the server 105. Accordingly, the image processing apparatus provided by the embodiments of the present disclosure may be generally provided in the server 105. The image processing method provided by the embodiment of the present disclosure may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with the first terminal device 101, the second terminal device 102, the third terminal device 103, and/or the server 105. Accordingly, the image processing apparatus provided by the embodiments of the present disclosure may also be provided in a server or a server cluster that is different from the server 105 and is capable of communicating with the first terminal device 101, the second terminal device 102, the third terminal device 103, and/or the server 105.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
The image processing method of the disclosed embodiment will be described in detail below with reference to fig. 2 to 6 based on the scene described in fig. 1.
Fig. 2 schematically shows a flowchart of an image processing method according to an embodiment of the present disclosure.
As shown in fig. 2, the image processing method of this embodiment includes operations S210 to S240.
In operation S210, object detection is performed on the image to be processed, and a detection frame image corresponding to the object to be identified in the image to be processed is obtained.
According to an embodiment of the present disclosure, the image to be processed may be a photograph, a video frame, or the like in which the object to be identified is recorded. For example, the image to be processed may be obtained by using a camera to capture an object to be identified such as a key or a backpack. The object to be identified may be any type of object, such as a key, a bicycle, or a pet cat; embodiments of the present disclosure do not limit its specific type.
According to an embodiment of the present disclosure, target detection may be performed on the image to be processed by a target detection model constructed with a target detection algorithm. For example, a target detection model constructed with a convolutional neural network may generate a target detection frame in the image to be processed, and the detection frame image may then be determined from the image region of the image to be processed that corresponds to the target detection frame.
It should be noted that, the embodiment of the present disclosure does not limit a specific manner of target detection, as long as a detection frame image related to an object to be identified can be generated.
In operation S220, the detection frame image is encoded based on the feature image bag-of-words model to obtain a detection frame image code, wherein the feature image bag-of-words model is constructed from sample feature element images extracted from a sample image.
According to embodiments of the present disclosure, the sample image may be an image related to an object to be matched, such as a photograph or video frame recording an object to be matched such as a key or a bicycle. A sample feature element image may be a partial image of the sample image that characterizes a feature element of the object to be matched. For example, where the object to be matched in the sample image is an automobile, the sample feature element images may be partial images representing feature elements such as the wheels, windows, and lamps of the automobile.
According to embodiments of the present disclosure, the feature image bag-of-words model is constructed by extracting sample feature element images from the sample image, generating feature element image words from the sample feature element images, and building from them the feature element image word library corresponding to the feature image bag-of-words model. The detection frame image can then be encoded according to the feature element image words in the library to obtain the detection frame image code.
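One plausible encoding, consistent with classic bag-of-visual-words pipelines but not necessarily the patented one, assigns each image unit of the detection frame image to its nearest image word and counts occurrences. The unit-mean descriptor and the word library values below are assumptions.

```python
def encode_image(units, word_library):
    """Map each image unit to its nearest 'image word' and count occurrences."""
    def unit_mean(u):
        return sum(v for row in u for v in row) / (len(u) * len(u[0]))
    code = [0] * len(word_library)
    for u in units:
        m = unit_mean(u)
        nearest = min(range(len(word_library)),
                      key=lambda i: abs(word_library[i] - m))
        code[nearest] += 1
    return code

word_library = [0.0, 5.0, 10.0]          # assumed codebook of image words
units = [[[0, 0], [0, 0]],               # mean 0.0  -> word 0
         [[4, 6], [5, 5]],               # mean 5.0  -> word 1
         [[9, 11], [10, 10]],            # mean 10.0 -> word 2
         [[5, 5], [5, 5]]]               # mean 5.0  -> word 1
code = encode_image(units, word_library)
print(code)
```

The resulting histogram is the image code; computing it for the detection frame image and for each sample image makes the two directly comparable.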
In operation S230, the detection frame image code is matched against the sample image code corresponding to the sample image to obtain a matching result.
In operation S240, the object to be identified in the image to be processed is determined as the target object according to the matching result.
According to an embodiment of the present disclosure, the sample image code may be obtained by encoding part or all of the sample image based on the feature element image words corresponding to the feature image bag-of-words model. The detection frame image code and the sample image code can be compared with a similarity algorithm, such as the cosine similarity, to obtain the matching result. The matching result may characterize the degree of similarity between the object to be identified in the image to be processed and the object to be matched in the sample image. Accordingly, the object that matches the object to be identified can be obtained from the matching result and determined as the target object corresponding to the object to be identified.
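The cosine similarity mentioned above can be computed directly on the code vectors; the query code and the small sample code library here are hypothetical.

```python
def cosine_similarity(code_a, code_b):
    """Cosine of the angle between two non-zero code vectors."""
    dot = sum(a * b for a, b in zip(code_a, code_b))
    norm_a = sum(a * a for a in code_a) ** 0.5
    norm_b = sum(b * b for b in code_b) ** 0.5
    return dot / (norm_a * norm_b)

query_code   = [1, 2, 1]                              # hypothetical query
sample_codes = {"key": [1, 2, 1], "bicycle": [3, 0, 0]}
scores = {name: cosine_similarity(query_code, c)
          for name, c in sample_codes.items()}
best = max(scores, key=scores.get)                    # best-matching sample
print(best)
```

The sample whose code scores highest supplies the target object for the object to be identified.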
According to the embodiments of the present disclosure, a detection frame image corresponding to the object to be identified is obtained, and the detection frame image is encoded using the feature image bag-of-words model constructed from sample feature element images, so that the detection frame image code is expressed in terms of the image word codes associated with the sample feature element images and can characterize the similarity between the detection frame image and the sample feature element images more accurately. Consequently, the matching result obtained by matching the detection frame image code against the sample image code more accurately characterizes the degree of similarity between the object to be identified in the detection frame image and the object to be matched in the sample image, so that the matching object can be accurately retrieved according to the matching result. Since the target object is represented by this matched object, the accuracy of target object identification is improved.
According to an embodiment of the present disclosure, performing object detection on an image to be processed, obtaining a detection frame image corresponding to an object to be identified in the image to be processed may include the following operations.
Inputting the image to be processed into a target detection model, and outputting a target detection frame corresponding to the object to be identified; and determining a detection frame image corresponding to the target detection frame from the image to be processed according to the target detection frame.
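A minimal sketch of the second operation, assuming the target detection frame is given as pixel coordinates (x1, y1, x2, y2) and the image as a row-major list of pixel rows (the function name and coordinate convention are illustrative assumptions, not from the disclosure):

```python
def crop_detection_frame_image(image, detection_frame):
    # Cut the detection frame image out of the image to be processed.
    # detection_frame: (x1, y1, x2, y2), with (x1, y1) the top-left corner.
    x1, y1, x2, y2 = detection_frame
    return [row[x1:x2] for row in image[y1:y2]]

# Toy 4x4 single-channel image; the "detection frame" covers its centre 2x2 block.
image = [[0, 1, 2, 3],
         [4, 5, 6, 7],
         [8, 9, 10, 11],
         [12, 13, 14, 15]]
patch = crop_detection_frame_image(image, (1, 1, 3, 3))
# patch == [[5, 6], [9, 10]]
```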
According to embodiments of the present disclosure, the target detection model may be constructed based on a neural network algorithm. For example, the target detection model may be constructed based on a region proposal network (Region Proposal Network) algorithm. Alternatively, the target detection model may be constructed based on other types of neural network algorithms; the embodiment of the disclosure does not limit the specific algorithm type for constructing the target detection model, and a person skilled in the art can select it according to actual requirements.
According to the embodiment of the disclosure, the target detection frame can represent the object to be identified in the image to be processed. By determining the detection frame image from the image to be processed through the target detection frame, the detection frame image can represent the object to be identified more completely and accurately, while image areas of the image to be processed with low correlation to the object to be identified are effectively removed. This reduces the calculation cost and calculation time of subsequently generating the detection frame image code and matching it against the sample image code, so that the image processing method provided by the embodiment of the disclosure is better suited to terminal devices with lower computing power.
According to an embodiment of the present disclosure, an object detection model includes an image feature extraction layer and a detection frame output layer.
In the above operation, inputting the image to be processed into the target detection model, and outputting the target detection frame corresponding to the object to be identified includes:
inputting an image to be processed into an image feature extraction layer, and outputting the image feature to be processed, wherein the image feature extraction layer is constructed based on a feature pyramid network algorithm; and inputting the image characteristics to be processed into a detection frame output layer, and outputting a target detection frame.
According to embodiments of the present disclosure, a feature pyramid network (Feature Pyramid Network, FPN) algorithm may be combined with a region proposal network algorithm to construct the image feature extraction layer of the target detection model.
For example, the image feature extraction layer may be constructed based on the GA-RPN (Region Proposal by Guided Anchoring) algorithm and the feature pyramid network algorithm. Multi-scale feature extraction is performed on the image to be processed based on the feature pyramid network algorithm, and image features to be processed with enhanced semantic feature information are obtained by combining bottom-up and top-down pathways, so that the feature pyramid network algorithm effectively improves the performance of target detection and instance segmentation. Accordingly, when the image feature extraction layer is jointly constructed based on the GA-RPN algorithm, the image features to be processed can be used to guide the generation of anchors. Sparse anchors of arbitrary shape are generated by predicting anchor locations and shapes, and a feature adaptation module is designed to modify the feature map so that it matches the anchors. This mechanism of guiding the anchor frames (target detection frames) with the extracted intermediate image features can effectively improve the generation precision of the target detection frame, thereby overcoming a technical problem of the related art, namely the need to predefine dense anchors of preset sizes, whose shapes may not meet actual size requirements and which struggle to characterize objects to be identified with large aspect ratios.
Fig. 3 schematically shows a flowchart of an image processing method according to another embodiment of the present disclosure.
As shown in fig. 3, the image processing method may further include operations S310 to S330.
In operation S310, sample feature element detection is performed on the sample image, so as to obtain a sample feature element detection frame.
In operation S320, a sample feature element image is determined from the sample feature element detection frame and the sample image.
In operation S330, a feature image bag-of-word model is constructed from the sample feature element image.
According to embodiments of the present disclosure, sample feature element detection may be performed on sample images based on a trained deep learning model, for example a deep learning model constructed from a convolutional neural network. Alternatively, sample feature element detection may also be performed on the sample image based on the target detection model in the above embodiment.
According to the embodiment of the disclosure, constructing the feature image bag-of-word model according to the sample feature element images may include constructing a feature image word code library according to the sample feature element images. The feature image word code library can contain feature image word codes (feature element image words) characterizing the sample feature element images, so that the corresponding feature image bag-of-word model can be obtained from the feature image word codes. Accordingly, a corresponding sample image code may be determined for the sample image based on the feature image bag-of-word model.
According to embodiments of the present disclosure, in conventionally constructed visual bag-of-word models, the number of image codes needs to be continually expanded as query images (e.g., images to be processed) increase, typically growing the feature dimension of the image word codes at an exponential scale. The image processing method provided by the embodiment of the disclosure constructs the feature image word code library based on the sample feature element images, so that the feature image word codes can more accurately represent the image semantic feature information of the objects in the sample images. Therefore, the feature image word codes in the feature image word code library improve the generalization capability of the sample feature elements with respect to different objects to be identified, the feature image word codes can represent different types of objects to be identified, and the detection frame image code can more accurately represent the image semantic feature information corresponding to the object to be identified.
In addition, after the feature image bag-of-word model is constructed, the data scale of the feature image word code library does not need to be expanded, which reduces the calculation cost and encoding time of encoding the detection frame images to obtain the detection frame image codes and improves image processing efficiency. At the same time, the technical problem of the data size of the detection frame image codes growing continually can be avoided, reducing the calculation cost of subsequently matching the detection frame image codes with the sample image codes and improving the overall efficiency of target object identification.
According to an embodiment of the present disclosure, there are a plurality of sample feature element images.
Fig. 4 schematically illustrates a flow chart of constructing a feature image bag-of-word model from a sample feature element image, according to an embodiment of the disclosure.
As shown in fig. 4, constructing a feature image bag-of-word model from a sample feature element image in operation S330 may include operations S410 to S430.
In operation S410, a plurality of sample feature element images are processed based on a clustering algorithm to obtain a cluster center element image unit.
In operation S420, feature image word encoding is determined from the cluster center element image units.
In operation S430, a feature image bag-of-word model is constructed from the feature image word code.
According to the embodiment of the disclosure, by processing the plurality of sample feature element images based on a clustering algorithm, the obtained cluster center element image units can be used to represent the sample feature element images within the same cluster. Thus, the feature image word codes are determined according to the cluster center element image units, and sample feature element images in the same cluster can be characterized by the same feature image word code. This ensures that the feature image word codes accurately represent the image semantic information of the sample feature elements in the sample feature element images, while reducing the word code data scale, thereby lowering the calculation cost of subsequently determining the detection frame image code and of matching it with the sample image codes, and improving the overall efficiency of target object identification.
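Operations S410 and S420 can be sketched with a simple K-means clustering of sample feature element image units, represented here as plain feature vectors; the function names and toy data are illustrative assumptions, and the disclosure does not mandate this particular implementation:

```python
import random

def kmeans_centers(features, k, iterations=20, seed=0):
    # Simple K-means: the returned centers play the role of the cluster
    # center element image units from which feature image word codes derive.
    rng = random.Random(seed)
    centers = [list(f) for f in rng.sample(features, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for f in features:
            nearest = min(range(k),
                          key=lambda i: sum((a - b) ** 2 for a, b in zip(f, centers[i])))
            clusters[nearest].append(f)
        for i, members in enumerate(clusters):
            if members:  # keep the old center if a cluster empties out
                centers[i] = [sum(vals) / len(members) for vals in zip(*members)]
    return centers

# Two well-separated groups of toy feature vectors.
features = [(0.0, 0.0), (0.2, 0.0), (9.8, 10.0), (10.0, 10.0)]
word_codes = kmeans_centers(features, k=2)  # one word code per cluster center
```

Each resulting center then characterizes every sample feature element image unit assigned to its cluster, which is what keeps the word code library compact.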
According to the embodiment of the disclosure, the clustering algorithm may include a K-means clustering algorithm, but is not limited thereto, and may also include other types of clustering algorithms, and the embodiment of the disclosure does not limit the specific type of the clustering algorithm, and a person skilled in the art may select according to actual needs.
In accordance with an embodiment of the present disclosure, in operation S410, processing the plurality of sample feature element images based on the clustering algorithm to obtain cluster center element image units may include the following operations.
Determining a sample characteristic element image as a sample characteristic element image unit; and processing the plurality of sample characteristic element image units based on a clustering algorithm to obtain a clustering center element image unit.
In accordance with another embodiment of the present disclosure, in operation S410, processing the plurality of sample feature element images based on the clustering algorithm to obtain cluster center element image units may alternatively include the following operations.
Dividing each sample characteristic element image based on a preset rule to obtain a plurality of sample characteristic element image units; and processing the plurality of sample characteristic element image units based on a clustering algorithm to obtain a clustering center element image unit.
According to an embodiment of the present disclosure, the sample feature element image may be segmented based on a preset rule, for example based on a grid of a preset size, or the segmentation may be performed based on feature region portions of the sample feature element image. The embodiment of the present disclosure does not limit the specific preset rules for segmenting the sample feature element images, and a person skilled in the art may select them according to actual requirements.
According to the embodiment of the disclosure, the plurality of sample feature element image units obtained after segmentation can characterize the sample feature element images at a finer granularity, enhancing the fineness of the image semantic information of the sample feature elements represented by the cluster center element image units. Therefore, the feature image bag-of-word model constructed according to the cluster center element image units can further realize fine-grained encoding of the sample image and the detection frame image, improving the accuracy of subsequent matching results and the precision of image processing.
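The grid-based preset rule above can be sketched as follows, treating an image as a row-major list of pixel rows; the function name, the square unit size, and the choice to drop edge remainders are illustrative assumptions, not requirements of the disclosure:

```python
def split_into_units(image, unit_size):
    # Segment a row-major image into non-overlapping unit_size x unit_size
    # patches (a "grid of a preset size"); edge remainders smaller than a
    # full unit are dropped in this sketch.
    h, w = len(image), len(image[0])
    units = []
    for top in range(0, h - unit_size + 1, unit_size):
        for left in range(0, w - unit_size + 1, unit_size):
            units.append([row[left:left + unit_size]
                          for row in image[top:top + unit_size]])
    return units

sample_feature_element_image = [[1, 2, 3, 4],
                                [5, 6, 7, 8],
                                [9, 10, 11, 12],
                                [13, 14, 15, 16]]
units = split_into_units(sample_feature_element_image, 2)  # four 2x2 units
```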
According to an embodiment of the present disclosure, the image processing method may further include the following operations.
And carrying out image coding on the sample image according to the characteristic image word coding corresponding to the characteristic image word bag model to obtain sample image coding.
According to an embodiment of the present disclosure, in the case where there are a plurality of feature image word codes, a plurality of sample image units may be determined from the sample image; for example, the sample image may be divided based on a grid of a preset size to obtain the sample image units. By calculating the similarity between each sample image unit and each feature image word code, the sample target feature image word code most similar (or closest) to each sample image unit is obtained. The sample image code can then be determined from the sample target feature image word codes corresponding to the respective sample image units.
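This nearest-word assignment step can be sketched as follows, assuming each image unit is summarized by a feature vector and the image code is the resulting word-frequency histogram (a common bag-of-visual-words convention, assumed here rather than stated by the disclosure):

```python
def encode_with_word_codes(unit_features, word_codes):
    # Assign each image unit to the nearest feature image word code
    # (squared Euclidean distance) and return the word-frequency
    # histogram, which serves as the image code.
    histogram = [0] * len(word_codes)
    for feature in unit_features:
        nearest = min(range(len(word_codes)),
                      key=lambda i: sum((a - b) ** 2
                                        for a, b in zip(feature, word_codes[i])))
        histogram[nearest] += 1
    return histogram

word_codes = [(0.0, 0.0), (10.0, 10.0)]            # hypothetical codebook
unit_features = [(0.5, 0.1), (9.0, 10.0), (10.0, 9.5)]
sample_image_code = encode_with_word_codes(unit_features, word_codes)
# sample_image_code == [1, 2]
```

The same routine would encode a detection frame image, which is what makes the two codes directly comparable.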
In operation S220, matching the detection frame image code with the sample image code corresponding to the sample image to obtain a matching result may include: determining a preset distance between the detection frame image code and the sample image code; and determining the matching result according to the preset distance.
According to embodiments of the present disclosure, the preset distance may characterize the degree of similarity between the detection frame image code and the sample image code. Therefore, the matching result obtained from the preset distance can further characterize the similarity between the detection frame image and the sample image, so that the sample image matching the detection frame image can be determined according to the matching result, the object to be matched that matches the object to be identified in the detection frame image can be accurately determined, and the target object in the image to be processed can be accurately identified.
In another embodiment of the present disclosure, target detection may also be performed on the sample image to obtain a sample detection frame image representing the object to be matched, and the sample detection frame image may be encoded with the feature image word codes to obtain the sample image code. In this way, image areas of the sample image with low correlation to the object to be matched can be filtered out, the data size of the sample image code is reduced, the matching precision of subsequent matching results is further improved, and the calculation cost of obtaining the matching result is reduced.
According to an embodiment of the present disclosure, the preset distance comprises at least one of:
hamming distance, euclidean distance.
According to the embodiment of the disclosure, the Hamming distance can be determined as the preset distance. Calculating the Hamming distance between the detection frame image code and the sample image code speeds up the calculation of the matching result and reduces the computational complexity, further improving the processing efficiency of the image processing method provided by the embodiment of the disclosure.
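When the image codes are binarized and packed into integers (an assumption of this sketch; the disclosure does not specify a packing scheme), the Hamming distance reduces to counting set bits in an XOR, which is why it is cheap to compute:

```python
def hamming_distance(code_a: int, code_b: int) -> int:
    # Number of differing bits between two binary image codes
    # packed as integers.
    return bin(code_a ^ code_b).count("1")

# 0b1010 and 0b0011 differ in two bit positions -> distance 2.
distance = hamming_distance(0b1010, 0b0011)
```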
Fig. 5 schematically illustrates an application scenario diagram of an image processing method according to an embodiment of the present disclosure.
As shown in fig. 5, an image retrieval module 500 may be included in the application scenario. The image retrieval module 500 may include a sample image encoding library 510, a target detection model 520, a feature image bag-of-word model 530, and a matching degree detection model 540.
The sample image encoding library 510 may include sample image codes corresponding to N sample images, where N is a positive integer. For example, in the case of N = 3, the sample image encoding library 510 may include sample image codes 511, 512, and 513.
The object detection model 520 may be constructed based on the GA-RPN (Region Proposal by Guided Anchoring) algorithm and the feature pyramid network algorithm. By inputting the image to be processed 501 to the target detection model 520, a detection block image 5011 can be output. The detection block image 5011 may record an object to be recognized in the image to be processed 501.
The detection frame image 5011 is input to the feature image bag-of-word model 530, and the detection frame image 5011 can be encoded according to the feature image word encoding corresponding to the feature image bag-of-word model 530, and the detection frame image encoding 5012 can be output.
It should be appreciated that the sample image codes 511, 512, and 513 may also be obtained by encoding the corresponding sample images based on the feature image bag-of-word model 530.
The sample image code 511 and the detection frame image code 5012 are input to the matching degree detection model 540, and the matching degree detection model 540 may calculate the hamming distance between the sample image code 511 and the detection frame image code 5012 and output the calculated hamming distance as the first matching result 551.
Accordingly, the sample image code 512 and the detection frame image code 5012 are input to the matching degree detection model 540, and the matching degree detection model 540 may calculate the hamming distance between the sample image code 512 and the detection frame image code 5012 and output the calculated hamming distance as the second matching result 552. The sample image code 513 and the detection frame image code 5012 are input to the matching degree detection model 540, and the matching degree detection model 540 may calculate the hamming distance between the sample image code 513 and the detection frame image code 5012 and output the calculated hamming distance as the third matching result 553.
By determining that the matching result representing the smallest hamming distance among the first matching result 551, the second matching result 552, and the third matching result 553 is the first matching result 551, it is possible to determine that the sample image corresponding to the sample image code 511 is an image matching the image to be processed 501, based on the first matching result 551. In this way, the object to be identified in the image to be processed 501 can be matched with the object to be matched corresponding to the sample image code 511, so as to identify the target object in the image to be processed 501.
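The retrieval loop of fig. 5 can be sketched end-to-end as follows, with the sample identifiers and bit vectors being hypothetical toy data chosen so that sample 511 is the closest match, mirroring the scenario above:

```python
def best_match(detection_code, sample_library):
    # sample_library maps a sample id to its binary sample image code
    # (here a list of bits); the id with the smallest Hamming distance wins.
    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))
    distances = {sid: hamming(detection_code, code)
                 for sid, code in sample_library.items()}
    return min(distances, key=distances.get), distances

sample_library = {
    "sample_511": [1, 0, 1, 1, 0],
    "sample_512": [0, 0, 0, 1, 1],
    "sample_513": [1, 1, 1, 0, 0],
}
detection_frame_code = [1, 0, 1, 1, 1]
match_id, all_distances = best_match(detection_frame_code, sample_library)
# match_id == "sample_511" (Hamming distance 1)
```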
Fig. 6 schematically illustrates a schematic diagram of detection frame image encoding and sample image encoding according to an embodiment of the present disclosure.
As shown in fig. 6, a first histogram 610 may represent detection block image encoding and a second histogram 620 may represent sample image encoding. By calculating the preset distance between the first histogram 610 and the second histogram 620, a matching result between the image to be processed and the sample image can be obtained.
In an embodiment of the present disclosure, a device capable of executing the image processing method provided by the embodiment of the present disclosure may be provided in a user's mobile terminal, so that the image processing method can be used to retrieve the target sample image matching the image to be processed that the user wishes to query, and then identify the target object in the image to be processed.
Based on the image processing method, the disclosure also provides an image processing device. The device will be described in detail below in connection with fig. 7.
Fig. 7 schematically shows a block diagram of the structure of an image processing apparatus according to an embodiment of the present disclosure.
As shown in fig. 7, the image processing apparatus 700 of this embodiment includes a detection frame image obtaining module 710, a detection frame image encoding obtaining module 720, a matching result obtaining module 730, and a target object identifying module 740.
The detection frame image obtaining module 710 is configured to perform target detection on an image to be processed, so as to obtain a detection frame image corresponding to an object to be identified in the image to be processed.
The detection frame image encoding obtaining module 720 is configured to perform image encoding on the detection frame image based on a feature image bag-of-word model to obtain a detection frame image encoding, where the feature image bag-of-word model is constructed according to a sample feature element image extracted from a sample image.
The matching result obtaining module 730 is configured to match the detection frame image code with the sample image code corresponding to the sample image to obtain a matching result.
The target object recognition module 740 is configured to determine an object to be recognized in the image to be processed as a target object according to the matching result.
According to an embodiment of the present disclosure, the image processing apparatus further includes: the device comprises a sample characteristic element detection frame obtaining module, a sample characteristic element image determining module and a characteristic image word bag model constructing module.
The sample characteristic element detection frame obtaining module is used for detecting sample characteristic elements of the sample image to obtain a sample characteristic element detection frame.
The sample characteristic element image determining module is used for determining a sample characteristic element image according to the sample characteristic element detection frame and the sample image.
The feature image word bag model building module is used for building a feature image word bag model according to the sample feature element images.
According to an embodiment of the present disclosure, there are a plurality of sample feature element images.
The feature image bag-of-word model building module comprises: the clustering center element image unit obtains a sub-module, the characteristic image word coding obtains a sub-module and the characteristic image word bag model constructs a sub-module.
The clustering center element image unit obtaining submodule is used for processing a plurality of sample characteristic element images based on a clustering algorithm to obtain a clustering center element image unit.
The characteristic image word code obtaining sub-module is used for determining characteristic image word codes according to the clustering center element image units.
The feature image word bag model construction submodule is used for constructing a feature image word bag model according to feature image word codes.
According to an embodiment of the present disclosure, the cluster center element image unit obtaining submodule includes: a segmentation unit and a first clustering unit.
The segmentation unit is used for segmenting each sample characteristic element image based on a preset rule to obtain a plurality of sample characteristic element image units.
The first clustering unit is used for processing the plurality of sample characteristic element image units based on a clustering algorithm to obtain a clustering center element image unit.
According to an embodiment of the present disclosure, a cluster center element image unit obtaining submodule includes:
the sample feature element image unit determination unit is configured to determine a sample feature element image as a sample feature element image unit.
The second clustering unit is used for processing the plurality of sample characteristic element image units based on a clustering algorithm to obtain a clustering center element image unit.
According to an embodiment of the present disclosure, the image processing apparatus further includes a sample image encoding obtaining module.
The sample image coding obtaining module is used for carrying out image coding on the sample image according to the characteristic image word coding corresponding to the characteristic image word bag model to obtain sample image coding.
The matching result obtaining module comprises: a preset distance determining sub-module and a matching result determining sub-module.
The preset distance determining submodule is used for determining a preset distance between the detection frame image code and the sample image code.
The matching result determining submodule is used for determining the matching result according to the preset distance.
According to an embodiment of the present disclosure, the preset distance comprises at least one of:
hamming distance, euclidean distance.
According to an embodiment of the present disclosure, a detection frame image obtaining module includes: the device comprises a target detection frame obtaining sub-module and a detection frame image obtaining sub-module.
The target detection frame obtaining submodule is used for inputting the image to be processed into the target detection model and outputting a target detection frame corresponding to the object to be identified.
The detection frame image obtaining sub-module is used for determining a detection frame image corresponding to the target detection frame from the image to be processed according to the target detection frame.
According to an embodiment of the present disclosure, an object detection model includes an image feature extraction layer and a detection frame output layer.
Wherein, the target detection frame obtains submodule includes: an image feature extraction unit and a target detection frame acquisition unit.
The image feature extraction unit is used for inputting the image to be processed into the image feature extraction layer and outputting the image feature to be processed, wherein the image feature extraction layer is constructed based on the feature pyramid network algorithm.
The target detection frame obtaining unit is used for inputting the image characteristics to be processed into the detection frame output layer and outputting the target detection frame.
According to an embodiment of the present disclosure, any of the detection frame image obtaining module 710, the detection frame image encoding obtaining module 720, the matching result obtaining module 730, and the target object identifying module 740 may be combined in one module to be implemented, or any one of the modules may be split into a plurality of modules. Alternatively, at least some of the functionality of one or more of the modules may be combined with at least some of the functionality of other modules and implemented in one module. According to embodiments of the present disclosure, at least one of the detection frame image acquisition module 710, the detection frame image encoding acquisition module 720, the matching result acquisition module 730, and the target object identification module 740 may be implemented at least in part as hardware circuitry, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or as hardware or firmware in any other reasonable manner of integrating or packaging the circuitry, or as any one of or a suitable combination of any of the three. Alternatively, at least one of the detection frame image obtaining module 710, the detection frame image encoding obtaining module 720, the matching result obtaining module 730, and the target object identifying module 740 may be at least partially implemented as a computer program module, which when executed, may perform the corresponding functions.
Fig. 8 schematically illustrates a block diagram of an electronic device adapted to implement an image processing method according to an embodiment of the disclosure.
As shown in fig. 8, an electronic device 800 according to an embodiment of the present disclosure includes a processor 801 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. The processor 801 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or an associated chipset and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), or the like. The processor 801 may also include on-board memory for caching purposes. The processor 801 may include a single processing unit or multiple processing units for performing the different actions of the method flows according to embodiments of the disclosure.
In the RAM 803, various programs and data required for the operation of the electronic device 800 are stored. The processor 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. The processor 801 performs various operations of the method flow according to the embodiments of the present disclosure by executing programs in the ROM 802 and/or the RAM 803. Note that the program may be stored in one or more memories other than the ROM 802 and the RAM 803. The processor 801 may also perform various operations of the method flows according to embodiments of the present disclosure by executing programs stored in the one or more memories.
According to an embodiment of the present disclosure, the electronic device 800 may also include an input/output (I/O) interface 805, the input/output (I/O) interface 805 also being connected to the bus 804. The electronic device 800 may also include one or more of the following components connected to an input/output (I/O) interface 805: an input portion 806 including a keyboard, mouse, etc.; an output portion 807 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and a speaker; a storage section 808 including a hard disk or the like; and a communication section 809 including a network interface card such as a LAN card, a modem, or the like. The communication section 809 performs communication processing via a network such as the internet. The drive 810 is also connected to an input/output (I/O) interface 805 as needed. A removable medium 811 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 810 as needed so that a computer program read out therefrom is mounted into the storage section 808 as needed.
The present disclosure also provides a computer-readable storage medium that may be embodied in the apparatus/device/system described in the above embodiments; or may exist alone without being assembled into the apparatus/device/system. The computer-readable storage medium carries one or more programs which, when executed, implement methods in accordance with embodiments of the present disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example, but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, the computer-readable storage medium may include ROM 802 and/or RAM 803 and/or one or more memories other than ROM 802 and RAM 803 described above.
Embodiments of the present disclosure also include a computer program product comprising a computer program containing program code for performing the methods shown in the flowcharts. The program code, when executed in a computer system, causes the computer system to perform the methods provided by embodiments of the present disclosure.
The above-described functions defined in the system/apparatus of the embodiments of the present disclosure are performed when the computer program is executed by the processor 801. The systems, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the disclosure.
In one embodiment, the computer program may be carried on a tangible storage medium such as an optical storage device or a magnetic storage device. In another embodiment, the computer program may also be transmitted and distributed in the form of a signal over a network medium, and downloaded and installed via the communication section 809 and/or from the removable medium 811. The computer program may include program code that may be transmitted using any appropriate medium, including but not limited to: wireless, wired, or any suitable combination of the foregoing.
In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 809, and/or installed from the removable medium 811. When the computer program is executed by the processor 801, the above-described functions defined in the system of the embodiments of the present disclosure are performed. According to embodiments of the present disclosure, the systems, devices, apparatuses, modules, units, and the like described above may be implemented by computer program modules.
According to embodiments of the present disclosure, program code for carrying out the computer programs provided by embodiments of the present disclosure may be written in any combination of one or more programming languages; in particular, these computer programs may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. Programming languages include, but are not limited to, Java, C++, Python, C, or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that the features recited in the various embodiments of the present disclosure and/or in the claims may be combined and/or integrated in various ways, even if such combinations or integrations are not explicitly recited in the present disclosure. In particular, the features recited in the various embodiments of the present disclosure and/or in the claims may be combined and/or integrated in various ways without departing from the spirit and teachings of the present disclosure. All such combinations and/or integrations fall within the scope of the present disclosure.
The embodiments of the present disclosure are described above. However, these embodiments are provided for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described above separately, this does not mean that the measures in the respective embodiments cannot be used advantageously in combination. The scope of the disclosure is defined by the appended claims and their equivalents. Various alternatives and modifications can be made by those skilled in the art without departing from the scope of the disclosure, and such alternatives and modifications are intended to fall within the scope of the disclosure.

Claims (13)

1. An image processing method, comprising:
performing target detection on an image to be processed to obtain a detection frame image corresponding to an object to be identified in the image to be processed;
performing image coding on the detection frame image based on a feature image bag-of-words model to obtain a detection frame image code, wherein the feature image bag-of-words model is constructed according to a sample feature element image extracted from a sample image;
matching the detection frame image code with a sample image code corresponding to the sample image to obtain a matching result; and
determining the object to be identified in the image to be processed as a target object according to the matching result.
2. The method of claim 1, further comprising:
detecting a sample feature element in the sample image to obtain a sample feature element detection frame;
determining the sample feature element image according to the sample feature element detection frame and the sample image; and
constructing the feature image bag-of-words model according to the sample feature element image.
3. The method of claim 2, wherein the sample feature element image comprises a plurality of sample feature element images;
wherein the constructing the feature image bag-of-words model according to the sample feature element images comprises:
processing the plurality of sample feature element images based on a clustering algorithm to obtain a cluster center element image unit;
determining feature image word codes according to the cluster center element image unit; and
constructing the feature image bag-of-words model according to the feature image word codes.
4. The method of claim 3, wherein the processing the plurality of sample feature element images based on a clustering algorithm to obtain a cluster center element image unit comprises:
dividing each sample feature element image based on a preset rule to obtain a plurality of sample feature element image units; and
processing the plurality of sample feature element image units based on the clustering algorithm to obtain the cluster center element image unit.
5. The method of claim 3, wherein the processing the plurality of sample feature element images based on a clustering algorithm to obtain a cluster center element image unit comprises:
determining each sample feature element image as a sample feature element image unit; and
processing the plurality of sample feature element image units based on the clustering algorithm to obtain the cluster center element image unit.
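The clustering step of claims 3 to 5 can be illustrated with a minimal sketch: sample feature element images are flattened into vectors and clustered, and the cluster centers serve as the "visual words" of the codebook. The plain k-means loop, the 4-dimensional patch vectors, and all names below are hypothetical illustrations, not the patented implementation:

```python
import random

def kmeans(vectors, k, iters=10, seed=0):
    """Cluster flattened patch vectors; the k cluster centers play the role
    of the 'cluster center element image units' described in the claims."""
    rng = random.Random(seed)
    centres = [list(c) for c in rng.sample(vectors, k)]
    for _ in range(iters):
        # Assign each vector to its nearest center (squared Euclidean distance).
        buckets = [[] for _ in range(k)]
        for v in vectors:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centres[c])))
            buckets[i].append(v)
        # Recompute each center as the mean of its assigned vectors.
        for c, pts in enumerate(buckets):
            if pts:
                centres[c] = [sum(col) / len(pts) for col in zip(*pts)]
    return centres

# Hypothetical 4-dimensional vectors flattened from sample feature element image units.
patches = [
    [0, 0, 0, 0], [1, 1, 1, 1], [0, 0, 1, 0],   # dark, flat patches
    [9, 9, 9, 9], [8, 9, 8, 9],                  # bright patches
]
codebook = kmeans(patches, k=2)
```

In practice a library implementation (e.g. an off-the-shelf k-means) would replace this loop; the sketch only shows how cluster centers become the codebook entries.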
6. The method of claim 3, further comprising:
performing image coding on the sample image according to the feature image word codes corresponding to the feature image bag-of-words model to obtain the sample image code;
wherein the matching the detection frame image code with the sample image code corresponding to the sample image to obtain a matching result comprises:
determining a preset distance between the detection frame image code and the sample image code; and
determining the matching result according to the preset distance.
7. The method of claim 6, wherein the preset distance comprises at least one of:
a Hamming distance, a Euclidean distance.
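The two preset distances named in claim 7 can be sketched as follows; the binary-string code and histogram-vector representations of the image codes are assumptions made purely for illustration:

```python
def hamming(a: str, b: str) -> int:
    """Hamming distance between two equal-length binary image codes:
    the number of positions at which the bits differ."""
    if len(a) != len(b):
        raise ValueError("codes must have equal length")
    return sum(x != y for x, y in zip(a, b))

def euclidean(u, v) -> float:
    """Euclidean distance between two image codes represented as
    bag-of-words histogram vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v)) ** 0.5

d_ham = hamming("10110", "10011")        # bits differ at positions 2 and 4 -> 2
d_euc = euclidean([1, 0, 2], [1, 2, 2])  # sqrt(0 + 4 + 0) -> 2.0
```

A smaller distance would indicate a better match between the detection frame image code and a stored sample image code.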
8. The method of claim 1, wherein the performing target detection on the image to be processed to obtain a detection frame image corresponding to the object to be identified in the image to be processed comprises:
inputting the image to be processed into a target detection model and outputting a target detection frame corresponding to the object to be identified; and
determining the detection frame image corresponding to the target detection frame from the image to be processed according to the target detection frame.
9. The method of claim 8, wherein the target detection model comprises an image feature extraction layer and a detection frame output layer;
wherein the inputting the image to be processed into a target detection model and outputting a target detection frame corresponding to the object to be identified comprises:
inputting the image to be processed into the image feature extraction layer and outputting an image feature to be processed, wherein the image feature extraction layer is constructed based on a feature pyramid network algorithm; and
inputting the image feature to be processed into the detection frame output layer and outputting the target detection frame.
10. An image processing apparatus, comprising:
a detection frame image acquisition module configured to perform target detection on an image to be processed to obtain a detection frame image corresponding to an object to be identified in the image to be processed;
a detection frame image code acquisition module configured to perform image coding on the detection frame image based on a feature image bag-of-words model to obtain a detection frame image code, wherein the feature image bag-of-words model is constructed according to a sample feature element image extracted from a sample image;
a matching result acquisition module configured to match the detection frame image code with a sample image code corresponding to the sample image to obtain a matching result; and
a target object identification module configured to determine the object to be identified in the image to be processed as a target object according to the matching result.
11. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method according to any one of claims 1 to 9.
12. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method according to any of claims 1 to 9.
13. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 9.
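Taken together, claims 1 and 6 describe quantizing the detection frame image against the codebook and matching the resulting code to stored sample image codes. A hypothetical end-to-end sketch, with all vectors, codebooks, and names invented solely for illustration:

```python
def encode(patch_vectors, codebook):
    """Quantize each patch vector to its nearest visual word and build a
    histogram, i.e. the 'image code' of the detection frame or sample image."""
    hist = [0] * len(codebook)
    for v in patch_vectors:
        i = min(range(len(codebook)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, codebook[c])))
        hist[i] += 1
    return hist

# Hypothetical two-word codebook (the cluster center element image units).
codebook = [[0.0, 0.0], [9.0, 9.0]]

# Encode a detection frame image given its (invented) patch vectors.
detection_code = encode([[0, 1], [8, 9], [9, 8]], codebook)

# Match against pre-computed sample image codes by squared Euclidean distance.
sample_codes = {"target": [1, 2], "other": [3, 0]}
best = min(sample_codes,
           key=lambda k: sum((x - y) ** 2
                             for x, y in zip(detection_code, sample_codes[k])))
```

If the best-matching distance falls below some threshold, the object to be identified would be determined to be the corresponding target object.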
CN202310440147.1A 2023-04-21 2023-04-21 Image processing method, device, equipment and storage medium Pending CN116385570A (en)


Publications (1)

Publication Number Publication Date
CN116385570A true CN116385570A (en) 2023-07-04



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination