CN107729901B - Image processing model establishing method and device and image processing method and system - Google Patents


Info

Publication number
CN107729901B
CN107729901B (Application CN201610652942.7A)
Authority
CN
China
Prior art keywords
image
model
matrix
image processing
region
Prior art date
Legal status
Active
Application number
CN201610652942.7A
Other languages
Chinese (zh)
Other versions
CN107729901A (en)
Inventor
李昊
孙修宇
刘巍
潘攀
华先胜
Current Assignee
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201610652942.7A
Publication of CN107729901A
Application granted
Publication of CN107729901B
Legal status: Active
Anticipated expiration: status pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Abstract

The application provides an image processing model establishing method and apparatus, and an image processing method and system; the four schemes share the same technical idea. The image processing system comprises an image processing device and a matrix multiplier. The image processing device is configured with a pre-established image processing model comprising a first model and a second model; it processes an image to be processed through the image processing model to obtain a region feature matrix and a region feature weight matrix of the image. The matrix multiplier performs matrix multiplication on the region feature matrix and the region feature weight matrix of the image to obtain the features of the image to be processed. The system rapidly realizes end-to-end feature extraction from image to target features, and the model establishing process greatly reduces the required sample size, thereby reducing the early-stage manual labeling cost.

Description

Image processing model establishing method and device and image processing method and system
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to an image processing model establishing method, an image processing model establishing apparatus, an image processing method, and an image processing system.
Background
In recent years, with the continuous development of mobile Internet technology, the search entry has been shifting from traditional text search to image search. Image search is now well developed and has become a search mode commonly used by users. For example, in the field of electronic commerce, a user can search for commodities by submitting a picture (search-by-image).
The realization of image search mainly depends on image description algorithms. At present, a commonly used image description algorithm mainly comprises two modules: a detection module and a feature extraction module. The detection module segments out of the original image the target part whose features need to be extracted, so as to remove background interference; the feature extraction module extracts features from the target part segmented by the detection module. In implementation, the detection module and the feature extraction module need to be trained separately; their training processes are separate but sample-related.
During training, a large number of training samples (generally on the order of millions) need to be manually labeled for the detection module. The detection module is trained first, and its training result is then used to manually label the training samples required by the feature extraction module; for the annotators, the labeling workload is large. Moreover, training of the feature extraction module only begins after training of the detection module is completed, so the whole training flow is long and the time cost is high.
Disclosure of Invention
The technical problem to be solved by the application is to provide an image processing model establishing method, which can quickly realize end-to-end model establishment, reduce the workload of manual labeling of samples, improve the efficiency of model establishment and ensure the reliability of the model.
In addition, the application also provides an image processing model establishing device, an image processing method and an image processing system, so as to ensure the realization and the application of the method in practice.
In a first aspect of the present application, a method for building an image processing model is provided, the method including:
inputting pre-labeled sample images into a first model for learning to obtain a region feature matrix of each image in the sample images, wherein the region feature matrix is used for representing the target features in each region of the image;
inputting the region feature matrix of each image into a second model for learning to obtain a region feature weight matrix of each image, wherein the region feature weight matrix is used for representing the weight of each region in the region feature matrix, and the weight represents the saliency of the target features in that region;
calculating the features of each image according to the region feature matrix of each image and the corresponding region feature weight matrix;
when the accuracy between the features of each image obtained after learning and the labeled features of that image in the pre-labeled sample images tends to be stable, determining that the image processing model obtained by learning comprises the first model and the second model.
In a second aspect of the present application, there is provided an apparatus for creating an image processing model, the apparatus comprising:
the first model learning module is used for inputting a pre-labeled sample image into a first model for learning to obtain a region feature matrix of each image in the sample image, and the region feature matrix is used for representing the target feature condition in each region in the image;
the second model learning module is used for inputting the region feature matrix of each image into a second model for learning to obtain a region feature weight matrix of each image, the region feature weight matrix is used for representing the weight of each region in the region feature matrix, and the weight represents the significance of the target feature in the region;
the calculation module is used for calculating the characteristics of each image according to the area characteristic matrix of each image and the corresponding area characteristic weight matrix;
a determining module, configured to determine, when the accuracy of the feature of each image obtained after learning and the feature of the image in the pre-labeled sample image tends to be stable, that the image processing model obtained by learning includes: the first model and the second model.
In a third aspect of the present application, there is provided an image processing system, the system comprising:
an image processing apparatus and a matrix multiplier;
wherein the image processing device is in communication with the matrix multiplier;
the image processing equipment is provided with a pre-established image processing model, and the image processing model comprises a first model and a second model; the first model is a model of a regional characteristic matrix of an image, which can be obtained by learning the image; the second model is a model which can learn a regional characteristic matrix of the image to obtain a regional characteristic weight matrix of the image;
the image processing device is used for processing the image to be processed through the image processing model to obtain a region characteristic matrix and a region characteristic weight matrix of the image to be processed, and outputting the region characteristic matrix and the region characteristic weight matrix to the matrix multiplier;
the matrix multiplier is used for carrying out matrix multiplication processing on the received area characteristic matrix and the area characteristic weight matrix of the image to be processed to obtain the characteristics of the image to be processed.
In a fourth aspect of the present application, there is provided an image processing method, the method comprising:
inputting an image to be processed into a pre-established image processing model, wherein the image processing model comprises a first model and a second model;
processing the image to be processed through a first model in the image processing model to obtain a region characteristic matrix of the image to be processed, and inputting the region characteristic matrix into the second model to obtain a region characteristic weight matrix of the image to be processed;
and calculating the characteristics of the image to be processed according to the regional characteristic matrix of the image to be processed and the regional characteristic weight matrix of the image to be processed.
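The four steps of the method can be sketched end to end once the two matrices are in hand. The patent calls the combination "matrix multiplication" without fixing shapes; the pure-Python sketch below assumes the weight matrix has the same shape as the region feature matrix and that combining means weighting each region's feature vector and summing over regions (all function and matrix names are illustrative, not from the patent):

```python
def combine_features(region_features, region_weights):
    """Weight each region's feature vector element-wise, then sum over
    regions to obtain one feature vector for the whole image.
    One plausible reading of the patent's 'matrix multiplication' step."""
    num_dims = len(region_features[0])
    out = [0.0] * num_dims
    for feats, weights in zip(region_features, region_weights):
        for j in range(num_dims):
            out[j] += feats[j] * weights[j]
    return out

# 2 regions x 2 feature dims; the first (salient) region is up-weighted
R = [[0.2, 0.8], [0.6, 0.4]]
W = [[0.9, 0.9], [0.1, 0.1]]
print(combine_features(R, W))
```

The salient region dominates the pooled feature, which matches the stated purpose of the weight matrix: suppressing background regions.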
Compared with the prior art, the technical scheme provided by the application has the following advantages:
the technical scheme provided by the application provides an end-to-end image processing model establishing method, and the end-to-end model learning only needs pre-labeled sample images with characteristic attributes. However, it is known that in the prior art, a large number of sample images with target information need to be pre-labeled for training of the detection module, and a large number of sample images with feature attributes need to be pre-labeled for training of the feature extraction module, so that compared with the prior art, the technical scheme of the application does not need so many sample images, and reduces nearly half of the labeling workload.
In addition, the image processing model of the related art comprises a detection module and a feature extraction module: the detection module mainly detects the target region in the image, and the feature extraction module extracts the target features of the image from that target region. Different from the prior art, the image processing model provided by the application comprises a first model and a second model: the first model is mainly a model that can learn the features at each region position of an image, and the second model is mainly a model that can learn the feature weights at each region position of the image.
It can be seen that the combination of the two models can learn the target features of an image, so end-to-end learning from image to features is realized. Compared with the prior art's layered learning from image to target region and from target region to features, this learning mode has higher learning efficiency and yields a reliable learned model.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive labor.
Fig. 1 is a block diagram of an image processing system according to an embodiment 1 of the present application;
FIG. 2 is a block diagram of an embodiment 2 of an image processing system provided in the present application;
FIG. 3 is a flow chart of a method for building an image processing model provided herein;
FIG. 4 is a flow chart of an image processing method provided by the present application;
fig. 5 is a block diagram of an image processing model creation apparatus according to the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The application is operational with numerous general purpose or special purpose computing device environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multi-processor apparatus, distributed computing environments that include any of the above devices or equipment, and the like.
The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The technical solution provided by the application mainly provides a new image processing model. Different from the prior art, the new model is formed by combining a first model and a second model, whose respective functions are completely different from the detection module and the feature extraction module of the prior art. On one hand, the present application provides a method for establishing this new image processing model (which may also be understood as a training or learning method); different from the prior art, it does not label a large number of sample images for the training of each module, but only requires sample images pre-labeled with feature attributes for the entire establishing process. This greatly reduces the sample size requirement and thus the early-stage manual labeling cost. On the other hand, the application also provides an image processing system based on the new image processing model; the system can rapidly realize end-to-end feature extraction from image to target features and is well suited to business scenarios with a large number of images to be processed, such as an e-commerce platform. Furthermore, in order to ensure the practical application and implementation of the establishing method, the application also provides an apparatus for establishing the image processing model.
In order to facilitate a person skilled in the art to understand the technical solutions of the present application well, the image processing system provided in the present application will be explained first.
Referring to fig. 1, fig. 1 is a block diagram of an image processing system provided in the present application, and as shown in fig. 1, the system 100 includes:
an image processing apparatus 101 and a matrix multiplier 102;
wherein the image processing apparatus 101 is in communication with the matrix multiplier 102;
the image processing equipment is provided with a pre-established image processing model, and the image processing model comprises a first model and a second model;
the first model is a model that can learn an image to obtain the region feature matrix of the image; the second model is a model that can learn the region feature matrix of an image to obtain the region feature weight matrix of the image;
in implementation, the image processing model may be an image processing model established by the establishing method of the image processing model provided in the present application.
The image processing device 101 is configured to process an image to be processed through the image processing model to obtain a region feature matrix and a region feature weight matrix of the image to be processed, and output the region feature matrix and the region feature weight matrix to the matrix multiplier;
the matrix multiplier 102 is configured to perform matrix multiplication processing on the received region feature matrix and the region feature weight matrix of the image to be processed, so as to obtain the feature of the image to be processed.
Based on the system 100 shown in fig. 1, when target features need to be extracted from an image, the image is input to the image processing device 101 of the system 100, and a region feature matrix A of the image is obtained through the first model 1011 in the image processing device 101. The first model 1011 then outputs the region feature matrix A to the second model 1012 and to the matrix multiplier 102; the second model 1012 learns a region feature weight matrix B of the image and outputs it to the matrix multiplier 102. Finally, the matrix multiplier 102 performs matrix multiplication on the received region feature matrix A and region feature weight matrix B to obtain a target feature matrix C, which represents the target features contained in the image.
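The A x B = C flow above can be sketched with a plain matrix multiplier. For the product to be defined, the sketch arranges the weights as a 1 x N row vector against an N x M region feature matrix; this is a shape assumption for illustration, since the patent describes both matrices as N x M:

```python
def matmul(A, B):
    """Plain matrix multiplication: (p x q) @ (q x r) -> (p x r)."""
    assert len(A[0]) == len(B), "inner dimensions must agree"
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

# weights as a 1 x N row vector times an N x M region feature matrix
W = [[0.9, 0.1]]                      # matrix B (weights), reshaped to a row
R = [[0.2, 0.8], [0.6, 0.4]]          # matrix A (region features)
print(matmul(W, R))                   # matrix C: 1 x M pooled feature
```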
In addition, on the basis of the system shown in fig. 1, the present application also provides a more practical system; see the system 200 shown in fig. 2. The difference between the system 200 and the system 100 of fig. 1 is that, in the system of fig. 2, the image processing model configured in the image processing device has a more practical and more specific structure: a full convolutional neural network model combined with an attention model. This structure not only better exploits the strong learning capability and high learning speed of the full convolutional neural network model, but also better exploits the attention model's ability to quickly learn region saliency, so that the whole image processing model achieves a better processing effect.
As shown in fig. 2, the system 200 includes:
an image processing apparatus 201 and a matrix multiplier 202;
wherein the image processing apparatus 201 is in communication with the matrix multiplier 202;
the image processing device 201 is configured with a pre-established image processing model, which includes a full convolution neural network model and an attention model;
the image processing device 201 is configured to process an image to be processed through the image processing model to obtain a region feature matrix and a region feature weight matrix of the image to be processed, and output the region feature matrix and the region feature weight matrix to the matrix multiplier;
the matrix multiplier 202 is configured to perform matrix multiplication processing on the received region feature matrix and the region feature weight matrix of the image to be processed, so as to obtain the feature of the image to be processed.
Next, the operation principle of the system 200 is explained in conjunction with the practical application scenario of the e-commerce platform.
Consider extracting features of clothing images on an e-commerce platform, i.e., extracting specific clothing features from the images. Based on such service requirements, the clothing images to be processed on the e-commerce platform may be input to the image processing device 201 in the system 200. The full convolutional neural network model 2011 in the image processing device 201 learns each clothing image to obtain the region feature matrix of the image, which represents the features at each region position of the clothing image; these features may include the shape of the collar, the color of the collar, the shape of the cuffs, the color of the cuffs, the length of the sleeves, and other feature information related to the attributes of a particular garment.
Then, the attention model 2012 learns the region feature matrix of the clothing image to obtain the region feature weight matrix of the clothing image; this weight matrix normalizes the saliency of the target features at each region position of the clothing image, embodied mainly in the weight corresponding to each region position.
Based on this, the matrix multiplier 202 performs matrix multiplication on the region feature matrix of the clothing image and the corresponding region feature weight matrix to obtain the target feature matrix, which represents the target features contained in the clothing image.
The above example only takes clothing product images on an e-commerce platform as an illustration; the system provided by the embodiment of the application can be applied to any platform that needs to extract features from images, can process other types of images, and is not limited to processing clothing product images.
It should be noted that, to adapt to different application scenarios, the image processing device in the above systems may be configured with an image processing model pre-established from sample images of a specific application scenario, and the device may be provided with one image processing model or with several. Correspondingly, the system may be configured with one matrix multiplier per image processing model, or a single matrix multiplier may be shared by several image processing models. It can thus be seen that the block diagrams shown in fig. 1 and fig. 2 are only for ease of understanding, and the structure of the system provided by the present application is not limited to fig. 1 and fig. 2.
In addition, when implemented, the image processing device and the matrix multiplier in fig. 1 and fig. 2 may be integrated in the same hardware device, or may be deployed as separate hardware devices.
The systems 100 and 200 above mainly rely on the configured image processing model, and the present application provides a corresponding method and apparatus for establishing such an image processing model; the establishing method is described first.
Referring to fig. 3, fig. 3 is a flowchart illustrating a method for building an image processing model according to the present application, and as shown in fig. 3, the method may include: step 301 to step 304:
step 301: inputting a pre-labeled sample image into a first model for learning to obtain a region characteristic matrix of each image in the sample image, wherein the region characteristic matrix is used for representing the target characteristic condition in each region in the image.
During implementation, a certain number of pre-labeled sample images can be collected according to the actual model learning requirement, then the collected sample images are input to the first model for learning, the first model can be initialized randomly at the beginning of training, and then the sample images are learned in sequence to obtain the region feature matrix corresponding to each image.
In the embodiment of the present application, the first model may adopt an existing model structure, as long as the model can learn from an image and obtain the region features of the image by learning.
The inventor further proposes that, in the embodiment of the present application, the first model adopt a full convolutional neural network structure. A full convolutional network contains only convolution operations and no fully connected layers, which preserves the independence of region features; in addition, a full convolutional network has a simple structure, few training parameters, and strong adaptability.
When implemented, embodiments of the present application may support images in any format, including but not limited to JPG, PNG, TIF, BMP, etc. Of course, in order to ensure the uniformity and processing rate of image processing during implementation, when the sample image is received, the sample image may be converted into a uniform format supported by the system, and then the sample image is processed accordingly. Of course, in order to adapt to the processing performance of the system, the sample images with different sizes may be cut into fixed-size images supported by the system, and then the images are processed accordingly.
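The fixed-size cutting mentioned above can be sketched in pure Python on a nested-list "image"; the crop-to-center policy and the function name are illustrative assumptions (a real system would also resample and convert pixel formats):

```python
def center_crop(pixels, out_h, out_w):
    """Crop an H x W pixel grid to a fixed out_h x out_w window at the
    center, a stand-in for the patent's 'cut into fixed-size images' step."""
    h, w = len(pixels), len(pixels[0])
    assert out_h <= h and out_w <= w, "target size must fit inside the image"
    top = (h - out_h) // 2
    left = (w - out_w) // 2
    return [row[left:left + out_w] for row in pixels[top:top + out_h]]

img = [[r * 10 + c for c in range(6)] for r in range(4)]  # 4 x 6 "image"
print(center_crop(img, 2, 2))
```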
In implementation, in order to adapt to an e-commerce platform, the labeled target features of each image in the pre-labeled sample images include product attribute features of the product category to which the image belongs. For example, if a clothing image belongs to the clothing category, its labeled target features include clothing attribute features.
Further, in order to provide more and more efficient sample features, the labeled target features of each image in the pre-labeled sample image may further include: a similarity feature of the image with another image.
For example, image A has a 90% similarity to image B, and image A has a 0% similarity to image C; such image-similarity features can indirectly represent, at a global level, the relationship between the internal features of two images. On top of specific product features, model learning can also use the similarity features of images as sample data, making the learned model more reliable and better-performing.
Step 302: inputting the region feature matrix of each image into a second model for learning to obtain a region feature weight matrix of each image, wherein the region feature weight matrix is used for representing the weight of each region in the region feature matrix, and the weight represents the significance of the target feature in the region.
In the embodiment of the present application, the second model may adopt an existing model structure, as long as the model can learn from an image and obtain by learning the saliency of the region features of the image.
The inventor also proposes a specific implementation of the second model: in the embodiment of the present application, the second model adopts an attention model structure. The attention model, also called a visual attention model, simulates the human visual attention system by computer: it extracts from an image the salient features that human eyes would notice, i.e., the salient region features of the image. The attention model mainly determines the regions of high saliency in an image; in the present application, the saliency of each region is represented by a weight. Using the attention model, the weights of the region positions can suppress image background interference and down-weight unimportant information, so that the specific target feature part of the image can be extracted through the weights.
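One common way an attention-style module turns raw region saliency scores into the weights described above is softmax normalization; this is an illustrative assumption, since the patent does not fix the normalization:

```python
import math

def softmax(scores):
    """Normalize raw region saliency scores into weights that sum to 1,
    so salient regions are up-weighted and background regions suppressed."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# high score for the foreground region, low scores for background regions
weights = softmax([3.0, 0.1, 0.1])
print(weights)
```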
It should be noted here that, for convenience of calculation and improvement of learning efficiency, in implementation, it may be preset that the matrix sizes output by the first model and the second model are the same, and if the size of the region feature matrix learned by the first model is N × M, the size of the region feature weight matrix learned by the second model is also N × M, where N and M are positive integers greater than or equal to 2, and N and M may have the same value or different values.
Of course, the implementation of the embodiment of the present application may not specifically require the size of the matrix of the outputs of the two models. However, if the size of the region feature matrix output by the first model is different from that of the region feature weight matrix output by the second model, the region feature matrix output by the first model can be transformed into a matrix with the same size by means of matrix transformation, so that the calculation operation in step 303 is facilitated.
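When the two matrices hold the same number of elements, the matrix transformation mentioned above can be as simple as a row-major reshape; a hedged sketch, since the patent does not specify which transformation is used:

```python
def reshape(matrix, new_rows, new_cols):
    """Flatten a matrix row-major and refill it into a new shape,
    provided the element counts match."""
    flat = [v for row in matrix for v in row]
    assert len(flat) == new_rows * new_cols, "element counts must match"
    return [flat[i * new_cols:(i + 1) * new_cols] for i in range(new_rows)]

m = [[1, 2, 3, 4], [5, 6, 7, 8]]   # 2 x 4
print(reshape(m, 4, 2))            # refilled as 4 x 2
```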
Step 303: and calculating the characteristics of each image according to the area characteristic matrix of each image and the corresponding area characteristic weight matrix.
In implementation, step 303 may be implemented using a matrix multiplier to compute the feature part of each image; a matrix multiplier is a multiplier capable of realizing matrix multiplication.
By repeating steps 301 to 303 a plurality of times, the corresponding features can be learned repeatedly for the sample images. After each round of learning, step 304 decides whether to stop learning and obtain the learned image processing model.
Step 304: when the accuracy of the features of each image obtained after learning and the features of the image in the pre-labeled sample image tends to be stable, determining that the image processing model obtained by learning comprises the following steps: the first model and the second model.
In implementation, the embodiment of the present application may determine whether the accuracy between the features of each image obtained after learning and the labeled features of that image in the pre-labeled sample images tends to be stable as follows:
calculating the precision between the feature of each image obtained by each learning and the feature of the image in a pre-labeled sample image, and taking the precision as the precision of the learning model;
and judging whether the variation amplitude of the precision over multiple rounds of learning is smaller than a preset variation-amplitude threshold; if so, the precision between the features of each image obtained after the multiple rounds of learning and the features labeled in the pre-labeled sample images is determined to have stabilized.
When the precision over multiple rounds of learning is judged to be stable rather than fluctuating, the first model and the second model learned at that point have reached a stable state and learning need not continue; the learned image processing model can then be determined to comprise: the first model and the second model. This judgment criterion effectively controls learning so that an image processing model with good performance is obtained, while keeping the sample-learning time as short as possible.
In implementation, a learning period may be preset (for example, 1000 iterations). When the number of iterations of model learning reaches 1000, the learning precision over all pre-labeled sample images is evaluated through the above step 304; if the precision has improved very little during the period, the currently learned model is deemed to have reached a stable state and learning may be stopped.
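The stopping criterion above — a preset variation-amplitude threshold checked over recent learning periods — can be sketched as follows; `window` and `threshold` are illustrative values, not figures from the patent:

```python
def accuracy_stable(history, window=3, threshold=0.001):
    """Return True when the variation amplitude (max - min) of the last
    `window` per-period precisions falls below `threshold`, meaning
    learning has stabilized and may stop (illustrative criterion)."""
    if len(history) < window:
        return False
    recent = history[-window:]
    return max(recent) - min(recent) < threshold

# Precision improves sharply at first, then barely changes: stop learning.
print(accuracy_stable([0.60, 0.75, 0.88, 0.901, 0.9012, 0.9013]))  # True
```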
In addition, the inventors of the present application also considered that the pre-labeled sample images are all labeled manually, and manually labeled samples sometimes contain labeling errors such as incorrect marks or incorrect feature labels. To facilitate manual correction of such images and provide higher-quality samples for the next round of model training, the following step can be added to the above method:
when the precision between the learned features of a certain sample image and the features labeled in that pre-labeled sample image is smaller than a preset precision threshold, feeding back alarm information to a background manual monitoring system, the alarm information prompting that the sample image may have a labeling problem.
Based on the alarm information, background personnel can specifically recheck and correct the labeling of the suspect sample image, ensuring the accuracy of image labeling.
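The alarm step can be sketched as below; the sample identifiers and the 0.5 threshold are hypothetical, and the `alert` callback stands in for the background manual monitoring system:

```python
def check_sample(sample_id, precision, threshold=0.5, alert=print):
    """Feed back alarm information when the precision between a sample's
    learned features and its pre-labeled features is below a preset
    threshold, signalling a possible labeling problem."""
    if precision < threshold:
        alert(f"sample {sample_id}: possible labeling problem "
              f"(precision {precision:.2f} < {threshold})")
        return True
    return False

check_sample("sample_0042", 0.31)  # below threshold: triggers an alarm
check_sample("sample_0043", 0.92)  # acceptable precision: no alarm
```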
The method for establishing the image processing model above is end-to-end: model learning requires only sample images pre-labeled with feature attributes. In the prior art, by contrast, a large number of sample images pre-labeled with target information are needed to train the detection module, and a large number of sample images pre-labeled with feature attributes are needed to train the feature extraction module. Compared with the prior art, the technical scheme of the present application therefore requires far fewer sample images, reducing nearly half of the labeling workload.
In addition, the image processing model of the related art includes a detection module and a feature extraction module: the detection module detects a target region in the image, and the feature extraction module extracts the target features of the image from that region. Unlike the prior art, the image processing model provided by the present application includes a first model and a second model, where the first model learns the features at each region position of the image, and the second model learns the feature weights at each region position of the image.
It can be seen that: the first model and the second model can be combined to learn the target feature condition of the image, so that end-to-end learning from image to feature can be realized, and compared with the prior art of layered learning from image to target region and from target region to feature, the learning mode of the method has the advantages of high learning efficiency and reliability of the learning model.
Based on the method for establishing the image processing model, the present application also provides an image processing method, which mainly uses the image processing model to perform feature extraction on an image, and is mainly applied to the systems shown in fig. 1 and fig. 2, and the image processing method is explained below.
Referring to fig. 4, fig. 4 is a flowchart illustrating an image processing method provided by the present application, where the method is applied to an image processing model established by using the method illustrated in fig. 2, and the method may include: step 401 to step 403:
step 401: inputting an image to be processed into a pre-established image processing model, wherein the image processing model comprises a first model and a second model;
step 402: processing the image to be processed through a first model in the image processing model to obtain a region characteristic matrix of the image to be processed, and inputting the region characteristic matrix into the second model to obtain a region characteristic weight matrix of the image to be processed;
step 403: and calculating the characteristics of the image to be processed according to the regional characteristic matrix of the image to be processed and the regional characteristic weight matrix of the image to be processed.
When the method is applied to the system shown in fig. 2, the first model in the image processing model adopts a full convolution neural network model; the second model in the image processing model employs an attention model. That is, the image processing model includes a full convolution neural network model and an attention model.
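Putting steps 401 to 403 together, and standing in for the full convolutional neural network and the attention model with toy NumPy functions (the real first and second models are learned networks; everything below is illustrative), the inference pipeline looks like:

```python
import numpy as np

def first_model(image):
    """Toy stand-in for the full convolutional neural network: maps the
    image to a 2 x 2 region feature matrix by average-pooling quadrants."""
    h, w = image.shape
    return image.reshape(2, h // 2, 2, w // 2).mean(axis=(1, 3))

def second_model(region_feat):
    """Toy stand-in for the attention model: a softmax over the region
    feature matrix yields a same-size region feature weight matrix."""
    e = np.exp(region_feat - region_feat.max())
    return e / e.sum()

def process(image):
    feat = first_model(image)        # step 402: region feature matrix
    weight = second_model(feat)      # step 402: region feature weight matrix
    return (feat * weight).sum()     # step 403: feature of the image

image = np.random.default_rng(0).random((8, 8))
print(process(image))                # a scalar image feature
```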
In addition, corresponding to the method for establishing the image processing model, the embodiment of the application also provides a device for establishing the corresponding image processing model, and the device is used for realizing the method. The device is explained below with reference to fig. 5.
Fig. 5 is an apparatus for creating an image processing model according to the present application, and as shown in fig. 5, the apparatus may include: the first model learning module 501, the second model learning module 502, the calculation module 503 and the determination module 504 are explained below based on the operation principle of the apparatus.
A first model learning module 501, configured to input a pre-labeled sample image into a first model for learning, so as to obtain a region feature matrix of each image in the sample image, where the region feature matrix is used to represent a target feature condition in each region in the image;
a second model learning module 502, configured to input the region feature matrix of each image into a second model for learning, so as to obtain a region feature weight matrix of each image, where the region feature weight matrix is used to represent a weight of each region in the region feature matrix, and the weight represents a significance of a target feature in the region;
a calculating module 503, configured to calculate a feature of each image according to the region feature matrix of each image and the corresponding region feature weight matrix;
a determining module 504, configured to determine that, when the accuracy of the feature of each image obtained after learning and the feature of the image in the pre-labeled sample image tends to be stable, the image processing model obtained by learning includes: the first model and the second model.
When implemented, the first model employed by the first model learning module may be a full convolution neural network model.
When implemented, the second model employed by the second model learning module may be an attention model.
When implemented, the labeled target features of each image in the pre-labeled sample image comprise: the image belongs to a product attribute feature of a product category.
Further, the labeled target feature of each image in the pre-labeled sample image further comprises: a similarity feature of the image with another image.
In implementations, the apparatus may further include: the judging module is used for judging whether the precision of the feature of each image obtained after learning and the feature of the image in the pre-labeled sample image tends to be stable or not;
the judging module comprises:
the precision calculation submodule is used for calculating the precision between the features of each image obtained by learning each time and the features of the image in the pre-marked sample image, and taking the precision as the precision of the learning model;
and the determining submodule is used for judging whether the variation amplitude of the precision of the multi-time learning model is smaller than a preset variation amplitude threshold value or not, and if so, determining that the precision of the feature of each image obtained after the multi-time learning and the precision of the feature of the image in the pre-labeled sample image tend to be stable.
In implementations, the apparatus may further include:
and the warning module is used for feeding back warning information to a background manual monitoring system when the precision between the learned characteristics of a certain sample image and the characteristics of the image in the pre-labeled sample image is smaller than a preset precision threshold value, wherein the warning information is used for prompting that the certain sample image has a marking problem.
The device for establishing the image processing model above implements an end-to-end image processing model establishing method: model learning requires only sample images pre-labeled with feature attributes. In the prior art, by contrast, a large number of sample images pre-labeled with target information are needed to train the detection module, and a large number of sample images pre-labeled with feature attributes are needed to train the feature extraction module. Compared with the prior art, the technical scheme of the present application therefore requires far fewer sample images, reducing nearly half of the labeling workload.
In addition, the image processing model of the related art includes a detection module and a feature extraction module: the detection module detects a target region in the image, and the feature extraction module extracts the target features of the image from that region. Unlike the prior art, the image processing model provided by the present application includes a first model and a second model, where the first model learns the features at each region position of the image, and the second model learns the feature weights at each region position of the image.
It can be seen that: the first model and the second model can be combined to learn the target feature condition of the image, so that end-to-end learning from image to feature can be realized, and compared with the prior art of layered learning from image to target region and from target region to feature, the learning mode of the method has the advantages of high learning efficiency and reliability of the learning model.
It should be noted that, in the present specification, the embodiments are all described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other. For the device-like embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
Finally, it is further noted that, herein, relational terms such as first, second, third, fourth, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The method for establishing an image processing model, the device for establishing an image processing model, the method for processing an image, and the system for processing an image provided by the present application are described in detail above, specific examples are applied in the present application to explain the principles and embodiments of the present application, and the description of the above embodiments is only used to help understand the method and the core ideas of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (8)

1. A method for building an image processing model, the method comprising:
inputting a pre-labeled sample image into a first model for learning to obtain a region characteristic matrix of each image in the sample image, wherein the region characteristic matrix is used for representing the target characteristic condition in each region in the image;
inputting the region characteristic matrix of each image into a second model for learning to obtain a region characteristic weight matrix of each image, wherein the region characteristic weight matrix is used for representing the weight of each region in the region characteristic matrix, and the weight represents the significance of the target characteristic in the region;
calculating the characteristics of each image according to the area characteristic matrix of each image and the corresponding area characteristic weight matrix;
when the precision between the features of each image obtained after learning and the features labeled in the pre-labeled sample images tends to be stable, determining that the learned image processing model comprises: the first model and the second model,
the labeled target features of each image in the pre-labeled sample image comprise product attribute features of a product category to which the image belongs and similar relation features of the image and another image.
2. The method of building an image processing model according to claim 1,
the first model adopts a full convolution neural network model;
the second model employs an attention model.
3. The method for building an image processing model according to claim 1, wherein whether the precision between the features of each image obtained after learning and the features of the image in the pre-labeled sample image tends to be stable is determined by:
calculating the precision between the feature of each image obtained by each learning and the feature of the image in a pre-labeled sample image, and taking the precision as the precision of the learning model;
and judging whether the variation amplitude of the precision of the multiple learning models is smaller than a preset variation amplitude threshold value or not, and if so, determining that the precision of the feature of each image obtained after the multiple learning and the feature of the image in the pre-labeled sample image tends to be stable.
4. The method of building an image processing model according to claim 1, further comprising:
and when the precision between the learned characteristics of a certain sample image and the characteristics of the image in the pre-labeled sample image is smaller than a preset precision threshold value, feeding back alarm information to a background manual monitoring system, wherein the alarm information is used for prompting that the certain sample image has a marking problem.
5. An image processing method applied to the image processing model established by the method of any one of claims 1 to 4, the method comprising: inputting an image to be processed into a pre-established image processing model, wherein the image processing model comprises a first model and a second model;
processing the image to be processed through a first model in the image processing model to obtain a region characteristic matrix of the image to be processed, and inputting the region characteristic matrix into the second model to obtain a region characteristic weight matrix of the image to be processed;
and calculating the characteristics of the image to be processed according to the regional characteristic matrix of the image to be processed and the regional characteristic weight matrix of the image to be processed.
6. The image processing method according to claim 5,
the first model adopts a full convolution neural network model;
the second model employs an attention model.
7. An image processing system, characterized in that the system comprises:
an image processing apparatus and a matrix multiplier;
wherein the image processing device is in communication with the matrix multiplier;
the image processing apparatus is configured with an image processing model established according to the method of any one of claims 1 to 4, the image processing model comprising a first model and a second model; the first model is a model of a regional characteristic matrix of an image, which can be obtained by learning the image; the second model is a model which can learn a regional characteristic matrix of the image to obtain a regional characteristic weight matrix of the image;
the image processing device is used for processing the image to be processed through the image processing model to obtain a region characteristic matrix and a region characteristic weight matrix of the image to be processed, and outputting the region characteristic matrix and the region characteristic weight matrix to the matrix multiplier;
the matrix multiplier is used for carrying out matrix multiplication processing on the received area characteristic matrix and the area characteristic weight matrix of the image to be processed to obtain the characteristics of the image to be processed.
8. An apparatus for creating an image processing model, the apparatus comprising:
the first model learning module is used for inputting a pre-labeled sample image into a first model for learning to obtain a region feature matrix of each image in the sample image, and the region feature matrix is used for representing the target feature condition in each region in the image;
the second model learning module is used for inputting the region feature matrix of each image into a second model for learning to obtain a region feature weight matrix of each image, the region feature weight matrix is used for representing the weight of each region in the region feature matrix, and the weight represents the significance of the target feature in the region;
the calculation module is used for calculating the characteristics of each image according to the area characteristic matrix of each image and the corresponding area characteristic weight matrix;
a determining module, configured to determine, when the accuracy of the feature of each image obtained after learning and the feature of the image in the pre-labeled sample image tends to be stable, that the image processing model obtained by learning includes: the first model and the second model,
the labeled target features of each image in the pre-labeled sample image comprise product attribute features of a product category to which the image belongs and similar relation features of the image and another image.
CN201610652942.7A 2016-08-10 2016-08-10 Image processing model establishing method and device and image processing method and system Active CN107729901B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610652942.7A CN107729901B (en) 2016-08-10 2016-08-10 Image processing model establishing method and device and image processing method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610652942.7A CN107729901B (en) 2016-08-10 2016-08-10 Image processing model establishing method and device and image processing method and system

Publications (2)

Publication Number Publication Date
CN107729901A CN107729901A (en) 2018-02-23
CN107729901B true CN107729901B (en) 2021-04-27

Family

ID=61200152

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610652942.7A Active CN107729901B (en) 2016-08-10 2016-08-10 Image processing model establishing method and device and image processing method and system

Country Status (1)

Country Link
CN (1) CN107729901B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106934397B (en) * 2017-03-13 2020-09-01 北京市商汤科技开发有限公司 Image processing method and device and electronic equipment
CN108903913B (en) * 2018-05-31 2020-12-01 中南大学湘雅医院 Skin flap transplantation postoperative care monitoring equipment, system, method, product and server
CN110688509A (en) * 2018-06-19 2020-01-14 新智数字科技有限公司 Sample data storage method and device
CN109117781B (en) * 2018-08-07 2020-09-08 北京一维大成科技有限公司 Multi-attribute identification model establishing method and device and multi-attribute identification method
CN109145816B (en) * 2018-08-21 2021-01-26 北京京东尚科信息技术有限公司 Commodity identification method and system
CN110210535B (en) * 2019-05-21 2021-09-10 北京市商汤科技开发有限公司 Neural network training method and device and image processing method and device
CN110503151B (en) * 2019-08-26 2020-11-03 北京推想科技有限公司 Image processing method and system
CN113222167A (en) * 2020-02-06 2021-08-06 浙江大学 Image processing method and device
CN112686185A (en) * 2021-01-05 2021-04-20 北京地平线机器人技术研发有限公司 Relationship feature extraction method and device and electronic equipment

Citations (5)

Publication number Priority date Publication date Assignee Title
CN104063872A (en) * 2014-07-04 2014-09-24 西安电子科技大学 Method for detecting salient regions in sequence images based on improved visual attention model
CN104077609A (en) * 2014-06-27 2014-10-01 河海大学 Saliency detection method based on conditional random field
CN104732534A (en) * 2015-03-18 2015-06-24 中国人民公安大学 Method and system for matting conspicuous object in image
CN105224963A (en) * 2014-06-04 2016-01-06 华为技术有限公司 The method of changeable degree of depth learning network structure and terminal
CN105426919A (en) * 2015-11-23 2016-03-23 河海大学 Significant guidance and unsupervised feature learning based image classification method

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US9633444B2 (en) * 2014-05-05 2017-04-25 Xiaomi Inc. Method and device for image segmentation

Patent Citations (5)

Publication number Priority date Publication date Assignee Title
CN105224963A (en) * 2014-06-04 2016-01-06 华为技术有限公司 The method of changeable degree of depth learning network structure and terminal
CN104077609A (en) * 2014-06-27 2014-10-01 河海大学 Saliency detection method based on conditional random field
CN104063872A (en) * 2014-07-04 2014-09-24 西安电子科技大学 Method for detecting salient regions in sequence images based on improved visual attention model
CN104732534A (en) * 2015-03-18 2015-06-24 中国人民公安大学 Method and system for matting conspicuous object in image
CN105426919A (en) * 2015-11-23 2016-03-23 河海大学 Significant guidance and unsupervised feature learning based image classification method

Also Published As

Publication number Publication date
CN107729901A (en) 2018-02-23

Similar Documents

Publication Publication Date Title
CN107729901B (en) Image processing model establishing method and device and image processing method and system
CN109284729B (en) Method, device and medium for acquiring face recognition model training data based on video
US9251588B2 (en) Methods, apparatuses and computer program products for performing accurate pose estimation of objects
CN110008956B (en) Invoice key information positioning method, invoice key information positioning device, computer equipment and storage medium
CN110688929B (en) Human skeleton joint point positioning method and device
TW201915787A (en) Search method and processing device
CN107729908B (en) Method, device and system for establishing machine learning classification model
CN111488873B (en) Character level scene text detection method and device based on weak supervision learning
CN110414581B (en) Picture detection method and device, storage medium and electronic device
CN111626362A (en) Image processing method, image processing device, computer equipment and storage medium
CN111444850B (en) Picture detection method and related device
US9208404B2 (en) Object detection with boosted exemplars
US20230418910A1 (en) Multimodal sentiment classification
EP3989158A1 (en) Method, apparatus and device for video similarity detection
CN111461164A (en) Sample data set capacity expansion method and model training method
CN108694716B (en) Workpiece detection method, model training method and equipment
CN114881989A (en) Small sample based target object defect detection method and device, and electronic equipment
CN112926621A (en) Data labeling method and device, electronic equipment and storage medium
CN111651674A (en) Bidirectional searching method and device and electronic equipment
CN114782752B (en) Small sample image integrated classification method and device based on self-training
CN110852189A (en) Low-complexity dense crowd analysis method based on deep learning
CN111814036A (en) Wireless hotspot and interest point matching method based on search engine, electronic device and storage medium
CN114170468B (en) Text recognition method, storage medium and computer terminal
CN110610206A (en) Image vulgar attribution identification method, device and equipment
CN116091984B (en) Video object segmentation method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant