Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and not restrictive of it. It should also be noted that, for convenience of description, only the portions related to the invention are shown in the drawings.
It should be noted that the embodiments in the present application, and the features of those embodiments, may be combined with each other where no conflict arises. The present application will be described in detail below with reference to the embodiments and the attached drawings.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of the picture processing method or picture processing apparatus of the present application may be applied.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables, to name a few.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104, for example to receive or send messages. The terminal devices 101, 102, 103 may have various communication client applications installed on them, such as image recognition applications, shopping applications, search applications, instant messaging tools, mailbox clients, and social platform software.
Here, the terminal devices 101, 102, and 103 may be hardware or software. When they are hardware, they may be various electronic devices having a display screen, including but not limited to smartphones, tablet computers, e-book readers, laptop computers, desktop computers, and the like. When they are software, they may be installed in the electronic devices listed above, and may be implemented either as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. No specific limitation is imposed here.
The server 105 may be a server providing various services, for example a background server providing support for the terminal devices 101, 102, 103. The background server may analyze and otherwise process received data, such as the target picture, and feed a processing result (for example, a suspected mislabeled picture) back to the terminal device.
It should be noted that the picture processing method provided in the embodiments of the present application may be executed by the server 105 or by the terminal devices 101, 102, and 103; accordingly, the picture processing apparatus may be disposed in the server 105 or in the terminal devices 101, 102, and 103.
It should be understood that the number of terminal devices, networks, and servers in Fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to Fig. 2, a flow 200 of one embodiment of a picture processing method according to the present application is shown. The picture processing method comprises the following steps:
Step 201, inputting a target picture into a pre-trained classification model.
In this embodiment, an execution subject of the picture processing method (e.g., the server or the terminal device shown in Fig. 1) may input the target picture into a pre-trained classification model. The classification model may be a binary classification model or a multi-classification model, and is mainly used to classify the objects contained in a picture. A target picture is any picture that contains one or more objects.
In practice, the classification model may be trained using classifiers such as a Support Vector Machine (SVM), a Naive Bayes Model (NBM), or a Convolutional Neural Network (CNN). In addition, the classification model may also be pre-trained using a classification function (e.g., the softmax function).
The training process of the classification model may be as follows. First, a training sample set is obtained, in which each sample is a picture annotated with the type of the object it contains. Each picture may then be used as an input, and the type annotated for the object contained in that picture as the corresponding output, to train an initial classification model (such as the support vector machine mentioned above) and thereby obtain the classification model. Where the classification model is a binary classification model, the labels applied to the samples fall into two types, and one of the two labels may be assigned to each sample according to the object it contains. Where the classification model is a multi-classification model, the labels applied to the samples may fall into multiple types, and one or more of these labels may be assigned to each sample according to the objects it contains.
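By way of illustration only (the application does not prescribe a particular framework), the training procedure described above might be sketched as follows, using scikit-learn's SVM as the initial classification model; the pixel-flattening feature extractor and the (picture, label) sample format are assumptions made for the sketch:

```python
# A minimal training sketch under the assumptions stated above: pictures
# are NumPy arrays of identical shape, samples are (picture, label)
# pairs, and scikit-learn's SVM serves as the initial classification
# model. The flattening feature extractor is a placeholder.
import numpy as np
from sklearn.svm import SVC

def extract_features(picture: np.ndarray) -> np.ndarray:
    """Placeholder feature extraction: flatten pixels into a vector."""
    return picture.reshape(-1).astype(np.float32)

def train_classification_model(training_samples):
    """training_samples: list of (picture, label) pairs, where the label
    is the annotated type of the object contained in the picture."""
    features = np.stack([extract_features(p) for p, _ in training_samples])
    labels = np.array([label for _, label in training_samples])
    # SVC handles both the binary and the multi-class case, depending on
    # how many label types appear in the training sample set.
    model = SVC()
    model.fit(features, labels)  # pictures in, annotated types out
    return model
```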
Step 202, in response to determining that the object included in the target picture is inconsistent with the classification result output by the classification model, searching for a similar picture of the target picture in a training sample set of the classification model, wherein the similarity between the similar picture and the target picture is greater than the similarity between other sample pictures in the training sample set and the target picture.
In this embodiment, after the execution subject inputs the target picture into the classification model, the classification result may be acquired from the output of the classification model. The execution subject may search for a similar picture of the target picture in a training sample set of the classification model in response to determining that the object included in the target picture is inconsistent with the classification result output by the classification model.
The similarity between a similar picture and the target picture is greater than the similarity between the other sample pictures in the training sample set and the target picture. That is, when the sample pictures are ranked by their similarity to the target picture, the preset number of pictures that rank highest are taken as the similar pictures.
The execution subject may determine, in various ways, whether the object contained in the target picture is consistent with the classification result output by the classification model. For example, if the target picture has an annotated category, the execution subject may check whether the annotated category of the object in the target picture matches the classification result; if not, it determines that the object contained in the target picture is inconsistent with the classification result output by the classification model, and vice versa. If the target picture has no annotated category, the execution subject may receive a judgment made by a user based on the object contained in the target picture and the classification result, and determine consistency from that judgment.
In a specific application scenario, a picture A containing a "cat" is input into the classification model, and the classification result output by the model is "dog". The execution subject may then search the training sample set of the classification model for similar pictures of picture A, which may include a picture of a cat mislabeled as a dog.
A similar picture can be understood as a picture with a relatively high similarity to the target picture. Similar pictures of the target picture may be determined from the training sample set in a variety of ways. For example, the execution subject may compute a hash value for each picture: the smaller the difference between the hash values of two pictures, the greater their similarity, so pictures selected in the following ways have a greater similarity to the target picture than the other sample pictures in the training sample set. If the hash value difference between the target picture and a sample picture in the training sample set is smaller than a threshold, that sample picture may be taken as a similar picture. Alternatively, the sample pictures may be sorted by their hash value difference to the target picture, and the preset number of sample pictures with the smallest difference taken as the similar pictures. A proximity (nearest-neighbor) algorithm may also be used for the lookup: since sample pictures closer to the target picture under such an algorithm have a greater similarity to it, the several sample pictures closest to the target picture may be taken as the similar pictures.
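As a non-authoritative illustration of the hash-based lookup described above, the following sketch uses a simple average hash; the particular hash function, the Hamming-distance comparison, and the file-path interface are illustrative choices, not part of the disclosure:

```python
# A sketch of the hash-based similarity lookup: a smaller hash
# difference means a greater similarity between pictures.
from PIL import Image

def average_hash(path: str, size: int = 8) -> int:
    """Reduce the picture to a small grayscale image and encode each
    pixel as one bit: 1 if brighter than the mean, else 0."""
    img = Image.open(path).convert("L").resize((size, size))
    pixels = list(img.getdata())
    mean = sum(pixels) / len(pixels)
    bits = 0
    for px in pixels:
        bits = (bits << 1) | (1 if px > mean else 0)
    return bits

def hash_distance(h1: int, h2: int) -> int:
    """Hamming distance: the number of differing bits."""
    return bin(h1 ^ h2).count("1")

def find_similar(target_path, sample_paths, preset_number=5):
    """Sort the sample pictures by hash difference to the target picture
    and return the preset number with the smallest difference."""
    target_hash = average_hash(target_path)
    ranked = sorted(sample_paths,
                    key=lambda p: hash_distance(target_hash, average_hash(p)))
    return ranked[:preset_number]
```

The threshold-based variant mentioned above would simply filter `sample_paths` by `hash_distance(...) < threshold` instead of taking a fixed count.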
Step 203, determining the similar pictures to be suspected mislabeled pictures.
In this embodiment, the execution subject may determine the similar pictures to be suspected mislabeled pictures, that is, pictures with a high probability of having been labeled incorrectly. It will be understood that the annotation of a suspected mislabeled picture in the training sample set may in fact be either correct or incorrect.
In some optional implementations of the embodiment, the target picture is a sample picture in a training sample set for training the classification model.
In this implementation, the target picture may be a sample picture in the training sample set used to train the classification model. Since every picture in the training sample set carries an explicit annotation, each sample picture in the set can serve as a target picture. Labeling errors among the sample pictures in a training sample set tend to be the same or similar, which makes batch correction of the errors in the set all the more convenient.
With continuing reference to Fig. 3, a schematic diagram of an application scenario of the picture processing method according to the present embodiment is shown. In the application scenario of Fig. 3, the execution subject 301 may input picture A 302 into a pre-trained classification model 303. In response to determining that the object "tiger" contained in picture A is inconsistent with the classification result "cat" output by the classification model, the execution subject 301 searches the training sample set of the classification model for similar pictures X and Y 304 of picture A, where the similarity between each of pictures X and Y and picture A is greater than the similarity between the other sample pictures in the training sample set and picture A. The similar pictures are then determined to be suspected mislabeled pictures 305.
The method provided by this embodiment of the application can screen suspected mislabeled pictures out of the sample pictures of the training sample set, which reduces the number of mislabeled pictures in the training sample set and improves the accuracy of the model.
With further reference to Fig. 4, a flow 400 of yet another embodiment of the picture processing method is shown. The flow 400 of the picture processing method includes the following steps:
Step 401, inputting the target picture into a pre-trained classification model.
In this embodiment, an execution subject of the picture processing method (e.g., the server or the terminal device shown in Fig. 1) may input the target picture into a pre-trained classification model. The classification model may be a binary classification model or a multi-classification model, and is mainly used to classify the objects contained in a picture. A target picture is any picture that contains one or more objects.
Step 402, in response to determining that the object contained in the target picture is inconsistent with the classification result output by the classification model, searching the pictures of the training sample set for similar pictures of the target picture based on a proximity algorithm, wherein the similarity between a similar picture and the target picture is greater than the similarity between the other sample pictures in the training sample set and the target picture.
In this embodiment, after the execution subject inputs the target picture into the classification model, the classification result may be acquired from the output of the classification model. The execution subject may search the training sample set of the classification model for similar pictures of the target picture in response to determining that the object contained in the target picture is inconsistent with the classification result output by the classification model.
In some optional implementations of this embodiment, searching the pictures of the training sample set for similar pictures of the target picture based on a proximity algorithm (k-nearest neighbors, kNN) includes:
acquiring feature information of the target picture, and acquiring feature information of each picture in the training sample set;
in a multi-dimensional space coordinate system, determining the coordinate position of the feature information of the target picture as a target coordinate position, and determining the coordinate positions of the feature information of the pictures in the training sample set as candidate coordinate positions;
selecting a preset number of coordinate positions from all candidate coordinate positions in order of increasing distance from the target coordinate position; and
determining the pictures corresponding to the preset number of coordinate positions as the similar pictures.
In this implementation, feature information is information that reflects the features of a picture. If the classification model is a convolutional neural network, the feature information may be the output of a fully connected layer, typically expressed as a vector. The feature information of a picture can then be represented as a coordinate point in a multi-dimensional space, each picture corresponding to the coordinate position of its feature information. The distances between the coordinate positions of the pictures in the training sample set and the coordinate position of the target picture differ, and the smaller the distance between two coordinate positions, the greater the similarity between the corresponding pictures. The candidate coordinate positions can therefore be sorted in order of increasing distance from the target coordinate position, and the coordinate positions selected from the near end of the sorted sequence.
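The following is a minimal sketch of this kNN lookup, assuming the feature vectors (for example, the fully-connected-layer output mentioned above) have already been extracted; the Euclidean distance metric and the function names are assumptions for illustration:

```python
# A sketch of the kNN lookup of step 402: each picture's feature vector
# is treated as a coordinate position in a multi-dimensional space, and
# the preset number of sample pictures nearest the target coordinate
# position are returned as the similar pictures.
import numpy as np

def find_similar_knn(target_feature: np.ndarray,
                     sample_features: np.ndarray,
                     sample_ids: list,
                     preset_number: int = 5) -> list:
    """target_feature: (dim,) target coordinate position;
    sample_features: (num_samples, dim) candidate coordinate positions;
    sample_ids: identifiers of the corresponding sample pictures."""
    # Euclidean distance from the target coordinate to each candidate.
    distances = np.linalg.norm(sample_features - target_feature, axis=1)
    # Sort candidates by distance, smallest first, and keep the first
    # preset_number; these correspond to the most similar pictures.
    nearest = np.argsort(distances)[:preset_number]
    return [sample_ids[i] for i in nearest]
```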
Step 403, determining the similar pictures to be suspected mislabeled pictures.
In this embodiment, the execution subject may determine the similar pictures to be suspected mislabeled pictures, that is, pictures with a high probability of having been labeled incorrectly. It will be understood that the annotation of a suspected mislabeled picture in the training sample set may in fact be either correct or incorrect.
Step 404, sending the suspected mislabeled picture, and determining whether the suspected mislabeled picture is in fact a mislabeled picture.
In this embodiment, the execution subject may send the suspected mislabeled picture to another electronic device and determine whether the suspected mislabeled picture is a mislabeled picture.
In practice, the execution subject may determine in a variety of ways whether the suspected mislabeled picture is a mislabeled picture. For example, the execution subject may send the suspected mislabeled picture to a target terminal device, so that the target terminal device determines whether it is a mislabeled picture. Specifically, the target terminal device may display the suspected mislabeled picture to a user, who can then judge whether the object contained in the picture is consistent with its annotation. If the user judges them inconsistent, the target terminal device may receive that judgment (for example, through a user operation) and determine that the suspected mislabeled picture is a mislabeled picture. Alternatively, the execution subject may itself display the suspected mislabeled picture, so that its user can view the picture and judge whether it is a mislabeled picture; the execution subject then determines the result from the judgment the user inputs locally.
In some optional implementations of this embodiment, the method further includes: acquiring a corrected picture, wherein the corrected picture is generated by correcting the annotation of the mislabeled picture;
and replacing the mislabeled picture in the training sample set with the corrected picture.
In this implementation, when the suspected mislabeled picture is confirmed to be a mislabeled picture, the execution subject may acquire a corrected picture and replace the corresponding sample picture in the training sample set with it. Specifically, the corrected picture may be returned by the target terminal device, or may be input into the execution subject by its user. Annotation correction here means correcting the annotation of the picture; this may be performed by the user on the target terminal device or on another device (such as the execution subject described above).
This implementation makes it possible to replace sample pictures in the training sample set, reducing labeling errors and improving the accuracy of the training sample set.
In some application scenarios of the foregoing implementation, the method further includes: retraining the classification model based on the annotation of the corrected picture.
In this application scenario, the execution subject can further improve the accuracy of the classification model by retraining it.
In this embodiment, the execution subject may retrain the classification model based on the annotation of the corrected picture. Specifically, the execution subject may take each picture as an input and the type annotated for the object contained in the corrected picture as the corresponding output, and retrain the classification model.
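A minimal sketch of this correction-and-retraining step might look as follows; `train_classification_model` refers to the hypothetical training sketch given earlier, and the list-of-pairs sample format is likewise an assumption:

```python
# Replace the confirmed mislabeled sample with the corrected picture,
# then retrain the classification model on the repaired sample set.
# train_classification_model is the hypothetical helper sketched above.
def replace_and_retrain(training_samples, error_index, corrected_sample):
    """training_samples: list of (picture, label) pairs;
    error_index: position of the confirmed mislabeled sample;
    corrected_sample: (picture, corrected_label) after annotation fix."""
    training_samples[error_index] = corrected_sample
    return train_classification_model(training_samples)
```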
The present embodiment can obtain the pictures most similar to the target picture by using a proximity algorithm, and can correct the annotations of mislabeled pictures, so that pictures with corrected annotations are used for training. This improves the accuracy of the model and reduces the probability of erroneous model output.
With further reference to Fig. 5, as an implementation of the methods shown in the above figures, the present application provides an embodiment of a picture processing apparatus. The apparatus embodiment corresponds to the method embodiment shown in Fig. 2, and the apparatus is applicable in various electronic devices.
As shown in Fig. 5, the picture processing apparatus 500 of the present embodiment includes: a classification unit 501, a search unit 502, and a determination unit 503. The classification unit 501 is configured to input a target picture into a pre-trained classification model; the search unit 502 is configured to search a training sample set of the classification model for similar pictures of the target picture in response to determining that the object contained in the target picture is inconsistent with the classification result output by the classification model, wherein the similarity between a similar picture and the target picture is greater than the similarity between the other sample pictures in the training sample set and the target picture; and the determination unit 503 is configured to determine the similar pictures to be suspected mislabeled pictures.
In some embodiments, the classification unit 501 of the picture processing apparatus 500 may input the target picture into a pre-trained classification model. The classification model may be a binary classification model or a multi-classification model, and is mainly used to classify the objects contained in a picture. A target picture is any picture that contains one or more objects.
In some embodiments, once the target picture has been input into the classification model, the search unit 502 may obtain the classification result from the output of the classification model, and may search the training sample set of the classification model for similar pictures of the target picture in response to determining that the object contained in the target picture is inconsistent with the classification result output by the classification model.
In some embodiments, the determination unit 503 may determine the similar pictures to be suspected mislabeled pictures, that is, pictures whose annotations are suspected to be erroneous; the annotation of a suspected mislabeled picture may in fact be either correct or incorrect.
In some optional implementations of the present embodiment, the target picture is a sample picture in a training sample set for training the classification model.
In some optional implementations of the present embodiment, the search unit 502 is further configured to: acquire feature information of the target picture and of each picture in the training sample set; determine, in a multi-dimensional space coordinate system, the coordinate position of the feature information of the target picture as a target coordinate position, and the coordinate positions of the feature information of the pictures in the training sample set as candidate coordinate positions; select a preset number of coordinate positions from all candidate coordinate positions in order of increasing distance from the target coordinate position; and determine the pictures corresponding to the preset number of coordinate positions as the similar pictures.
In some optional implementations of this embodiment, the picture processing apparatus further includes: a sending unit configured to send the suspected mislabeled picture and determine whether the suspected mislabeled picture is a mislabeled picture.
In some optional implementations of this embodiment, the picture processing apparatus further includes: an acquisition unit configured to acquire a corrected picture, wherein the corrected picture is generated by correcting the annotation of the mislabeled picture; and a replacement unit configured to replace the mislabeled picture in the training sample set with the corrected picture.
In some optional implementations of this embodiment, the picture processing apparatus further includes: a training unit configured to retrain the classification model based on the annotation of the corrected picture.
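Purely as a structural illustration (not part of the disclosure), the apparatus of Fig. 5 might be composed as follows, reusing the hypothetical `extract_features` and `find_similar_knn` helpers from the earlier sketches:

```python
# A structural sketch mirroring units 501-503: classify the target
# picture, search for similar samples, and flag them as suspected
# mislabeled pictures. extract_features and find_similar_knn are the
# hypothetical helpers defined in the sketches above.
import numpy as np

class PictureProcessingApparatus:
    def __init__(self, classification_model, training_samples):
        self.model = classification_model          # used by classification unit 501
        self.training_samples = training_samples   # list of (picture, label) pairs

    def process(self, target_picture, annotated_label, preset_number=5):
        # Classification unit 501: run the pre-trained model.
        feature = extract_features(target_picture)
        predicted = self.model.predict(feature.reshape(1, -1))[0]
        if predicted == annotated_label:
            return []  # annotation consistent with the result; nothing to flag
        # Search unit 502: kNN lookup over the training sample set.
        sample_features = np.stack(
            [extract_features(p) for p, _ in self.training_samples])
        ids = list(range(len(self.training_samples)))
        # Determination unit 503: the nearest samples are the suspected
        # mislabeled pictures.
        return find_similar_knn(feature, sample_features, ids, preset_number)
```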
Referring now to Fig. 6, a block diagram of a computer system 600 suitable for implementing the electronic device of an embodiment of the present application is shown. The electronic device shown in Fig. 6 is only an example, and should not impose any limitation on the functions or scope of use of the embodiments of the present application.
As shown in Fig. 6, the computer system 600 includes a central processing unit (CPU) 601 that can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage section 608 into a random access memory (RAM) 603. The RAM 603 also stores the various programs and data necessary for the operation of the system 600. The CPU 601, the ROM 602, and the RAM 603 are connected to one another via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse, and the like; an output section 607 including a cathode ray tube (CRT) or liquid crystal display (LCD), a speaker, and the like; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card or a modem. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as necessary. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as necessary, so that a computer program read out therefrom is installed into the storage section 608 as needed.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611. The computer program, when executed by the central processing unit (CPU) 601, performs the above-described functions defined in the method of the present application.

It should be noted that the computer readable medium of the present application may be a computer readable signal medium, a computer readable storage medium, or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.

A computer readable signal medium, by contrast, may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic signals, optical signals, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, and the like, or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or by hardware. The described units may also be provided in a processor, which may, for example, be described as: a processor including a classification unit, a search unit, and a determination unit. The names of these units do not, in some cases, constitute a limitation on the units themselves; for example, the classification unit may also be described as "a unit that inputs a target picture into a pre-trained classification model".
As another aspect, the present application also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may exist separately without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: input a target picture into a pre-trained classification model; in response to determining that the object contained in the target picture is inconsistent with the classification result output by the classification model, search a training sample set of the classification model for similar pictures of the target picture, wherein the similarity between a similar picture and the target picture is greater than the similarity between the other sample pictures in the training sample set and the target picture; and determine the similar pictures to be suspected mislabeled pictures.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.