CN112183166A - Method and device for determining training sample and electronic equipment


Info

Publication number
CN112183166A
CN112183166A (application number CN201910600036.6A)
Authority
CN
China
Prior art keywords
image
detected
processing
network model
processing result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910600036.6A
Other languages
Chinese (zh)
Inventor
武锐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Horizon Robotics Technology Research and Development Co Ltd
Original Assignee
Beijing Horizon Robotics Technology Research and Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Horizon Robotics Technology Research and Development Co Ltd filed Critical Beijing Horizon Robotics Technology Research and Development Co Ltd
Priority to CN201910600036.6A
Publication of CN112183166A
Legal status: Pending


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172: Classification, e.g. identification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands

Abstract

The application discloses a method and a device for determining a training sample, a computer-readable storage medium, and an electronic device, relating to the technical field of image processing. The method comprises the following steps: processing an image to be detected through a first network model to obtain a first processing result; processing the same image to be detected through a second network model to obtain a second processing result; and when the difference value between the first processing result and the second processing result meets a first preset condition, determining the image to be detected as a training sample. The scheme solves the problems of low acquisition efficiency and high labor cost of training samples in the related art.

Description

Method and device for determining training sample and electronic equipment
Technical Field
The invention relates to the technical field of image processing, in particular to a method and a device for determining a training sample, a computer-readable storage medium and electronic equipment.
Background
In recent years, with the rapid development of computer vision technology, technologies such as image target detection, image classification, image segmentation and video analysis based on computer vision have made breakthrough progress. Compared with traditional image analysis and image processing techniques, neural-network-based image and video processing can handle a wider range of computer vision tasks with higher recognition accuracy, and has therefore become a research hotspot.
At present, in order for a neural-network-based image processing technology to achieve higher image detection accuracy and image recognition rate, a large number of image training samples usually need to be collected when training a visual model. However, to obtain higher-quality image training samples, the collected images generally must be identified manually, and the image training samples suitable for training the visual model must be labeled by hand. Although this way of screening image training samples is highly precise, it requires high labor cost. When a large number of image training samples is needed, such manual screening is slow, unsuitable for large-scale application and popularization, and limited in its range of application.
Disclosure of Invention
The present application is proposed to solve the above-mentioned technical problems. Embodiments of the application provide a method, an apparatus, a computer-readable storage medium and an electronic device for determining training samples.
According to an aspect of an embodiment of the present application, there is provided a method of determining a training sample, including: processing an image to be detected through a first network model to obtain a first processing result; processing the image to be detected through a second network model to obtain a second processing result; and when the difference value between the first processing result and the second processing result meets a first preset condition, determining the image to be detected as a training sample.
According to another aspect of the embodiments of the present application, there is provided a method of determining a training sample, including: acquiring a plurality of images to be detected, wherein each image to be detected comprises a target object; respectively processing the plurality of images to be detected through an image processing model to obtain a plurality of processing results; and when the difference values among the plurality of processing results meet a third preset condition, determining at least one image to be detected in the plurality of images to be detected as a training sample.
According to another aspect of the embodiments of the present application, there is provided an apparatus for determining a training sample, including: the first processing module is used for processing the image to be detected through the first network model to obtain a first processing result; the second processing module is used for processing the image to be detected through a second network model to obtain a second processing result; and the first determining module is used for determining the image to be detected as a training sample when the difference value between the first processing result and the second processing result meets a first preset condition.
According to another aspect of the embodiments of the present application, there is provided an apparatus for determining a training sample, including: the acquisition module is used for acquiring a plurality of images to be detected, and each image to be detected comprises a target object; the third processing module is used for respectively processing the plurality of images to be detected through the image processing model to obtain a plurality of processing results; and the fifth determining module is used for determining at least one image to be detected in the plurality of images to be detected as a training sample when the difference value among the plurality of processing results meets a third preset condition.
The technical solutions provided by the embodiments of the present application have at least the following beneficial effects:
The same image to be detected is processed through different neural network models, and if the difference value between the resulting image processing results meets a specific condition, the image to be detected is determined to be a training sample. This improves the acquisition efficiency of training samples and effectively reduces labor cost; in particular, when the amount of image data is large, training samples can be selected from the images to be detected quickly, accurately and efficiently. In addition, only images meeting the specific condition are selected as training samples, rather than any arbitrary image to be detected, so the selected training samples are targeted, and training with them can greatly improve the prediction accuracy of the neural network model.
Alternatively, a plurality of images to be detected that each contain a target object are processed through an image processing model, and if the difference values among the resulting image processing results meet a specific condition, at least one of those images is determined to be a training sample. This likewise improves the acquisition efficiency of training samples and effectively reduces labor cost, especially when the amount of image data is large. Again, only images meeting the specific condition are selected as training samples, rather than any arbitrary image to be detected, so the selected samples are targeted, and training the image processing model with them can greatly improve its prediction accuracy.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The above and other objects, features and advantages of the present application will become more apparent by describing in more detail embodiments of the present application with reference to the attached drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the principles of the application. In the drawings, like reference numbers generally represent like parts or steps.
FIG. 1 is a schematic diagram of an implementation environment provided by an embodiment of the present application.
Fig. 2 is a flowchart illustrating a method for determining training samples according to an exemplary embodiment of the present application.
Fig. 3 is a flowchart illustrating a method for determining training samples according to another exemplary embodiment of the present application.
Fig. 4a and Fig. 4b are schematic diagrams of face detection results according to an exemplary embodiment of the present application.
Fig. 5 is a block diagram of an apparatus for determining training samples according to an exemplary embodiment of the present application.
Fig. 6 is a block diagram of an apparatus for determining training samples according to another exemplary embodiment of the present application.
Fig. 7 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present application.
Detailed Description
Hereinafter, example embodiments according to the present application will be described in detail with reference to the accompanying drawings. It should be understood that the described embodiments are only some embodiments of the present application and not all embodiments of the present application, and that the present application is not limited by the example embodiments described herein.
Summary of the application
A neural network is a computational model formed by a large number of interconnected nodes (or neurons). Each node corresponds to a policy function, and the connection between every two nodes carries a weighted value, called a weight, for the signal passing through that connection. A neural network generally comprises multiple neural network layers cascaded one after another: the output of the i-th layer is connected to the input of the (i+1)-th layer, the output of the (i+1)-th layer is connected to the input of the (i+2)-th layer, and so on. After a training sample is input into the cascaded layers, each layer produces an output that serves as the input of the next layer, so the final output is computed through the plurality of layers. The prediction of the output layer is then compared with the real target value, and the weight matrix and policy function of each layer are adjusted according to the difference between the prediction and the target. The neural network repeats this adjustment process with the training samples, adjusting parameters such as its weights, until its predictions are consistent with the real target results; this process is called training the neural network. Once trained, a neural network model is obtained.
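For illustration only, the training process described above can be sketched as follows. This is a minimal sketch rather than the implementation of this application; the two-layer model, the loss function, the fixed iteration count and the random data are placeholder assumptions, and PyTorch is used purely as an example framework.

```python
import torch
import torch.nn as nn

model = nn.Sequential(           # cascaded layers: output of layer i feeds layer i+1
    nn.Linear(16, 32), nn.ReLU(),
    nn.Linear(32, 2),
)
loss_fn = nn.CrossEntropyLoss()  # compares the prediction against the real target value
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

samples = torch.randn(8, 16)     # stand-in training samples
targets = torch.randint(0, 2, (8,))

for _ in range(100):             # repeat until predictions match the targets closely enough
    prediction = model(samples)          # forward pass through all cascaded layers
    loss = loss_fn(prediction, targets)  # difference between prediction and target
    optimizer.zero_grad()
    loss.backward()                      # propagate the difference backwards
    optimizer.step()                     # adjust the weight matrix of each layer
```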
As described above, in order to obtain image training samples of higher quality, the prior art generally requires manually identifying the collected image data and marking, from the identified images, those suitable for training a neural network model as training samples.
In addition, in the process of detecting images through a neural network model, some images cannot be, or are difficult to be, correctly perceived, yielding incorrect perception results; when used as training samples, such images are generally called hard examples or hard samples. Training the neural network model on hard samples can further improve its perception and prediction accuracy. With manually marked training samples, however, it is often impossible to determine in advance which samples are hard samples and which are normal samples.
In view of the above technical problem, the basic concept of the present application is to provide a method and an apparatus for determining a training sample, a computer-readable storage medium, and an electronic device: the same image to be detected is processed through different neural network models, or a plurality of images to be detected containing a target object are processed through an image processing model, to obtain a plurality of image processing results, and training samples are then acquired according to those results. Training samples can therefore be acquired automatically, which improves acquisition efficiency and effectively reduces labor cost; in particular, when the amount of image data is large, training samples can be selected from the images to be detected quickly, accurately and efficiently. In addition, because the difference values among the multiple image processing results must meet specific conditions, the selected training samples tend to be hard samples, and training with hard samples can greatly improve the prediction accuracy of the neural network model.
The method, apparatus and device for determining a training sample can be applied to various computer vision tasks. Take a computer vision task of image detection as an example. A terminal acquires an image to be detected containing 3 faces and processes it through a first neural network model and a second neural network model to obtain a first processing result and a second processing result, respectively. Both models are trained network models, but they differ in network scale, perception capability, computation amount, and so on. When the difference value between the first processing result and the second processing result is large, for example, the second processing result shows that the image contains 3 faces while the first processing result shows that it contains only 1 face, the image to be detected is one that is not easily perceived correctly by a neural network model and can serve as a hard sample. Acquiring it as a training sample can further improve the prediction accuracy of the neural network model, in particular the perception and prediction accuracy of the first neural network model, whose perception capability is weaker.
It should be noted that applying the method for determining a training sample to the computer vision task of image detection is only an example. Besides processing the image to be detected with a first and a second neural network model to obtain a first and a second processing result, the same neural network model may also be used to process a plurality of images to be detected containing the 3 faces, obtaining a plurality of processing results; when the difference values among those processing results are large, the images to be detected are acquired as training samples. The embodiment of the present application is not limited in this respect.
Take a computer vision task of image recognition as another example. A terminal acquires an image to be detected containing the face of a target object X and processes it through a first neural network model and a second neural network model to obtain a first processing result and a second processing result, respectively. Both models are trained network models, but they differ in network scale, perception capability, computation amount, and so on. When the difference value between the first processing result and the second processing result is large, for example, the second processing result correctly matches the face in the image to the face of target object X while the first processing result matches it to the face of a target object Y, the image to be detected is one that is not easily perceived correctly by a neural network model and can serve as a hard sample. Acquiring it as a training sample can further improve the prediction accuracy of the neural network model, in particular the perception and prediction accuracy of the first neural network model, whose perception capability is weaker.
As in the image detection case, applying the method for determining a training sample to the computer vision task of image recognition is only an example: a plurality of images to be detected containing the face of target object X may also be processed by the same neural network model to obtain a plurality of processing results, and when the difference values among those processing results are large, the images to be detected are acquired as training samples.
Take a computer vision task of image classification as a further example. The terminal acquires an image to be detected containing a cat, a dog and a person, and processes it through a first neural network model and a second neural network model to obtain a first processing result and a second processing result, respectively. Both models are trained network models, but they differ in network scale, perception capability, computation amount, and so on. Suppose the difference value between the two results is large: the second processing result correctly identifies the cat, the dog and the person with probabilities of 0.91, 0.92 and 0.98, while the first processing result identifies them with probabilities of only 0.68, 0.45 and 0.69. The image to be detected is then one that is not easily perceived correctly by a neural network model and can serve as a hard sample. Acquiring it as a training sample can further improve the prediction accuracy of the neural network model, in particular the perception and prediction accuracy of the first neural network model, whose perception capability is weaker.
As in the image detection case, applying the method for determining a training sample to the computer vision task of image classification is only an example: a plurality of images to be detected containing the cat, the dog and the person may also be processed by the same neural network model to obtain a plurality of processing results, and when the difference values among those processing results are large, the images to be detected are acquired as training samples.
Determining training samples through a terminal device is only an example. Besides image detection, image recognition and image classification, the method may also be applied to other computer vision tasks such as image segmentation and key point detection; the embodiment of the present application is not limited thereto.
In addition, the method, the apparatus, and the device for determining the training samples according to the embodiments of the present application can be applied to various application scenarios, such as: a security monitoring scene, a face unlocking scene, an intelligent driving or remote sensing scene, and the like, which are not limited in the embodiments of the present application.
It should be noted that, in addition to being acquired through a terminal device, the training samples of the embodiments of the present application may also be acquired through a server, or through cooperation between a terminal device and a server.
Exemplary System
FIG. 1 is a schematic diagram of an implementation environment provided by an embodiment of the present application. The implementation environment includes: a server 140 and a plurality of terminal devices 110, 120, 130.
The terminals 110, 120, 130 may be mobile terminal devices such as mobile phones, game consoles, tablet computers, cameras, video cameras and in-vehicle computers, or they may be personal computers (PCs) such as laptop and desktop computers. Those skilled in the art will appreciate that the types of the terminals 110, 120, 130 may be the same or different, and that their number may be greater or smaller; for example, there may be a single terminal, or tens or hundreds of terminals, or more. The embodiments of the present application do not limit the number or type of the terminals.
The terminals 110, 120, 130 and the server 140 are connected via a communication network. Optionally, the communication network is a wired network or a wireless network.
The server 140 is a single server, a cluster of multiple servers, a virtualization platform, or a cloud computing service center. In some optional embodiments, the server 140 receives training samples collected by the terminals 110, 120, 130 and trains the neural network model with those samples to update it. This is not limited in this embodiment, however; in alternative embodiments, the terminals 110, 120, 130 themselves collect training samples and train the neural network model with them to update it.
Based on the system architecture shown in fig. 1, the method provided by the embodiment of the present application can be applied to the above-mentioned multiple computer vision tasks for obtaining training samples.
Exemplary method
Fig. 2 is a flowchart illustrating a method for determining training samples according to an exemplary embodiment of the present application. The method may be applied to the implementation environment provided in fig. 1 and executed by the terminal device 110, 120, or 130 shown in fig. 1, but the embodiment of the present application is not limited thereto, and the method may also be executed by the server 140. An exemplary embodiment of the present application will be described below, taking as an example the method performed by the terminal device. In an embodiment, at least one neural network model is deployed in the terminal device 110, 120 or 130, and functions such as image perception and decision making are performed through the neural network model. Illustratively, the neural network model is formed through extensive data training.
As shown in fig. 2, the method may include the following steps 210, 220, and 230.
Step 210, processing the image to be detected through the first network model to obtain a first processing result.
The image to be detected is obtained by shooting a target object with a terminal device, obtained from locally stored data, or obtained from the Internet, among other sources; this is not limited in the present application. For example, in the process of obtaining an image to be detected by shooting a target object, the terminal device may call its camera component to shoot the target object and take the captured image, or a certain frame of the captured video stream, as the image to be detected. The camera component may include a camera arranged on the terminal device or a camera device connected to the terminal device.
In an embodiment, a first network model, such as a neural network model, is pre-deployed in the terminal device. Optionally, the first Network model may be a Convolutional Neural Network (CNN), a Deep Neural Network (DNN), a Recurrent Neural Network (RNN), or the like, and the type of the first Network model is not limited in this embodiment of the application.
The image to be detected can be processed through the first network model, and a first processing result is obtained. The image processing is, for example, the image classification, the image recognition, the image detection, and the like described above, and accordingly, the first processing result may be a first classification result, a first recognition result, a first detection result, and the like. However, the embodiment of the present application is not limited thereto, and the image processing may also be other computer vision tasks such as image segmentation and key point detection.
Step 220, processing the image to be detected through a second network model to obtain a second processing result.
In an embodiment, a second network model is also pre-deployed in the terminal device, where the second network model may be a neural network model of the same type as or a different type from the first network model, and the type of the second network model is not limited in this embodiment of the application. For example, the first network model and the second network model may both be DNNs, or the first network model is DNN and the second network model is CNN.
In one embodiment, the first network model and the second network model differ in network scale, perception capability, computational complexity, and the like. The second network model may be a neural network of larger scale, that is, its network scale is larger than that of the first network model. The image to be detected can be processed through the second network model to obtain the second processing result. The first network model and the second network model perform the same kind of image processing on the image to be detected; see above for details, which are not repeated here.
Step 230, when the difference value between the first processing result and the second processing result meets a first preset condition, determining the image to be detected as a training sample.
Generally speaking, the same image to be detected is processed by different neural network models, and the obtained processing results should be similar or identical. However, when the image to be detected is subjected to image processing by the first network model and the second network model which are different, a difference value exists between the obtained first processing result and the second processing result, and the difference value meets a first preset condition, the image to be detected can be determined as a training sample. The neural network model is further trained through the training sample, so that the prediction accuracy of the neural network model can be improved.
According to this method for determining a training sample, the same image to be detected is processed through different neural network models to obtain different processing results, and whether the image is a training sample is determined according to the difference value between those results. Training samples can thus be acquired from actual business scenarios, which enlarges the acquisition range of training samples, improves acquisition efficiency and effectively reduces labor cost; in particular, when the amount of image data is large, training samples can be selected from the images to be detected quickly, accurately and efficiently. In addition, because the difference value between the first processing result and the second processing result must meet the specific condition, the selected training sample tends to be a hard sample that is difficult to perceive. Training the neural network model with such hard samples can greatly improve its prediction accuracy, and can especially improve the accuracy of a neural network model whose perception and prediction accuracy is weak.
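As an illustration only, the flow of steps 210 to 230 might be sketched as follows, where first_model, second_model and the differs() predicate (standing in for the first preset condition) are hypothetical placeholders, not names defined by this application.

```python
def select_training_samples(images, first_model, second_model, differs):
    """Collect hard samples by comparing the results of two trained models."""
    training_samples = []
    for image in images:
        first_result = first_model(image)     # step 210: first processing result
        second_result = second_model(image)   # step 220: second processing result
        if differs(first_result, second_result):  # step 230: first preset condition met
            training_samples.append(image)    # keep the image as a (hard) training sample
    return training_samples
```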
In some embodiments of the present application, the number of parameters of the first network model is less than or equal to the number of parameters of the second network model, or the number of network layers of the first network model is less than or equal to the number of network layers of the second network model. In another embodiment, the number of parameters of the first network model is less than or equal to the number of parameters of the second network model, and the number of network layers of the first network model is less than or equal to the number of network layers of the second network model.
Generally, the larger the network scale of a neural network model, the stronger its perception capability and the larger its computation amount. The number of parameters and the number of network layers of a neural network model can both be used to characterize its network scale: the more parameters, the larger the model; the more network layers, the larger the model. In particular, for the same network structure, more parameters or more network layers mean a larger neural network, stronger perception capability and higher computational overhead. For example, among deep residual networks (ResNets), ResNet50, ResNet101 and ResNet152 have 50, 101 and 152 network layers, respectively; their scales increase in that order, as do their perception capabilities and computational overheads. Accordingly, ResNet152 achieves higher accuracy than ResNet50 in image recognition, image classification and image detection, and has better perception and prediction accuracy.
Optionally, the network structure of the second network model may be designed according to the computer vision task, or it may adopt at least part of an existing network structure with a larger network scale, such as a deep residual network (ResNet) or a dense convolutional network (DenseNet); the embodiment of the present disclosure does not limit the network structure of the second network model. Likewise, the network structure of the first network model may be designed according to the computer vision task, or it may adopt at least part of an existing network structure with a smaller network scale, such as GoogLeNet, SqueezeNet, MobileNet or ShuffleNet; the embodiment of the present disclosure does not limit the network structure of the first network model either.
In an embodiment, the network structure of the first network model and that of the second network model may be the same; for example, both are ResNets, with the first network model being ResNet50 and the second being ResNet152, so that the first network model has no more parameters and no more network layers than the second. Alternatively, the two network structures may differ; for example, the second network model is a ResNet and the first network model is GoogLeNet, again with the first network model having no more parameters and no more network layers than the second.
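For illustration, the difference in network scale between ResNet50 and ResNet152 can be seen by counting parameters. The sketch below uses torchvision's ResNet implementations purely as an example; this library choice is an assumption of the illustration, not part of this application.

```python
from torchvision import models

def num_parameters(model):
    # network scale characterized by the total number of parameters
    return sum(p.numel() for p in model.parameters())

resnet50 = models.resnet50()
resnet152 = models.resnet152()

print(num_parameters(resnet50))   # roughly 25.6 million parameters
print(num_parameters(resnet152))  # roughly 60.2 million parameters
```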
In other embodiments of the present application, the number of parameters of the first network model is less than or equal to that of the second network model, or the number of network layers of the first network model is less than or equal to that of the second network model; that is, the scale of the first network model is less than or equal to the network scale of the second network model. In addition, in knowledge distillation (Knowledge Distillation), a teacher network (Teacher Network) supervises the training of a student network (Student Network), thereby realizing knowledge transfer, where the teacher network is a neural network model with a complex network structure or a large network scale but high perception and prediction accuracy, and the student network is a neural network model with a simplified network structure or a small network scale and low complexity. In one embodiment, the second network model may act as the teacher network and the first network model as the student network: the teacher network is larger in scale, has more parameters and more network layers, and its perception accuracy is higher.
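As a sketch of the knowledge distillation mentioned above, following the commonly used softened-output formulation rather than anything specified by this application; the temperature T and weight alpha below are illustrative hyperparameters.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.5):
    # soft term: the student mimics the teacher's softened output distribution
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # hard term: the student still learns from the real labels
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1 - alpha) * hard
```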
The same image to be detected is processed through the second network model with the larger network scale and the first network model with the smaller network scale; if the difference value between the obtained processing results meets the preset condition, the image to be detected can be acquired as a training sample for training the neural network model with the lower recognition accuracy, thereby improving its prediction accuracy. Generally, the perception accuracy of the smaller first network model is lower, so the first network model can be trained with the training samples obtained in the embodiments of the present application until its prediction accuracy approaches that of the larger second network model. Moreover, the first network model, with its small computational overhead and simpler network structure, is better suited to deployment in a terminal device; improving its prediction accuracy therefore improves the prediction accuracy of the terminal device itself.
On the basis of the embodiment shown in fig. 2, the method for determining a training sample provided in the embodiment of the present application may further include the step of acquiring a second network model, where the second network model is obtained by integrating the first network model with at least one neural network model different from the first network model.
In some embodiments of the present application, by integrating multiple neural network models, an integrated neural network model with more parameters and/or more network layers than a single neural network model may be obtained. The network scale of the integrated neural network model is larger than that of a single neural network model, so that the perception prediction accuracy is higher, and meanwhile, the calculated amount is generally larger.
The process of integrating the first network model and at least one neural network model different from it into the second network model is performed on a server; however, the embodiment of the present application is not limited thereto, and the process may also be performed on a terminal device with sufficient performance, for example a PC. Illustratively, the first network model may be GoogLeNet, with ShuffleNet and ResNet as the models different from it. Alternatively, the first network model may be, for example, ResNet50, with ResNet101 and ResNet152 as the models different from it. The number of neural network models different from the first network model is not limited in the embodiments of the present application; there may be one or more. The second network model obtained by the integration has more network layers and more parameters than the first network model.
The second network model may be obtained by a simple integration method. Specifically, the outputs of the first network model and of the at least one neural network model different from it may be combined: the average of their results may be taken as the result of the second network model, or the result of the second network model may be decided by voting among them.
For example, the first network model and at least one neural network model different from it may also be integrated through random subspace (Random Subspace) integration, selective ensemble (Selective Ensemble) methods, and the like, to obtain the second network model; the integration method of the neural network models is not limited in the embodiment of the present application.
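For illustration, the simple integration methods described above (averaging and voting) might look as follows; member_models is a hypothetical list containing the first network model and the at least one model different from it, each a callable returning a tensor of class scores.

```python
import torch

def ensemble_average(member_models, image):
    # average of the members' results serves as the integrated result
    outputs = [model(image) for model in member_models]
    return torch.stack(outputs).mean(dim=0)

def ensemble_vote(member_models, image):
    # majority vote over the members' predicted classes
    votes = [model(image).argmax(dim=-1) for model in member_models]
    return torch.stack(votes).mode(dim=0).values
```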
On the basis of the embodiment shown in fig. 2, the method for determining a training sample according to the embodiment of the present application may further include: when the first network model and the second network model are image detection models, calculating the intersection-over-union ratio between the first processing result and the second processing result; and when that ratio is less than or equal to a preset value, determining that the difference value between the first processing result and the second processing result meets the first preset condition.
In some embodiments of the present application, when the first network model and the second network model are image detection models, the method for determining training samples is applied to the computer vision task of image detection described above. Referring to Fig. 4a and Fig. 4b, take face detection as an example: the image to be detected contains 2 faces and is processed through the first network model and the second network model to obtain a first processing result and a second processing result, which may be face detection frames. Illustratively, as shown in Fig. 4a, the first processing result shows that the image to be detected contains the face of a target object B; as shown in Fig. 4b, the second processing result shows that it contains the face of a target object A and the face of the target object B. The face features in the detection frame S2 of target object B from the first processing result, and those in the detection frame S1 of target object A and the detection frame S2' of target object B from the second processing result, can be computed; by comparing the Euclidean distances between the face features of the two results, it can be determined whether the difference between the first processing result and the second processing result meets the first preset condition.
In some embodiments, an intersection over union (IoU) is calculated between the first processing result and the second processing result, that is, the ratio of the intersection to the union between the face detection frame S2 obtained from the first processing result and the face detection frames S1 and S2' obtained from the second processing result. This is equivalent to dividing the overlapping part of the detection regions of the two processing results by the union of those detection regions.
When the intersection-over-union ratio between the first processing result and the second processing result is less than or equal to a preset value, the difference value between the two results is determined to be large and to meet the first preset condition, and the image to be detected is acquired as a training sample; here the preset value is a number greater than 0 and less than 1. When the ratio is greater than the preset value, the difference value is determined to be small and not to meet the first preset condition, and the image to be detected is not acquired as a training sample. In an embodiment the preset value is 0.7, that is, when the intersection-over-union ratio between the two results is less than or equal to 0.7, the difference value is deemed to meet the first preset condition and the image to be detected is acquired as a training sample; however, the embodiment of the present application is not limited thereto, and the preset value may be set to other values, such as 0.88 or 0.68, according to the practical application. In other embodiments, when the numbers of detection frames in the first and second processing results differ, the intersection-over-union ratio between them is directly deemed to be less than or equal to the preset value.
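A minimal sketch of this intersection-over-union test follows, assuming boxes are (x1, y1, x2, y2) coordinates. Pairing the detection frames of the two results one-to-one by order is a simplification of this illustration (a real detector would match frames first), and 0.7 is the example preset value from this embodiment.

```python
def iou(box_a, box_b):
    # overlap of the two detection regions divided by their union
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def meets_first_preset_condition(boxes_first, boxes_second, preset=0.7):
    if len(boxes_first) != len(boxes_second):   # different numbers of detection frames
        return True                             # directly deemed IoU <= preset value
    return any(iou(a, b) <= preset for a, b in zip(boxes_first, boxes_second))
```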
The method for determining a training sample provided by the embodiment of the present application may further include: when the first network model and the second network model are image classification models or image recognition models, determining whether the first processing result is the same as the second processing result; and when the first processing result differs from the second processing result, determining that the difference value between them meets the first preset condition.
In some embodiments of the present application, when the first network model and the second network model are image recognition models, the method for determining training samples is applied to the computer vision task of image recognition described above. Referring to the foregoing, the terminal acquires an image to be detected containing the face of target object X and processes it through the first network model and the second network model to obtain a first processing result and a second processing result. For example, the first processing result matches the face in the image to the face of target object Y, while the second processing result correctly matches it to the face of target object X. In this case the first processing result differs from the second processing result, the difference value between them meets the first preset condition, and the image to be detected is acquired as a training sample. Training the first network model with such samples can improve its prediction accuracy.
Conversely, when the first processing result and the second processing result both correctly match the face in the image to be detected to the face of target object X, the two results are the same, and the image to be detected need not be acquired as a training sample.
In some embodiments of the present application, when the first network model and the second network model are image classification models, the method for determining training samples is applied to the computer vision task of image classification described above. Referring to the foregoing, the terminal device acquires an image to be detected containing a cat, a dog and a person and processes it through the first network model and the second network model to obtain a first processing result and a second processing result. For example, the second processing result correctly identifies the cat, the dog and the person as a cat, a dog and a person, respectively, while the first processing result identifies them as a cat, a cat and a person. In this case the two results differ, the difference value between them meets the first preset condition, and the image to be detected is acquired as a training sample. Training the first network model with such samples can improve its prediction accuracy.
In some embodiments of the present application, the method for determining a training sample further includes: when the first network model and the second network model are image classification models, determining whether the first processing result is the same as the second processing result; and when the two results are the same but the difference value between a first probability of the first processing result and a second probability of the second processing result meets a second preset condition, determining that the difference value between the first processing result and the second processing result meets the first preset condition. Here, the first probability is the probability with which the first network model identifies the image to be detected as the first processing result, and the second probability is the probability with which the second network model identifies the image to be detected as the second processing result.
Illustratively, the first processing result and the second processing result are the same; for example, both correctly identify the cat, the dog and the person in the image to be detected. However, the first processing result identifies the cat, the dog and the person with probabilities of 0.68, 0.45 and 0.69, while the second processing result identifies them with probabilities of 0.91, 0.92 and 0.98. In this case the difference value between the probabilities meets the second preset condition, for example, the difference between the probability values with which the two results classify the person is greater than a preset probability value, so the difference value between the first processing result and the second processing result is determined to meet the first preset condition, and the image to be detected is acquired as a training sample. The preset probability value is not limited and can be set according to the specific application. Training the first network model with such samples can improve its prediction accuracy.
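For illustration, the comparison of classification or recognition results described above might be sketched as follows; the 0.2 probability gap is a placeholder for the preset probability value, which this application leaves unrestricted.

```python
def is_training_sample(first_label, first_prob, second_label, second_prob,
                       preset_prob_gap=0.2):
    # different results: the first preset condition is met directly
    if first_label != second_label:
        return True
    # same result: check the second preset condition on the probability gap
    return abs(first_prob - second_prob) > preset_prob_gap
```

For example, is_training_sample("person", 0.69, "person", 0.98) returns True, matching the classification example above.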
Fig. 3 is a flowchart illustrating a method for determining training samples according to another exemplary embodiment of the present application. The method may be applied to the implementation environment provided in fig. 1 and executed by the terminal device 110, 120, or 130 shown in fig. 1, but the embodiment of the present application is not limited thereto, and the method may also be executed by the server 140. An exemplary embodiment of the present application will be described below, taking as an example the method performed by the terminal device. In an embodiment, at least one neural network model is deployed in the terminal device 110, 120 or 130, and functions such as image perception and decision making are performed through the neural network model. Illustratively, the neural network model is formed through extensive data training.
As shown in fig. 3, the method may include the following steps 310, 320, and 330.
Step 310, a plurality of images to be detected are obtained, wherein each image to be detected comprises a target object.
The plurality of images to be detected are obtained by shooting the target object with a terminal device, obtained from locally stored data, or obtained from the Internet, among other sources; this is not limited in the present application.
The target object may be an animal such as a cat or a dog, an object such as a building or a vehicle, a person, or the like, and the number of the target objects may be one or more, which is not limited in the embodiment of the present application.
Step 320, processing the multiple images to be detected respectively through the image processing model to obtain multiple processing results.
In one embodiment, an image processing model, such as a neural network model, is pre-deployed in the terminal device. Optionally, the image processing model may be a Convolutional Neural Network (CNN), a Deep Neural Network (DNN), a Recurrent Neural Network (RNN), or the like; the type of the image processing model is not limited in this embodiment of the application.
And carrying out image processing on the plurality of images to be detected through the image processing model to obtain a plurality of processing results. The image processing is, for example, the image classification, the image recognition, the image detection, and the like described above, and accordingly, the plurality of processing results may be classification results, recognition results, detection results, and the like. However, the embodiment of the present application is not limited thereto, and the image processing may also be other computer vision tasks such as image segmentation and key point detection.
Step 330, when the difference values among the plurality of processing results meet a third preset condition, determining at least one image to be detected among the plurality of images to be detected as a training sample.
Generally, when a plurality of images to be detected containing the same target object are processed through an image processing model, the obtained processing results should be similar or identical. However, when the same image processing model processes such images and a difference value exists among the obtained processing results that meets a third preset condition, the images to be detected can be determined as training samples. In some embodiments the third preset condition is the same as the first preset condition; in other embodiments it is different. Further training the image processing model with these samples can improve its prediction accuracy.
According to this method for determining a training sample, an image processing model processes a plurality of images to be detected containing the same target object to obtain a plurality of processing results, and training samples are determined according to the difference values among those results. Training samples can thus be acquired automatically, which improves acquisition efficiency and effectively reduces labor cost; in particular, when the amount of image data is large, training samples can be selected from the images to be detected quickly, accurately and efficiently. In addition, because the difference values among the multiple processing results must meet specific conditions, the selected training samples tend to be hard samples that are not easily perceived, and training the image processing model with such hard samples can greatly improve its prediction accuracy.
In some embodiments, step 310 may include: shooting the target object through a plurality of terminal devices, respectively, to obtain the plurality of images to be detected.
For example, in the process in which a plurality of terminal devices obtain a plurality of images to be detected by shooting a target object, each terminal device may call its camera component to shoot the target object and take the captured images, or a plurality of frames of the captured video stream, as the images to be detected. The camera component may include a camera arranged on the terminal device or a camera device connected to the terminal device.
In an embodiment, the plurality of terminal devices send the images to be detected obtained by shooting to the terminal device that executes the method for determining training samples provided by the embodiments of the present application, so that subsequent operations can be performed. The executing terminal device may be any one of the plurality of terminal devices, or a different terminal device; the embodiments of the present application are not limited in this respect.
In other embodiments, step 310 may include: acquiring an original image to be detected, wherein the original image to be detected includes the target object; performing image transformation on the original image to be detected; and taking the original image to be detected and the image-transformed original image to be detected as the plurality of images to be detected.
In an embodiment, the original image to be detected is obtained by shooting the target object through a terminal device, read from locally stored data, or obtained from the Internet, and the like; this is not limited in the present application.
In some embodiments, by performing image transformation on the original image to be detected, a plurality of image-transformed images similar to the original image to be detected can be obtained. For example, the image after image transformation and the original image to be detected both include the target object.
In some embodiments, performing image transformation on the original image to be detected includes: performing at least one of image flipping, image cropping, color transformation, and noise addition on the original image to be detected. However, the embodiments of the present application are not limited thereto, and other image transformation operations may also be performed. Such an image transformation operation may be implemented, for example, by adjusting the focal length of a camera on the terminal device, or by an Image Signal Processing (ISP) algorithm.
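As a minimal sketch of these transformation operations (not the ISP- or focal-length-based implementations mentioned above), the following function derives flipped, cropped, recolored, and noisy variants from one original image, assumed here to be an HxWx3 uint8 NumPy array; the crop margins, channel gains, and noise scale are illustrative assumptions.

```python
import numpy as np

def transform_variants(image: np.ndarray, seed: int = 0) -> list:
    """Return the original image plus four transformed variants."""
    rng = np.random.default_rng(seed)
    h, w, _ = image.shape
    flipped = np.fliplr(image)                                    # image flipping
    cropped = image[h // 10: h - h // 10, w // 10: w - w // 10]   # image cropping
    recolored = np.clip(image * np.array([1.1, 0.9, 1.0]),
                        0, 255).astype(np.uint8)                  # color transformation
    noisy = np.clip(image + rng.normal(0.0, 8.0, image.shape),
                    0, 255).astype(np.uint8)                      # noise addition
    return [image, flipped, cropped, recolored, noisy]
```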
On the basis of the embodiment shown in fig. 3, the method for determining training samples according to the embodiments of the present application may further include: when the image processing model is an image detection model, calculating the intersection ratio (intersection over union, IoU) between any two of the plurality of processing results; and when an intersection ratio less than or equal to a preset value exists, determining that the difference values among the plurality of processing results meet the third preset condition.
In an embodiment, when two processing results exist among the plurality of processing results such that the intersection ratio between them is less than or equal to a preset value, it is determined that the difference value between those two processing results is large and meets the third preset condition, and the corresponding image to be detected is acquired as a training sample. The preset value is a number in the range (0, 1), or alternatively in the range [0, 1]. When the intersection ratio between any two of the processing results is greater than the preset value, it is determined that the difference values among the processing results are small and do not meet the third preset condition, and the image to be detected is not acquired as a training sample.
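A minimal sketch of this check follows, assuming detection results given as (x1, y1, x2, y2) boxes and a preset value of 0.5 (both assumptions of the sketch); differing detection counts are treated as meeting the condition directly, as described further below, and the face detection example that follows is an instance of this comparison.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def meets_third_condition(boxes_a, boxes_b, preset=0.5):
    """True when two detection results differ enough to mark a hard sample."""
    if len(boxes_a) != len(boxes_b):            # differing detection counts
        return True
    return any(iou(a, b) <= preset for a, b in zip(boxes_a, boxes_b))
```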
In some embodiments of the present application, when the image processing model is an image detection model, the method for determining training samples can be applied to the image detection task described above. Taking face detection as an example, assume there are 3 images to be detected, each containing faces C and D. The 3 images are processed through the image processing model to obtain 3 processing results: the first detects faces C and D, the second also detects faces C and D, and the third detects no face. Whether the difference values among the processing results meet the third preset condition can be determined by comparing the intersection ratio between any two of them. For example, between a result that detects faces C and D and the result that detects no face, the intersection is 0 and the union consists of the detection frames of faces C and D; dividing the intersection by the union gives an intersection ratio of 0. With a preset value of 0.5, this intersection ratio is less than the preset value, so the difference values among the processing results meet the third preset condition, and at least one of the images to be detected containing faces C and D is acquired as a training sample.
In an embodiment, the preset value is 0.7; however, the embodiments of the present application are not limited thereto, and the preset value may be set to other values, such as 0.88 or 0.68, according to the practical application. In other embodiments, when the numbers of detection frames in the plurality of processing results differ, it is directly determined that an intersection ratio less than or equal to the preset value exists.
The method for determining training samples provided by the embodiments of the present application may further include: when the image processing model is an image classification model or an image recognition model, calculating whether any two of the plurality of processing results are the same; and when two of the plurality of processing results are different, determining that the difference values among the plurality of processing results meet the third preset condition.
In some embodiments of the present application, when the image processing model is an image recognition model, the method for determining training samples is applied to the image recognition task described above. As described above, the terminal device acquires a plurality of images to be detected including the face of target object X, and processes them respectively through the same image processing model to obtain a plurality of processing results. For example, when one processing result matches the face in the image to be detected to the face of target object Y, while another correctly matches it to the face of target object X, two of the processing results are different, the difference value between them is large and meets the third preset condition, and at least one of the images to be detected is acquired as a training sample; training the image processing model with it improves the model's prediction accuracy. When the plurality of processing results are all the same, it is determined that the difference values among them do not meet the third preset condition.
In some embodiments of the present application, when the image processing model is an image classification model, the method for determining training samples is applied to the image classification task described above. As described above, the terminal device acquires a plurality of images to be detected including a cat, a dog, and a person, and processes them respectively through the image processing model to obtain a plurality of processing results. For example, if one processing result correctly identifies the cat, the dog, and the person as a cat, a dog, and a person, while another misidentifies one of them (for example, identifies the cat as a dog), two of the processing results are different, the difference values among the processing results are large and meet the third preset condition, and at least one of the images to be detected is acquired as a training sample.
In some embodiments of the present application, the method for determining a training sample in the embodiments of the present application further includes: when the image processing model is an image classification model, calculating whether any two processing results in the plurality of processing results are the same; and when any two processing results in the plurality of processing results are the same and the difference value between the probabilities of any two processing results meets a fourth preset condition, determining that the difference value between the plurality of processing results meets the third preset condition, wherein the probability of the processing results is the probability when the image classification model identifies the image to be detected as the corresponding processing result.
Illustratively, suppose the plurality of processing results are the same: each correctly identifies the cat, the dog, and the person as a cat, a dog, and a person. However, one result assigns probabilities of 0.68, 0.45, and 0.69 to the cat, the dog, and the person respectively, while another assigns 0.91, 0.92, and 0.98. The difference between the probabilities of these two processing results is large and meets the fourth preset condition; for example, the gap between the two probability values for the person class exceeds a preset probability value. It is therefore determined that the difference values among the plurality of processing results meet the third preset condition, the image to be detected is acquired as a training sample, and the image processing model is trained with it to improve its prediction accuracy.
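The probability comparison above could be sketched as follows; the dict-based result format and the 0.2 preset probability value are assumptions of this sketch, not parameters fixed by the embodiments.

```python
def probability_gap_condition(probs_a, probs_b, preset=0.2):
    """Fourth preset condition: identical labels but divergent confidences."""
    if probs_a.keys() != probs_b.keys():
        return False                  # differing labels are handled elsewhere
    return any(abs(probs_a[k] - probs_b[k]) > preset for k in probs_a)

# The worked example above: same labels, but e.g. 0.69 vs 0.98 for "person".
assert probability_gap_condition(
    {"cat": 0.68, "dog": 0.45, "person": 0.69},
    {"cat": 0.91, "dog": 0.92, "person": 0.98},
)
```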
Exemplary devices
The apparatus embodiments below may be used to execute the method embodiments of the present application. For details not disclosed in the apparatus embodiments, refer to the method embodiments of the present application.
Referring to fig. 5, a block diagram of an apparatus for determining training samples according to an exemplary embodiment of the present application is shown. The apparatus has the function of implementing the embodiment shown in fig. 2, and the function may be implemented by hardware, or by hardware executing corresponding software. The apparatus may include: a first processing module 510, a second processing module 520, and a first determining module 530.
The first processing module 510 is configured to process the image to be detected through the first network model to obtain a first processing result.
The second processing module 520 is configured to process the image to be detected through the second network model to obtain a second processing result.
A first determining module 530, configured to determine the image to be detected as a training sample when a difference value between the first processing result and the second processing result meets a first preset condition.
In an alternative embodiment provided based on the embodiment shown in fig. 5, the number of parameters of the first network model is smaller than or equal to the number of parameters of the second network model, and the number of network layers of the first network model is smaller than or equal to the number of network layers of the second network model.
In an optional embodiment provided based on the embodiment shown in fig. 5, the apparatus for determining a training sample further includes: a network model acquisition module 540.
A network model obtaining module 540, configured to obtain the second network model, where the second network model is obtained by integrating the first network model and at least one neural network model different from the first network model.
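One simple way to realize such an integrated second network model, sketched here as an assumption rather than the only form of integration covered by the embodiments, is to average the class probabilities produced by the first network model and one or more different models:

```python
import numpy as np

def ensemble_predict(models, image):
    """Average the per-class probabilities of several models."""
    probs = np.stack([m(image) for m in models])  # each model returns a probability vector
    return probs.mean(axis=0)                     # the integrated (second) model's output
```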
In some embodiments provided based on the embodiment shown in fig. 5, the apparatus for determining training samples further comprises: a first calculation module 550 and a second determination module 560.
A first calculating module 550, configured to calculate an intersection ratio between the first processing result and the second processing result when the first network model and the second network model are image detection models.
A second determining module 560, configured to determine that the difference value between the first processing result and the second processing result meets the first preset condition when the intersection ratio between the first processing result and the second processing result is less than or equal to a preset value.
In some embodiments provided based on the embodiment shown in fig. 5, the apparatus for determining training samples further comprises: a second calculation module 570 and a third determination module 580.
A second calculating module 570, configured to calculate whether the first processing result is the same as the second processing result when the first network model and the second network model are image classification models or image recognition models.
A third determining module 580, configured to determine that a difference value between the first processing result and the second processing result meets the first preset condition when the first processing result is different from the second processing result.
In some embodiments provided based on the embodiment shown in fig. 5, the apparatus for determining training samples further comprises: a third calculation module 590 and a fourth determination module 5100.
A third calculating module 590, configured to calculate whether the first processing result is the same as the second processing result when the first network model and the second network model are image classification models.
The fourth determining module 5100 is configured to determine that a difference value between the first processing result and the second processing result meets a first preset condition when the first processing result is the same as the second processing result and a difference value between a first probability of the first processing result and a second probability of the second processing result meets a second preset condition, where the first probability is a probability that the first network model identifies the image to be detected as the first processing result, and the second probability is a probability that the second network model identifies the image to be detected as the second processing result.
The first calculating module 550, the second calculating module 570 and the third calculating module 590 may be actually the same software or hardware module, or may be different software or hardware modules, which is not limited in the embodiment of the present application. The second determining module 560, the third determining module 580 and the fourth determining module 5100 may be actually the same software or hardware module, or may be different software or hardware modules, which is not limited in this embodiment of the present application.
The apparatus for determining training samples provided by the embodiments of the present application processes the same image to be detected through different neural network models to obtain different image processing results, and determines training samples according to the difference values between those results. Training samples can thus be acquired automatically, which improves acquisition efficiency and effectively reduces labor cost. In addition, because the difference values among the processing results must meet a specific condition, the selected training samples tend to be hard samples that are not easy to perceive, and training the neural network models with such hard samples can greatly improve their prediction accuracy.
Fig. 6 is a block diagram of an apparatus for determining training samples according to another exemplary embodiment of the present application. The apparatus has the function of implementing the embodiment shown in fig. 3, and the function may be implemented by hardware, or by hardware executing corresponding software. The apparatus may include: an obtaining module 610, a third processing module 620, and a fifth determining module 630.
The acquiring module 610 is configured to acquire a plurality of images to be detected, where each image to be detected includes a target object;
the third processing module 620 is configured to process the multiple images to be detected through the image processing model, respectively, to obtain multiple processing results;
a fifth determining module 630, configured to determine at least one to-be-detected image of the multiple to-be-detected images as a training sample when a difference value between the multiple processing results meets a third preset condition.
In some embodiments provided based on the embodiment shown in fig. 6, the obtaining module 610 includes: a shooting unit 640.

The shooting unit 640 is configured to shoot the target object through a plurality of terminal devices respectively, so as to obtain the plurality of images to be detected.
In other embodiments provided based on the embodiment shown in fig. 6, the obtaining module 610 includes: an original image acquisition unit 650 and an image transformation unit 660.
An original image obtaining unit 650, configured to obtain an original image to be detected, where the original image to be detected includes the target object.
The image transformation unit 660 is configured to perform image transformation on the original image to be detected, and acquire the original image to be detected and the original image to be detected after image transformation as the multiple images to be detected.
In some embodiments provided based on the embodiment shown in fig. 6, the image transformation unit 660 is further configured to: perform at least one image transformation operation of image flipping, image cropping, color transformation, and noise addition on the original image to be detected.
In some embodiments provided based on the embodiment shown in fig. 6, the apparatus for determining training samples further comprises: a fifth calculation module 670 and a sixth determination module 680.
A fifth calculating module 670, configured to calculate an intersection ratio between any two of the plurality of processing results when the image processing model is an image detection model.
A sixth determining module 680, configured to determine that the difference values among the plurality of processing results meet the third preset condition when an intersection ratio less than or equal to a preset value exists.
In some embodiments provided based on the embodiment shown in fig. 6, the apparatus for determining training samples further comprises: a sixth calculation module 690 and a seventh determination module 6100.
A sixth calculating module 690, configured to calculate whether any two processing results of the plurality of processing results are the same when the image processing model is an image classification model or an image recognition model.
A seventh determining module 6100, configured to determine that a difference value between the plurality of processing results meets the third preset condition when two of the plurality of processing results are different.
In some embodiments provided based on the embodiment shown in fig. 6, the apparatus for determining training samples further comprises: a seventh calculation module 6110 and an eighth determination module 6120.
A seventh calculating module 6110, configured to calculate whether any two processing results in the multiple processing results are the same when the image processing model is an image classification model;
an eighth determining module 6120, configured to determine that the difference value between the multiple processing results meets a third preset condition when any two processing results in the multiple processing results are the same and the difference value between the probabilities of any two processing results meets a fourth preset condition, where the probability of the processing result is a probability when the image classification model identifies the image to be detected as the corresponding processing result.
It should be noted that the fifth calculation module 670, the sixth calculation module 690, and the seventh calculation module 6110 may in practice be the same software or hardware module, or may be different software or hardware modules, which is not limited in the embodiments of the present application. Likewise, the sixth determining module 680, the seventh determining module 6100, and the eighth determining module 6120 may in practice be the same software or hardware module, or may be different software or hardware modules, which is not limited in the embodiments of the present application.
The apparatus for determining training samples provided by the embodiments of the present application processes, through an image processing model, a plurality of images to be detected containing the same target object to obtain a plurality of processing results, and determines training samples according to the difference values among those results. Training samples can thus be acquired automatically, which improves acquisition efficiency and effectively reduces labor cost. In addition, because the difference values among the processing results must meet a specific condition, the selected training samples tend to be hard samples that are not easy to perceive, and training the image processing model with such hard samples can greatly improve its prediction accuracy.
Exemplary electronic device
Next, an electronic apparatus according to an embodiment of the present application is described with reference to fig. 7. FIG. 7 illustrates a block diagram of an electronic device in accordance with an embodiment of the present application.
As shown in fig. 7, the electronic device 10 includes one or more processors 11 and memory 12.
The processor 11 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 10 to perform desired functions.
The memory 12 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM) and/or cache memory. The non-volatile memory may include, for example, Read-Only Memory (ROM), a hard disk, flash memory, and the like. One or more computer program instructions may be stored on the computer-readable storage medium, and the processor 11 may execute the program instructions to implement the methods of determining training samples of the various embodiments of the present application described above and/or other desired functions. Various contents such as the images to be detected and the processing results may also be stored in the computer-readable storage medium.
In one example, the electronic device 10 may further include: an input device 13 and an output device 14, which are interconnected by a bus system and/or other form of connection mechanism (not shown).
When the electronic device is a stand-alone device, the input device 13 may be a communication network connector. The input device 13 may also include, for example, a camera for capturing the images to be detected, a keyboard, a mouse, and the like.
The output device 14 may output various information, including the determined training samples, to the outside. The output device 14 may include, for example, a display, a speaker, a printer, a communication network and the remote output devices connected thereto, and the like.
Of course, for the sake of simplicity, only some of the components of the electronic device 10 relevant to the present application are shown in fig. 7, and components such as buses and input/output interfaces are omitted. In addition, the electronic device 10 may comprise any other suitable components, depending on the specific application.
Exemplary computer program product and computer-readable storage Medium
In addition to the above-described methods and apparatus, embodiments of the present application may also be a computer program product comprising computer program instructions that, when executed by a processor, cause the processor to perform the steps in the method of determining training samples according to various embodiments of the present application described in the "exemplary methods" section above of this specification.
The computer program product may be written with program code for performing the operations of embodiments of the present application in any combination of one or more programming languages, including an object-oriented programming language such as Java or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present application may also be a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, cause the processor to perform steps in a method of determining training samples according to various embodiments of the present application described in the "exemplary methods" section above in this specification.
The computer-readable storage medium may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing describes the general principles of the present application in conjunction with specific embodiments, however, it is noted that the advantages, effects, etc. mentioned in the present application are merely examples and are not limiting, and they should not be considered essential to the various embodiments of the present application. Furthermore, the foregoing disclosure of specific details is for the purpose of illustration and description and is not intended to be limiting, since the foregoing disclosure is not intended to be exhaustive or to limit the disclosure to the precise details disclosed.
The block diagrams of devices, apparatuses, and systems referred to in this application are given only as illustrative examples and are not intended to require or imply that the connections, arrangements, and configurations must be made in the manner shown in the block diagrams. These devices, apparatuses, and systems may be connected, arranged, or configured in any manner, as will be appreciated by those skilled in the art. Words such as "including," "comprising," "having," and the like are open-ended words that mean "including, but not limited to," and are used interchangeably therewith. The word "or" as used herein means, and is used interchangeably with, the word "and/or," unless the context clearly dictates otherwise. The word "such as" is used herein to mean, and is used interchangeably with, the phrase "such as but not limited to."
It should also be noted that in the devices, apparatuses, and methods of the present application, the components or steps may be decomposed and/or recombined. These decompositions and/or recombinations are to be considered as equivalents of the present application.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present application. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the application. Thus, the present application is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, the description is not intended to limit embodiments of the application to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.

Claims (17)

1. A method of determining training samples, comprising:
processing an image to be detected through a first network model to obtain a first processing result;
processing the image to be detected through a second network model to obtain a second processing result;
and when the difference value between the first processing result and the second processing result meets a first preset condition, determining the image to be detected as a training sample.
2. The method of claim 1, wherein the number of parameters of the first network model is less than or equal to the number of parameters of the second network model, and the number of network layers of the first network model is less than or equal to the number of network layers of the second network model.
3. The method of claim 1, further comprising:
and acquiring the second network model, wherein the second network model is obtained by integrating the first network model and at least one neural network model different from the first network model.
4. The method of any of claims 1-3, further comprising:
when the first network model and the second network model are image detection models, calculating an intersection ratio between the first processing result and the second processing result;
and when the intersection ratio between the first processing result and the second processing result is less than or equal to a preset value, determining that the difference value between the first processing result and the second processing result meets the first preset condition.
5. The method of any of claims 1-3, further comprising:
when the first network model and the second network model are image classification models or image recognition models, calculating whether the first processing result is the same as the second processing result;
when the first processing result is different from the second processing result, determining that a difference value between the first processing result and the second processing result meets the first preset condition.
6. The method of any of claims 1-3, further comprising:
when the first network model and the second network model are image classification models, calculating whether the first processing result is the same as the second processing result;
and when the first processing result is the same as the second processing result, and a difference value between a first probability of the first processing result and a second probability of the second processing result meets a second preset condition, determining that the difference value between the first processing result and the second processing result meets the first preset condition, wherein the first probability is a probability when the first network model identifies the image to be detected as the first processing result, and the second probability is a probability when the second network model identifies the image to be detected as the second processing result.
7. A method of determining training samples, comprising:
acquiring a plurality of images to be detected, wherein each image to be detected comprises a target object;
respectively processing the plurality of images to be detected through an image processing model to obtain a plurality of processing results;
and when the difference values among the plurality of processing results meet a third preset condition, determining at least one image to be detected in the plurality of images to be detected as a training sample.
8. The method according to claim 7, wherein the acquiring a plurality of images to be detected, each image to be detected including a target object, comprises:
and shooting the target object through a plurality of terminal devices respectively to obtain the plurality of images to be detected.
9. The method according to claim 7, wherein the acquiring a plurality of images to be detected, each image to be detected including a target object, comprises:
acquiring an original image to be detected, wherein the original image to be detected comprises the target object;
and performing image transformation on the original image to be detected, and taking the original image to be detected and the image-transformed original image to be detected as the plurality of images to be detected.
10. The method of claim 9, wherein the image transformation of the original image to be detected comprises: performing at least one image transformation operation of image flipping, image cropping, color transformation, and noise addition on the original image to be detected.
11. The method according to any one of claims 7-10, further comprising:
when the image processing model is an image detection model, calculating the intersection ratio between any two processing results in the plurality of processing results;
and when the intersection ratio smaller than or equal to a preset value exists, determining that the difference value among the plurality of processing results meets the third preset condition.
12. The method according to any one of claims 7-10, further comprising:
when the image processing model is an image classification model or an image recognition model, calculating whether any two processing results in the plurality of processing results are the same;
when two processing results in the plurality of processing results are different, determining that a difference value between the plurality of processing results meets the third preset condition.
13. The method according to any one of claims 7-10, further comprising:
when the image processing model is an image classification model, calculating whether any two processing results in the plurality of processing results are the same;
and when any two processing results in the plurality of processing results are the same and the difference value between the probabilities of any two processing results meets a fourth preset condition, determining that the difference value between the plurality of processing results meets the third preset condition, wherein the probability of the processing results is the probability when the image classification model identifies the image to be detected as the corresponding processing result.
14. An apparatus for determining training samples, comprising:
the first processing module is used for processing the image to be detected through the first network model to obtain a first processing result;
the second processing module is used for processing the image to be detected through a second network model to obtain a second processing result;
and the first determining module is used for determining the image to be detected as a training sample when the difference value between the first processing result and the second processing result meets a first preset condition.
15. An apparatus for determining training samples, comprising:
the acquisition module is used for acquiring a plurality of images to be detected, and each image to be detected comprises a target object;
the third processing module is used for respectively processing the plurality of images to be detected through the image processing model to obtain a plurality of processing results;
and the fifth determining module is used for determining at least one image to be detected in the plurality of images to be detected as a training sample when the difference value among the plurality of processing results meets a third preset condition.
16. A computer-readable storage medium, the storage medium storing a computer program for performing the method of determining training samples of any of the preceding claims 1-6, 7-13.
17. An electronic device, the electronic device comprising:
a processor;
a memory for storing the processor-executable instructions;
the processor for performing the method of determining training samples of any of the preceding claims 1-6, 7-13.