CN116935149A - Neural network training method, image detection method, device, equipment and product - Google Patents

Neural network training method, image detection method, device, equipment and product

Info

Publication number
CN116935149A
CN116935149A
Authority
CN
China
Prior art keywords
image
target
training
sample image
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210352034.1A
Other languages
Chinese (zh)
Inventor
何东超
陈鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Furuite Unlimited Technology Development Co ltd
Original Assignee
Beijing Furuite Unlimited Technology Development Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Furuite Unlimited Technology Development Co ltd filed Critical Beijing Furuite Unlimited Technology Development Co ltd
Priority to CN202210352034.1A priority Critical patent/CN116935149A/en
Publication of CN116935149A publication Critical patent/CN116935149A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a neural network training method, an image detection method, a device, equipment, a medium and a product. A basic initial image detection model adapted to a target scene is obtained through adaptive training by means of a large number of open source sample images with annotation information available online and a small number of annotated target sample images adapted to the target scene. The training sample images in the target scene are then pre-annotated by the initial image detection model, so that annotation information is obtained automatically. The annotated training sample images and their annotation information are then used to train a target image detection model. In this way, the consumption of manpower, material resources and cost in annotating training sample images can be greatly reduced, the training time of the neural network is effectively shortened, the proportion of training time spent on annotating training sample images is reduced, and the training efficiency of the neural network is effectively improved.

Description

Neural network training method, image detection method, device, equipment and product
Technical Field
The disclosure relates to the technical field of deep learning, and in particular to a neural network training method, an image detection method, a device, equipment, a medium and a product.
Background
With the development of science and technology, artificial intelligence has gradually become one of the important driving forces of social development, and its emergence has greatly facilitated people's work, study and daily life. Detection and identification methods based on artificial intelligence are widely applied in various fields. In the freight industry, for example, they can be used during registration and service to detect and identify whether the vehicle driven by a driver meets the industry's standards.
When detecting and identifying vehicles, the license plate of the vehicle in a photo uploaded by the driver is generally detected and identified to judge whether the vehicle in the photo belongs to a registered company. A detection and identification model based on artificial intelligence can perform this detection and identification more quickly and conveniently. However, to obtain a model with good detection and identification performance, a large number of training samples with annotation information are needed during training, and obtaining a large number of annotated training samples by manual annotation is time-consuming and labor-intensive, so training such a model takes a long time.
Disclosure of Invention
The embodiment of the disclosure at least provides a neural network training method, an image detection device, equipment, a medium and a product.
The embodiment of the disclosure provides a neural network training method, which comprises the following steps:
acquiring open source sample images with open source image annotation information adapted to each application scene, target sample images with target image annotation information adapted to a target scene, and training sample images to be annotated in the target scene, wherein the number of the open source sample images is greater than the number of the target sample images;
training to obtain an initial image detection model based on the open source sample image, the open source image annotation information, the target sample image and the target image annotation information;
based on the initial image detection model, performing image annotation on the training sample image to obtain sample image annotation information of the training sample image;
and training the target neural network based on the training sample image and the sample image annotation information to obtain a target image detection model for detecting target content in the image.
In an optional implementation manner, the training to obtain an initial image detection model based on the open source sample image, the open source image annotation information, the target sample image and the target image annotation information includes:
extracting a first sample image and a second sample image from the target sample image, and extracting first image annotation information of the first sample image and second image annotation information of the second sample image from the target image annotation information, wherein the number of the second sample images is larger than that of the first sample images;
training the constructed neural network based on the open source sample image and the open source image annotation information, and the first sample image and the first image annotation information, and taking the trained neural network as a candidate image detection model for detecting images;
and performing reinforcement training on the candidate image detection model based on the second sample image and the second image annotation information to obtain a trained initial image detection model.
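The staged scheme above first trains on the open source data plus a small first set of target samples, then performs reinforcement training on a larger second set. A minimal sketch of the required split is shown below; the function name and fraction are illustrative assumptions, not the patent's implementation.

```python
def split_target_samples(samples, first_fraction=0.2):
    """Partition target-scene samples so the second set is larger than the first.

    The first (smaller) set joins the open source data for the initial
    training; the second (larger) set drives the reinforcement training.
    """
    if not 0 < first_fraction < 0.5:
        raise ValueError("first_fraction must keep the second set larger")
    cut = max(1, int(len(samples) * first_fraction))
    first, second = samples[:cut], samples[cut:]
    return first, second
```

The constraint check mirrors the claim's requirement that the number of second sample images exceeds the number of first sample images.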
In an optional implementation manner, the performing image annotation on the training sample image based on the initial image detection model to obtain sample image annotation information of the training sample image includes:
inputting the training sample image into the initial image detection model to obtain initial image annotation information of the training sample image output by the initial image detection model;
correcting the initial image annotation information based on the image content of the training sample image;
and taking the corrected initial image annotation information as the sample image annotation information of the training sample image.
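The pre-labelling step above can be sketched as a small loop: the initial model produces raw annotations, which are then corrected against the image content. Here `model_fn` and `correct_fn` are hypothetical stand-ins for the trained model and the correction procedure; neither name comes from the patent.

```python
def pre_label(training_images, model_fn, correct_fn):
    """Return sample image annotation info for each training sample image.

    training_images: mapping of image id -> image data.
    model_fn: produces initial image annotation information for one image.
    correct_fn: corrects that information based on the image content.
    """
    annotations = {}
    for image_id, image in training_images.items():
        initial = model_fn(image)                 # initial annotation info
        annotations[image_id] = correct_fn(image, initial)
    return annotations
```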
In an alternative embodiment, before the training the target neural network based on the training sample image and the sample image labeling information to obtain a target image detection model for detecting target content in an image, the method further includes:
performing image transformation preprocessing on the training sample image to obtain a processed training sample image, wherein the image transformation preprocessing comprises one or more of image content clipping, image angle rotation and image light condition transformation;
and determining new sample image annotation information for the processed training sample image based on the sample image annotation information and a processing mode for performing image transformation preprocessing on the training sample image.
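One concrete case of deriving new annotation information from the preprocessing mode is updating a bounding box when the image is rotated. The example below (an assumption for illustration, not taken from the patent) handles a 90-degree clockwise rotation.

```python
def rotate_box_90cw(box, image_size):
    """Map an axis-aligned box through a 90-degree clockwise image rotation.

    box: (x1, y1, x2, y2) in an image of size (width, height).
    Under this rotation a point (x, y) maps to (height - y, x) in the
    new (height x width) image, so the box corners swap and flip.
    """
    x1, y1, x2, y2 = box
    _, height = image_size
    return (height - y2, x1, height - y1, x2)
```

Cropping and lighting transformations would need analogous (or identity) label updates.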
In an optional implementation manner, the training the target neural network based on the training sample image and the sample image labeling information to obtain a target image detection model for detecting target content in an image includes:
adding the processed training sample image into the training sample image before processing, and adding new sample image annotation information into the sample image annotation information obtained by annotation, to form a training sample set after sample expansion;
and training the target neural network by using each training sample image and corresponding sample image annotation information in the training sample set to obtain a target image detection model.
In an optional implementation manner, the training the target neural network based on the training sample image and the sample image labeling information to obtain a target image detection model for detecting target content in an image includes:
inputting the training sample image into a target neural network to obtain prediction annotation information of the training sample image, wherein the prediction annotation information comprises an annotation frame and the confidence coefficient of the annotation frame;
based on the prediction annotation information and the sample image annotation information, adjusting network parameters of the target neural network to complete one-time training;
adding the annotation frames whose confidence is greater than a preset difficult case threshold to the sample image annotation information to obtain target annotation information after supplementary annotation;
and taking the target annotation information after supplementary annotation as the sample image annotation information, returning to the step of inputting the training sample image into the target neural network to obtain the predicted annotation information of the training sample image, until the target neural network meets the training cut-off condition, and taking the trained target neural network as the target image detection model for detecting target content in an image.
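The supplement step between training passes can be sketched as follows. This is a hedged illustration of the merging logic only: predicted frames whose confidence exceeds the preset difficult case threshold are folded back into the sample annotation information before the next pass. All names are hypothetical.

```python
def supplement_annotations(sample_annotations, predicted, threshold):
    """Merge high-confidence predicted frames into the annotation info.

    sample_annotations: list of annotation frames (boxes).
    predicted: list of (box, confidence) pairs from the target neural network.
    threshold: the preset difficult case threshold.
    """
    supplemented = list(sample_annotations)
    for box, confidence in predicted:
        if confidence > threshold and box not in supplemented:
            supplemented.append(box)   # supplementary annotation
    return supplemented
```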
In an optional implementation manner, the training the target neural network based on the training sample image and the sample image labeling information to obtain a target image detection model for detecting target content in an image includes:
and training the initial image detection model based on the training sample image and the sample image annotation information to obtain a target image detection model for detecting target content in an image.
In an alternative embodiment, after the training the target neural network based on the training sample image and the sample image labeling information to obtain a target image detection model for detecting target content in an image, the method includes:
extracting a content image to be identified corresponding to the target content from the training sample image based on the position information of the target content indicated in the sample image annotation information;
performing affine transformation processing on the content image to be identified to obtain the processed content image to be identified;
and training to obtain a target image recognition model for recognizing target content information in an image based on the content information of the target content indicated in the sample image annotation information and the processed content image to be recognized.
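The affine transformation processing above can be grounded in a standard computation: estimating the 2x3 affine matrix that maps three corners of a skewed content region (for example, a license plate) onto an axis-aligned rectangle. The sketch below is illustrative, not the patent's implementation; the point values in the usage are fabricated.

```python
import numpy as np

def affine_from_points(src, dst):
    """Solve for M (2x3) such that M @ [x, y, 1] = [u, v] for 3 point pairs."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0])   # row for the u equation
        A.append([0, 0, 0, x, y, 1])   # row for the v equation
        b.extend([u, v])
    params = np.linalg.solve(np.array(A, float), np.array(b, float))
    return params.reshape(2, 3)
```

The resulting matrix would then be applied to resample the content image into a rectified image for the recognition model.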
The embodiment of the disclosure also provides an image detection method, which comprises the following steps:
acquiring an image to be processed, and a target image detection model and a target image recognition model obtained by training according to the neural network training method described above;
and inputting the image to be processed into the target image detection model to obtain the detected annotation information of the target content in the image to be processed.
In an alternative embodiment, after the inputting the image to be processed into the target image detection model to obtain the detected labeling information of the target content in the image to be processed, the method further includes:
acquiring a target image recognition model obtained by training according to the neural network training method;
performing affine transformation processing on the content area image corresponding to the target content extracted from the image to be processed according to the annotation information, to obtain a processed content area image;
and identifying the target content in the content area image based on the target image recognition model.
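The two-stage inference described above — detect the target content, crop the content area, then recognize it — can be outlined schematically. The callables here are stand-ins, not the patent's actual models, and the tiny "image" in the usage is fabricated.

```python
def detect_and_recognize(image, detect_fn, recognize_fn, crop_fn):
    """Run detection, crop each detected content area, and recognize it."""
    results = []
    for box in detect_fn(image):          # annotation info of target content
        region = crop_fn(image, box)      # content area image
        results.append((box, recognize_fn(region)))
    return results
```

Usage with trivial stand-ins, where the "recognizer" just sums the cropped pixels:

```python
image = [[1, 2, 3], [4, 5, 6]]
detect = lambda im: [(0, 0, 2, 2)]
crop = lambda im, b: [row[b[0]:b[2]] for row in im[b[1]:b[3]]]
recog = lambda region: sum(sum(r) for r in region)
```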
The embodiment of the disclosure also provides a neural network training device, which comprises:
the sample image acquisition module is used for acquiring open source sample images with open source image annotation information adapted to each application scene, target sample images with target image annotation information adapted to a target scene, and training sample images to be annotated in the target scene, wherein the number of the open source sample images is greater than the number of the target sample images;
the initial model training module is used for training to obtain an initial image detection model based on the open source sample image, the open source image annotation information, the target sample image and the target image annotation information;
the training sample labeling module is used for labeling the training sample image based on the initial image detection model to obtain sample image labeling information of the training sample image;
and the first model training module is used for training the target neural network based on the training sample image and the sample image annotation information to obtain a target image detection model for detecting target content in the image.
In an alternative embodiment, the initial model training module is specifically configured to:
extracting a first sample image and a second sample image from the target sample image, and extracting first image annotation information of the first sample image and second image annotation information of the second sample image from the target image annotation information, wherein the number of the second sample images is larger than that of the first sample images;
training the constructed neural network based on the open source sample image and the open source image annotation information, and the first sample image and the first image annotation information, and taking the trained neural network as a candidate image detection model for detecting images;
and performing reinforcement training on the candidate image detection model based on the second sample image and the second image annotation information to obtain a trained initial image detection model.
In an alternative embodiment, the training sample labeling module is specifically configured to:
inputting the training sample image into the initial image detection model to obtain initial image annotation information of the training sample image output by the initial image detection model;
correcting the initial image annotation information based on the image content of the training sample image;
and taking the corrected initial image annotation information as the sample image annotation information of the training sample image.
In an alternative embodiment, the neural network training device further includes an image processing module, where the image processing module is configured to:
performing image transformation preprocessing on the training sample image to obtain a processed training sample image, wherein the image transformation preprocessing comprises one or more of image content clipping, image angle rotation and image light condition transformation;
and determining new sample image annotation information for the processed training sample image based on the sample image annotation information and a processing mode for performing image transformation preprocessing on the training sample image.
In an alternative embodiment, the first model training module is specifically configured to:
adding the processed training sample image into the training sample image before processing, and adding new sample image annotation information into the sample image annotation information obtained by annotation to form a training sample set after sample expansion;
and training the target neural network by using each training sample image and the corresponding sample image annotation information in the training sample set to obtain the target image detection model.
In an alternative embodiment, the first model training module is specifically configured to:
inputting the training sample image into a target neural network to obtain prediction annotation information of the training sample image, wherein the prediction annotation information comprises an annotation frame and the confidence coefficient of the annotation frame;
based on the prediction annotation information and the sample image annotation information, adjusting network parameters of the target neural network to complete one-time training;
adding the annotation frames whose confidence is greater than a preset difficult case threshold to the sample image annotation information to obtain target annotation information after supplementary annotation;
and taking the target annotation information after supplementary annotation as the sample image annotation information, returning to the step of inputting the training sample image into the target neural network to obtain the predicted annotation information of the training sample image, until the target neural network meets the training cut-off condition, and taking the trained target neural network as the target image detection model for detecting target content in an image.
In an alternative embodiment, the first model training module is specifically configured to:
and training the initial image detection model based on the training sample image and the sample image annotation information to obtain a target image detection model for detecting target content in an image.
In an alternative embodiment, the neural network training device further includes a second model training module, where the second model training module is configured to:
extracting a content image to be identified corresponding to the target content from the training sample image based on the position information of the target content indicated in the sample image annotation information;
carrying out affine transformation processing on the content image to be identified to obtain the processed content image to be identified;
and training to obtain a target image recognition model for recognizing target content information in an image based on the content information of the target content indicated in the sample image annotation information and the processed content image to be recognized.
The embodiment of the disclosure also provides an image detection device, which comprises:
the acquisition module is used for acquiring the image to be processed and a target image detection model obtained by training according to the neural network training device;
and the image detection module is used for inputting the image to be processed into the target image detection model to obtain the detected annotation information of the target content in the image to be processed.
In an alternative embodiment, the image detection device further includes an image recognition module, where the image recognition module is configured to:
acquiring a target image recognition model obtained by training with the neural network training device;
performing affine transformation processing on the content area image corresponding to the target content extracted from the image to be processed according to the annotation information, to obtain a processed content area image;
and identifying the target content in the content area image based on the target image recognition model.
The disclosed embodiments also provide a computer device, comprising a processor, a memory and a bus. The memory stores machine-readable instructions executable by the processor. When the computer device runs, the processor and the memory communicate through the bus, and the machine-readable instructions, when executed by the processor, perform the steps of the neural network training method or the image detection method described above.
The disclosed embodiments also provide a computer readable storage medium having a computer program stored thereon, which when executed by a processor performs the steps of the neural network training method or the image detection method described above.
The disclosed embodiments also provide a computer program product comprising computer instructions, characterized in that the computer instructions, when executed by a processor, perform the steps of the neural network training method or the image detection method described above.
According to the neural network training method, the image detection method, the device, the medium and the product provided by the embodiments of the disclosure, when training the model, an initial image detection model is first obtained by training with open source sample images with open source image annotation information adapted to each application scene and target sample images with target image annotation information adapted to the target scene. The trained initial image detection model is then used to annotate the training sample images to be annotated in the target scene, yielding sample image annotation information for the training sample images. The target neural network is trained with the obtained training sample images and sample image annotation information to obtain a target image detection model for detecting target content in images. When detecting an image, the image to be processed and the target image detection model trained by the above neural network training method are obtained, and the image to be processed is input into the target image detection model to obtain the annotation information of the detected target content in the image to be processed.
In this way, a basic initial image detection model adapted to the target scene is obtained through adaptive training by means of a large amount of open source data with annotation information available online and a small number of annotated target sample images adapted to the target scene, so that only a small number of annotated sample images are needed to obtain the initial image detection model. The large number of training sample images in the target scene are pre-annotated by the trained initial image detection model, so that annotation information is obtained automatically and a large number of annotated training sample images are produced. These training sample images and their annotation information are then used to train the target detection model. Since the manual annotation step is omitted when training the target network, the consumption of manpower and material resources in annotating training sample images can be greatly reduced, the training time of the neural network is effectively shortened, the proportion of training time spent on annotating training sample images is reduced, and the training efficiency of the neural network is effectively improved.
Furthermore, in the process of training the target neural network, annotation frames whose confidence, as predicted by the target neural network, is greater than a preset difficult case threshold can be added to the annotation information of the training sample images, and training is repeated, thereby realizing difficult case mining. This can effectively improve the diversity of the training information in the samples, improve the prediction accuracy of the target neural network, and increase the tolerance of the target neural network when processing difficult scenes.
Correspondingly, when the target detection model obtained through training is used to detect an image to be detected, the recognition result is accurate and the recognition precision is high.
The foregoing objects, features and advantages of the disclosure will be more readily apparent from the following detailed description of the preferred embodiments taken in conjunction with the accompanying drawings.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required for the embodiments are briefly described below. These drawings, which are incorporated in and constitute a part of the specification, show embodiments consistent with the present disclosure and together with the description serve to illustrate its technical solutions. It should be understood that the following drawings show only certain embodiments of the present disclosure and are therefore not to be considered limiting of its scope; a person of ordinary skill in the art may obtain other related drawings from them without inventive effort.
FIG. 1 illustrates a flow chart of a neural network training method provided by an embodiment of the present disclosure;
FIG. 2 illustrates a flow chart of another neural network training method provided by embodiments of the present disclosure;
FIG. 3 shows a flow chart of an image detection method provided by an embodiment of the present disclosure;
FIG. 4 illustrates one of the schematic diagrams of a neural network training device provided by embodiments of the present disclosure;
FIG. 5 illustrates a second schematic diagram of a neural network training device provided by embodiments of the present disclosure;
FIG. 6 shows one of the schematic diagrams of an image detection apparatus provided by an embodiment of the present disclosure;
FIG. 7 is a second schematic diagram of an image detection device according to an embodiment of the disclosure;
fig. 8 shows a schematic diagram of a computer device provided by an embodiment of the present disclosure.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is apparent that the described embodiments are only some embodiments of the present disclosure, but not all embodiments. The components of the embodiments of the present disclosure, which are generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present disclosure provided in the accompanying drawings is not intended to limit the scope of the disclosure, as claimed, but is merely representative of selected embodiments of the disclosure. All other embodiments, which can be made by those skilled in the art based on the embodiments of this disclosure without making any inventive effort, are intended to be within the scope of this disclosure.
Research has shown that, in the training process of a neural network, training a good neural network requires a large amount of training sample data with annotation information, and acquiring this annotation information requires manual annotation. Because the number of acquired training sample images is very large, the manual annotation workload is heavy, the annotation speed is slow and the time is long, which greatly lengthens the training time of the neural network and greatly increases the manpower, material resources and other costs consumed in the training process.
Based on the above research, the disclosure provides a neural network training method: an initial image detection model is constructed and trained with open source sample images and target sample images, and is then used to pre-annotate the training sample images, yielding training sample images with annotation information for training the target neural network. This greatly reduces the consumption of manpower, material resources and cost in annotating training sample images, effectively shortens the training time of the neural network, reduces the proportion of training time spent on annotating training sample images, and effectively improves the training efficiency of the neural network.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
Referring to fig. 1, fig. 1 is a flowchart of a neural network training method according to an embodiment of the disclosure. As shown in fig. 1, the neural network training method provided by the embodiment of the present disclosure includes steps S101 to S104, in which:
s101: the method comprises the steps of obtaining open source sample images with open source image annotation information, which are matched with each application scene, target sample images with target image annotation information, which are matched with target scenes, and training sample images to be annotated in the target scenes, wherein the number of the open source sample images is larger than that of the target sample images.
When training a neural network, a large number of sample images for training needs to be acquired. In practical applications, the samples used are usually adapted to the target scene of the application: for the trained neural network to fit the scene in which it will finally be applied, the sample images used during training must fit the target scene, and these sample images must carry annotation information. Manually annotating a large number of sample images consumes a great deal of manpower and material resources, and the manual annotation time is long.
Therefore, to reduce the manpower, material resources and cost consumed in annotating sample images, in this step a large number of open source sample images adapted to respective application scenes and a small number of target sample images adapted to the target scene may be acquired first. Specifically, the open source sample images and the target sample images may be acquired by crawling corresponding image data from the network or from an open source data set; accordingly, in the crawling process, the open source image annotation information of the open source sample images and the target image annotation information of the target sample images may be acquired together.
After the target sample images are obtained, they may be precisely annotated manually to obtain the target image annotation information, ensuring the accuracy of the annotation information adapted to the target scene; alternatively, published target image annotation information may be crawled and then calibrated with high precision.
Further, training sample images for training the target neural network may be acquired.
The open source data set COCO (Common Objects in Context) is a large-scale object detection, segmentation, key point detection and captioning data set that integrates tens of thousands of images across various business scenes.
The annotation information is detection frame information indicating the detected content, position information of the detection frame in the sample image, and the like; the position information of the detection frame may include vertex position information, center position information, and so on. Taking the freight industry as an example, when detecting whether a driver's registration information, vehicle condition and personnel condition are real, the detected content may be the faces, vehicles, license plates and the like in the acquired images, and the annotation information may be the detection frame information, position information and the like of that detected content.
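As a concrete illustration of the annotation structure just described, a minimal annotation record might look like the following. This is a hypothetical layout (the field names, the file name and the coordinate values are all invented for illustration); the patent only specifies that each detected content item carries a detection frame with vertex and/or center position information.

```python
# A hypothetical annotation record for one freight-industry sample image:
# one entry per detected content item, with its label, vertex positions
# (x1, y1, x2, y2) and center position, as the text describes.
annotation = {
    "image": "truck_0001.jpg",               # hypothetical file name
    "boxes": [
        {"label": "face",
         "corners": (120, 40, 180, 110),     # vertex position information
         "center": (150, 75)},               # center position information
        {"label": "license_plate",
         "corners": (300, 220, 420, 260),
         "center": (360, 240)},
    ],
}
```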
The number of target sample images may be far smaller than the number of open source sample images; they only need to provide enough information about the target scene that the neural network, once trained, can adapt to it.
S102: and training to obtain an initial image detection model based on the open source sample image, the open source image annotation information, the target sample image and the target image annotation information.
In this step, after the target sample image, the target image annotation information, the open source sample image, and the open source image annotation information are obtained, an initial image detection model may be trained together using the acquired image and the corresponding annotation information, so as to facilitate subsequent training of a target neural network applied to the target scene.
In practical applications, a pre-built initial neural network for target detection may be obtained first. The target sample images and the open source sample images may then be used as input to the initial neural network, and the corresponding target image annotation information and open source image annotation information used as output for the initial neural network to learn from, yielding a learned initial image detection model. Alternatively, the target sample images and the open source sample images may be used as input to the initial neural network, the annotation information corresponding to each image output through the learning and prediction of the initial neural network, and the initial neural network then adjusted until training is complete, yielding the initial image detection model.
In a specific application, when the initial image detection model is used to detect content such as text in images, a Differentiable Binarization network (DBNet) may be used. When training the DBNet, the images (the target sample images and the open source sample images) are input into the neural network: first a feature pyramid network performs feature extraction on the images to obtain feature maps, then a probability map and a threshold map are obtained from the feature maps, an approximate binary map is obtained from these, and the annotation information of the images is obtained by matching the approximate binary map against the images. As a detection model for text in images, DBNet is fast and detects the content to be detected well, but the disclosure is not limited to it; in other embodiments, other types of detection neural networks may be used, such as a YOLO model or a CTPN model, and different types of detection neural networks may be used in combination, which is not repeated here.
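The step that turns the probability map and threshold map into an approximate binary map can be sketched as follows. The formula B = 1 / (1 + exp(-k (P - T))) with amplification factor k = 50 is the differentiable binarization used in the published DBNet work; it is shown here in NumPy only as an illustration of that one step, not of the patent's full pipeline.

```python
import numpy as np

def approximate_binary_map(prob_map, thresh_map, k=50.0):
    """Differentiable binarization: B = 1 / (1 + exp(-k * (P - T))).
    Pixels whose probability clearly exceeds the per-pixel threshold map
    to ~1; pixels clearly below it map to ~0."""
    return 1.0 / (1.0 + np.exp(-k * (prob_map - thresh_map)))

# Two pixels above their threshold, two below.
P = np.array([[0.9, 0.2], [0.6, 0.4]])
T = np.full((2, 2), 0.5)
B = approximate_binary_map(P, T)
```

Because k is large, the sigmoid is nearly a step function, yet it remains differentiable, which is what lets the binarization be trained end to end.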
In another embodiment, to improve the annotation accuracy of the initial image detection model, the initial neural network may be trained in batches using the existing images and their corresponding annotation information. Since the open source images are adapted to respective application scenes while the target sample images are adapted to the target scene, the target sample images may be split into batches to train the initial neural network.
The target sample image comprises a first sample image and a second sample image, and the target image annotation information comprises first image annotation information of the first sample image and second image annotation information of the second sample image.
Correspondingly, when the sample images are crawled, the first sample images with their first image annotation information and the second sample images with their second image annotation information may be crawled in batches to form the target sample images and target image annotation information. Alternatively, the target sample images and target image annotation information may be crawled as a whole, the first sample images and second sample images divided out of the target sample images, and the corresponding first and second image annotation information divided out of the target image annotation information.
Specifically, to obtain the initial image detection model through batched training, after the sample images and their corresponding annotation information are obtained, the first sample images and second sample images are extracted from the target sample images, and the first image annotation information of the first sample images and the second image annotation information of the second sample images are extracted from the target image annotation information. The built neural network is then trained based on the open source sample images and open source image annotation information together with the first sample images and first image annotation information, and the trained neural network serves as a candidate image detection model for detecting images. The candidate image detection model then undergoes reinforcement training based on the second sample images and second image annotation information, yielding the trained initial image detection model.
In practice, a small number of sample images may be extracted from the target sample images and taken as the first sample images, with the remaining target sample images taken as the second sample images; correspondingly, the first image annotation information of the first sample images and the second image annotation information of the second sample images are extracted from the target image annotation information. The first sample images and their annotation information, together with the obtained open source sample images and their open source image annotation information, are used as a training sample set to train the pre-built initial neural network, yielding a trained candidate image detection model that can be used to detect images. Since the initial image detection model ultimately needs to adapt to the final target scene, the second sample images and their second image annotation information are then used as a training sample set to fine-tune the candidate image detection model, yielding the initial image detection model adapted to the target scene.
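The two-stage schedule above (pretrain on the open source set plus the first batch, then fine-tune on the second batch) can be sketched as follows. The `model.fit` single-epoch training call and the epoch counts are hypothetical stand-ins; the patent does not prescribe a particular training API.

```python
def train_two_stage(model, open_source_set, first_set, second_set,
                    pretrain_epochs=10, finetune_epochs=5):
    """Sketch of the batched schedule: first train on the large open-source
    set plus the small first batch of target samples to obtain the candidate
    image detection model, then fine-tune on the second batch to adapt the
    model to the target scene. `model.fit` is a hypothetical call that runs
    one training epoch over the given samples."""
    # Stage 1: candidate image detection model.
    for _ in range(pretrain_epochs):
        model.fit(open_source_set + first_set)
    # Stage 2: reinforcement training (fine-tuning) on the target scene.
    for _ in range(finetune_epochs):
        model.fit(second_set)
    return model
```

The design point is simply that the scarce, scene-specific second batch is applied last, so the final parameters are dominated by the target scene.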
Here, in an actual application scenario, the first sample images may be sample images acquired in the target scene over a generalized range, so the candidate image detection model trained with them adapts to the scene environment in its broadest extent, while the second sample images may be acquired in a specific environment within the actual application scenario. For example, in the freight industry, the first sample images may be sample images previously acquired from the application scenes of all trucks, and the second sample images may be sample images acquired from the application scene of one particular truck.
Accordingly, since the first sample images serve the model's preliminary learning, for which the large number of open source sample images already exists, while the second sample images specifically fine-tune the model to the target scene, the number of first sample images may be smaller than the number of second sample images; preferably, no sample image among the first sample images also appears among the second sample images.
S103: and carrying out image annotation on the training sample image based on the initial image detection model to obtain sample image annotation information of the training sample image.
In this step, after the initial image detection model is obtained, the acquired training sample images to be annotated in the target scene may be input into the initial image detection model, and the sample content in the training sample images annotated by the initial image detection model, yielding the sample image annotation information of the training sample images.
Although the initial image detection model has undergone rigorous training on the open source sample images and target sample images, a certain annotation error is unavoidable, so in practical use the sample image annotation information detected by the initial image detection model may be corrected. Specifically, when obtaining the sample image annotation information of the training sample images through the initial image detection model, the training sample images may first be input into the initial image detection model to obtain the initial image annotation information it outputs. The initial image annotation information may then be corrected based on the image content of the training sample images: for example, it may be corrected using related information of the image content, such as the actual size of the content, or it may be corrected manually. To ensure accuracy while limiting manual effort, a preliminary correction may be carried out automatically to fix the annotation information with obvious problems, followed by fine correction carried out manually. The corrected initial image annotation information may then be used as the sample image annotation information of the training sample images, so that more accurate training sample images and annotation information are obtained.
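One plausible form of the automatic preliminary correction mentioned above is a simple geometric sanity pass over the predicted boxes. The patent does not specify the correction rules, so the clipping rule below is purely an illustrative assumption; finer correction would then be done manually, as the text describes.

```python
def clip_boxes_to_image(boxes, width, height):
    """Hypothetical automatic preliminary correction: clip each predicted
    annotation box (x1, y1, x2, y2) to the image bounds and drop boxes
    that collapse to zero area after clipping."""
    corrected = []
    for x1, y1, x2, y2 in boxes:
        x1, x2 = max(0, x1), min(width, x2)
        y1, y2 = max(0, y1), min(height, y2)
        if x2 > x1 and y2 > y1:          # keep only boxes with area left
            corrected.append((x1, y1, x2, y2))
    return corrected
```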
S104: and training the target neural network based on the training sample image and the sample image annotation information to obtain a target image detection model for detecting target content in the image.
In this step, after the training sample images and the sample image annotation information are obtained, that is, after samples ready for training the neural network are obtained, a pre-built target neural network may be trained using the training sample images and the sample image annotation information, yielding a target image detection model for detecting target content in images.
When training the target neural network, the training sample images may be used as its input and the sample image annotation information as its output; alternatively, the training sample images may be used as input, the training sample images detected by the target neural network to obtain prediction annotation information, and the sample image annotation information and the prediction annotation information used to converge the target neural network and complete training.
In a specific embodiment, the training of the target neural network by using the training sample image and the sample image labeling information may be achieved by the following steps:
inputting the training sample image into a target neural network to obtain prediction annotation information of the training sample image, wherein the prediction annotation information comprises an annotation frame and the confidence coefficient of the annotation frame;
based on the prediction annotation information and the sample image annotation information, adjusting network parameters of the target neural network to complete one-time training;
adding annotation frames whose confidence is greater than a preset hard example threshold to the sample image annotation information to obtain target annotation information after supplementary annotation;
and taking the target annotation information after supplementary annotation as the sample image annotation information, returning to the step of inputting the training sample image into the target neural network to obtain the prediction annotation information of the training sample image, until the target neural network meets the training cutoff condition, and taking the trained target neural network as a target image detection model for detecting target content in images.
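The looped steps above can be sketched as a compact training driver. `model.predict` and `model.step` are hypothetical placeholders for the forward pass and the parameter-adjustment step; the round count stands in for the training cutoff condition.

```python
def train_with_hard_examples(model, images, labels, hard_thresh=0.5,
                             max_rounds=3):
    """Sketch of the loop above: predict annotation frames with confidences,
    adjust network parameters against the current labels, then fold the
    sufficiently confident 'hard' boxes back into the labels before the
    next round (supplementary annotation)."""
    for _ in range(max_rounds):              # stand-in training cutoff condition
        preds = model.predict(images)        # list of (box, confidence) pairs
        model.step(preds, labels)            # adjust network parameters once
        for box, conf in preds:
            if conf > hard_thresh and box not in labels:
                labels.append(box)           # supplementary annotation
    return model, labels
```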
Here, in the training process, the training sample images may be input into the target neural network, which, after a series of processing steps on the training sample images such as feature extraction, processing and recognition, detects them, that is, outputs the prediction annotation information of the target content in the training sample images. Then, based on the prediction annotation information and the sample image annotation information, a loss value of the target neural network may be calculated by means of an adapted loss function or other tool, and the network parameters of the target neural network adjusted with this loss value as reference, completing one round of training. After one round of training ends, in order to effectively improve the detection effectiveness and accuracy of the target neural network and the accuracy of its target detection, a hard example mining step may be added to improve the training accuracy. Specifically, from the prediction annotation information output by the target neural network, the annotation frames of detected target content whose confidence is greater than a preset hard example threshold are extracted and added to the sample image annotation information as supplementary annotation, yielding the target annotation information after supplementary annotation. This supplemented target annotation information is then used as the sample image annotation information, and the step of inputting the training sample images into the target neural network is repeated for the next round of training, until the target neural network meets the training cutoff condition, at which point the target neural network may be considered trained and used as a target image detection model for detecting target content in images.
In this embodiment, the description takes as an example adding annotation frames whose confidence is greater than the preset hard example threshold to the sample image annotation information to obtain the target annotation information after supplementary annotation, but it is not limited to this. In other embodiments, if an annotation frame's confidence is below the preset hard example threshold but differs from it by no more than a certain error margin, for example 0.2 or 0.3 (the specific value may be set according to actual requirements), that annotation frame may also be supplemented into the sample image annotation information.
The prediction annotation information comprises an annotation frame and confidence of the annotation frame.
The training cutoff condition may be that the number of parameter adjustments of the target neural network is greater than or equal to a preset number of times, that is, when the number of training rounds of the target neural network reaches the preset number, the target neural network may be considered to satisfy the training cutoff condition. It is not limited to this: in other embodiments, the condition may be that the loss of the target neural network in each dimension is less than the loss threshold corresponding to that dimension, or a similar criterion. The specific number of training rounds or loss thresholds may be set according to training requirements.
The preset hard example threshold may be set according to the required model accuracy of the target image detection model, or set dynamically according to the accuracy of the prediction annotation information actually obtained in each round of training the target neural network. For example, after several rounds of training, the confidences of the annotation frames indicated by the prediction annotation information are higher and more accurate, so a somewhat higher threshold may be set; in the early stage of training, those confidences are lower, so a somewhat lower threshold may be set. The average of the confidences of all annotation frames may also be computed and used as the threshold.
Generally, for the prediction annotation information output by the target neural network, an annotation frame and its corresponding confidence are considered accurate when the confidence is greater than or equal to a certain confidence threshold, while annotation frames whose confidence is below that threshold are considered inaccurate and discarded. By supplementing the annotation frames whose confidence is greater than the preset hard example threshold, specifically those greater than the preset hard example threshold but smaller than the confidence threshold, content that the target neural network originally detected only vaguely can be added to the annotation information, enhancing the annotation information.
After the training sample images and the sample image annotation information are obtained, the training sample images may be input into the pre-built target neural network to obtain the prediction annotation information it outputs, which includes the annotation frames of the sample content in the training sample images together with information such as their position information, as well as the confidence of each annotation frame. In practical applications, the higher the confidence, the more accurate the corresponding annotation information, so a confidence threshold may be set for the obtained prediction frames: for example, with a confidence threshold of 0.7, annotation frames whose confidence is not smaller than 0.7 may be considered reliable. However, for the training sample images, many predicted annotation frames have a confidence below 0.7, and those in the suspicious middle range may in fact be correct. To train the target neural network more accurately, a hard example threshold somewhat lower than the confidence threshold, for example 0.5, may be set: annotation frames whose confidence is greater than 0.5 but smaller than 0.7 are supplemented into the sample image annotation information, and the step of inputting the training sample images into the target neural network is repeated until the target neural network meets the training cutoff condition. In this way, the tolerance of the trained target image detection model when applied to difficult scenes can be effectively improved.
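The two-threshold split (reliable above the confidence threshold, hard examples in the band between the hard example threshold and the confidence threshold) can be expressed directly. The 0.5 and 0.7 defaults are the example values given in the text.

```python
def select_hard_boxes(preds, hard_thresh=0.5, reliable_thresh=0.7):
    """Split predicted (box, confidence) pairs into the reliable boxes
    (confidence >= reliable_thresh) and the suspicious middle-range 'hard'
    boxes (hard_thresh < confidence < reliable_thresh) that are supplemented
    into the sample image annotation information; everything else is
    discarded as inaccurate."""
    reliable = [box for box, conf in preds if conf >= reliable_thresh]
    hard = [box for box, conf in preds if hard_thresh < conf < reliable_thresh]
    return reliable, hard
```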
In this embodiment, considering that the target image detection model to be trained is used to detect the target content in images, that is, to annotate the target content in images for subsequent recognition, and that the trained initial image detection model is mainly used to annotate the training samples to obtain the sample image annotation information of the training sample images, the target image detection model and the initial image detection model have essentially the same purpose and functional requirements. Therefore, they may use the same neural network, or different types of neural networks with the same function.
Further, when the target image detection model and the initial image detection model use the same neural network, training the target neural network based on the training sample images and the sample image annotation information to obtain a target image detection model for detecting target content in images may be performed by training the initial image detection model based on the training sample images and the sample image annotation information. That is, the trained initial image detection model may serve as the built target neural network, and the training sample images and sample image annotation information used directly to train it into the target image detection model. Since the initial image detection model serving as the built target neural network is already a trained neural network with good robustness and accuracy, training it again in a targeted and purposeful way can accelerate the training speed and efficiency of the target neural network and speed up the convergence of the model.
According to the neural network training method provided by the embodiments of the present disclosure, a basic initial image detection model adapted to the target scene is obtained through adaptive training using a large amount of online open source data with annotation information and a small number of annotated target sample images adapted to the target scene, so that the initial image detection model can be obtained with only a small number of annotated sample images. The trained initial image detection model then pre-annotates a large number of training sample images in the target scene, achieving automatic acquisition of annotation information and yielding a large number of annotated training sample images; these training sample images and their annotation information are then used to train the target detection model. The manual annotation step is thus omitted when training the target network, which can greatly reduce the manpower, material resources and cost consumed in annotating the training sample images, effectively shorten the proportion of time annotation occupies in the neural network training process, and effectively improve the training efficiency of the neural network. Meanwhile, when training the target image detection model, a hard example mining approach is adopted, which increases the target neural network's ability to process difficult samples and improves the robustness of the target image detection model.
Referring to fig. 2, fig. 2 is a flowchart of another neural network training method according to an embodiment of the disclosure. As shown in fig. 2, the neural network training method provided by the embodiment of the present disclosure includes steps S201 to S205, in which:
s201: the method comprises the steps of obtaining open source sample images with open source image annotation information, which are matched with each application scene, target sample images with target image annotation information, which are matched with target scenes, and training sample images to be annotated in the target scenes, wherein the number of the open source sample images is larger than that of the target sample images.
S202: and training to obtain an initial image detection model based on the open source sample image, the open source image annotation information, the target sample image and the target image annotation information.
S203: and carrying out image annotation on the training sample image based on the initial image detection model to obtain sample image annotation information of the training sample image.
S204: and performing image transformation preprocessing on the training sample image to obtain a processed training sample image, wherein the image transformation preprocessing comprises one or more of image content clipping, image angle rotation and image light condition transformation.
In practical applications, the acquired training sample images may have defects caused by adverse detection conditions during acquisition, such as occlusion by obstructions, overly dark light, dirt and other environmental or human factors. In this step, to bring the training sample images closer to images captured in good conditions, make them meet the requirements of model use, and appropriately increase the complexity of the images, the training sample images may be preprocessed accordingly to enhance image quality.
The image transformation preprocessing performed on the training sample image may include one or more of image content cropping, image angle rotation, image light condition transformation, and the like.
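The three listed transforms can be sketched on a plain H×W×C image array. The function name, the parameterization (90-degree rotation steps for simplicity, and a brightness scale standing in for a light condition change) are illustrative assumptions, not the patent's prescribed implementation.

```python
import numpy as np

def preprocess(img, crop=None, rot90=0, light=1.0):
    """Hypothetical sketch of the image transformation preprocessing on a
    HxWxC uint8 array: image content cropping, image angle rotation
    (restricted here to multiples of 90 degrees), and an image light
    condition transformation modeled as a brightness scale."""
    if crop is not None:                       # (top, bottom, left, right)
        t, b, l, r = crop
        img = img[t:b, l:r]
    if rot90:                                  # rotate by rot90 * 90 degrees
        img = np.rot90(img, k=rot90)
    if light != 1.0:                           # scale brightness, clamp to uint8
        img = np.clip(img.astype(np.float32) * light, 0, 255).astype(np.uint8)
    return img
```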
In the embodiments of the present disclosure, the image transformation preprocessing is described as being performed on the training sample images after annotation by the initial image detection model, but it is not limited to this; in other embodiments, the image transformation preprocessing may be performed on the training sample images before they are annotated by the initial image detection model. Accordingly, if the training sample images are preprocessed before annotation, the annotation information output by the initial image detection model for the training sample images is already new, and the process of adjusting the sample image annotation information in step S205 is not required.
S205: and determining new sample image annotation information for the processed training sample image based on the sample image annotation information and a processing mode for performing image transformation preprocessing on the training sample image.
Accordingly, after the training sample images are preprocessed, properties such as their size and angle may change, so their corresponding annotation information may also change; the annotation information of the training sample images is therefore updated correspondingly.
In this step, after the training sample images undergo image transformation preprocessing, whether the sample image annotation information needs to be adjusted may be determined from the processing mode of the preprocessing. For example, if a training sample image is too large or carries too much noisy content around its edges, it is cropped and/or scaled accordingly, and the position, size and so on of the content in the cropped and/or scaled image may change relative to the image; the corresponding sample image annotation information then needs to be adjusted. The annotation frames in the original sample image annotation information may thus be fine-tuned according to the processing mode of the image transformation preprocessing and the actual processing conditions, determining the new sample image annotation information of the processed training sample images.
For example, the acquired training sample image may have a size of 1000×1000 while the size of the image to be used is 800×800, so the training sample image needs to be cropped. The position of each cropped part can be determined according to the distribution of the content in the image; for example, only the upper edge and the right edge may be cropped. After cropping, the position of the content in the image changes relative to its position before cropping, for example, moving 0.1 cm to the right and 0.1 cm upward, so the corresponding sample image labeling information can be updated according to this movement.
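The label adjustment after cropping described above can be sketched as follows; the `(x, y, w, h)` box format and the helper name are illustrative assumptions, not part of the disclosure.

```python
def adjust_box_after_crop(box, crop_left, crop_top, crop_w, crop_h):
    """Shift an annotation box (x, y, w, h) into the coordinate frame
    of a cropped image, clipping it to the new image bounds."""
    x, y, w, h = box
    # Translate by the amount removed from the left/top edges.
    x, y = x - crop_left, y - crop_top
    # Clip the box to the cropped image area.
    x0, y0 = max(x, 0), max(y, 0)
    x1, y1 = min(x + w, crop_w), min(y + h, crop_h)
    if x1 <= x0 or y1 <= y0:
        return None  # the box falls entirely outside the crop
    return (x0, y0, x1 - x0, y1 - y0)
```

For the 1000×1000 → 800×800 example above, cropping 200 pixels from the top shifts every box upward by 200 in the new coordinate frame.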
S206: and training the target neural network based on the training sample image and the sample image annotation information to obtain a target image detection model for detecting target content in the image.
The descriptions of steps S201 to S203 and S206 may refer to the descriptions of steps S101 to S104, and may achieve the same technical effects and solve the same technical problems, which are not described herein.
In one possible embodiment, step S206 includes:
adding the processed training sample images to the training sample images before processing, and adding the new sample image annotation information to the sample image annotation information obtained by annotation, to form a sample-expanded training sample set;
And training the target neural network by using each training sample image and corresponding sample image annotation information in the training sample set to obtain a target image detection model.
In this step, after the training sample image is labeled to obtain the sample image labeling information and before the target neural network is trained, image transformation preprocessing may be performed on the training sample image to generate new training sample images and their corresponding sample image labeling information. When the target neural network is actually trained, the original training data can thus be reinforced by the new training sample images and their labeling information to form a sample-expanded training sample set, and the target neural network is then trained with this new training sample set.
In this way, the diversity and richness of the samples can be effectively increased, and the robustness and generalization capability of the target neural network can be improved.
In other embodiments, in order to reduce the data size, shorten the training time, and increase the training speed, only the training sample images after image transformation preprocessing and their updated labeling information may be retained and used directly as the training sample set for training the target neural network.
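The two strategies above — expanding the sample set with the augmented images versus using only the augmented images — can be sketched as a simple merge; the function name and the `(image, label)` pairing are illustrative assumptions.

```python
def build_expanded_training_set(images, labels, aug_images, aug_labels,
                                replace_original=False):
    """Form the training set either by appending the augmented samples
    to the originals (sample expansion), or by using the augmented
    samples alone (smaller set, faster training)."""
    if replace_original:
        # Use only the preprocessed images and their updated labels.
        return list(zip(aug_images, aug_labels))
    # Expanded set: original samples plus augmented samples.
    return list(zip(images, labels)) + list(zip(aug_images, aug_labels))
```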
In a possible embodiment, after step S206, the method further comprises:
extracting a content image to be identified corresponding to the target content from the training sample image based on the position information of the target content indicated in the sample image annotation information;
carrying out affine transformation processing on the content image to be identified to obtain the processed content image to be identified;
and training to obtain a target image recognition model for recognizing target content information in an image based on the content information of the target content indicated in the sample image annotation information and the processed content image to be recognized.
In some practical application scenarios, labeling the target content in the image only locates the target initially; the actual content of the target still needs to be accurately recognized at a later stage. A target image recognition model for recognizing the target content information in the image therefore needs to be trained.
Therefore, in this step, once the training sample image and its corresponding sample image labeling information are available, the content image to be recognized corresponding to the target content can be extracted based on the position information of the target content in the training sample image indicated by the sample image labeling information.
In an actual application scenario, because images are acquired in a natural environment, the target content in the acquired training sample image is rarely perfectly upright; due to shooting angles and other factors, the content in the image may be bent, rotated, or tilted to some extent. To make the target content easier to recognize, affine transformation may be performed on the extracted content image to be recognized, so that its content becomes more upright and regular through translation, rotation, and the like.
A target image recognition model for recognizing the target content information in an image is then trained using the content information of the target content indicated in the sample image annotation information and the processed content image to be recognized.
Correspondingly, before the target image recognition model is actually used, affine transformation can be carried out on the image to be recognized, so that the obtained recognition result is more accurate.
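As a minimal sketch of the affine rectification step, the 2×3 affine matrix can be solved from three point correspondences (e.g. three corners of a tilted text region mapped onto an upright rectangle); in practice a library routine such as OpenCV's `getAffineTransform`/`warpAffine` would typically be used, and the point coordinates here are illustrative.

```python
import numpy as np

def solve_affine(src_pts, dst_pts):
    """Solve the 2x3 affine matrix A such that A @ [x, y, 1] = [x', y']
    for three source/destination point pairs."""
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    # Build [x, y, 1] rows; one linear system per output coordinate.
    M = np.hstack([src, np.ones((3, 1))])   # shape (3, 3)
    A = np.linalg.solve(M, dst).T           # shape (2, 3)
    return A

# Map a tilted region's corners onto an upright 200x50 rectangle.
src = [(12, 8), (215, 20), (5, 60)]         # top-left, top-right, bottom-left
dst = [(0, 0), (200, 0), (0, 50)]
A = solve_affine(src, dst)
corner = A @ np.array([12, 8, 1.0])         # tilted top-left -> upright origin
```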
In addition to the information of the labeling frame, the sample image labeling information may include content information indicating the target content, for example, text information, portrait information, and the like, where the text information may include Chinese text information, English text information, and the like.
In practical application, the neural network of the target image recognition model may be constructed using a CRNN (convolutional recurrent neural network). A CRNN does not require character segmentation of the sample data, can recognize text sequences of indefinite length, and offers fast inference and good performance, but the method is not limited thereto.
According to the neural network training method provided by the embodiments of the disclosure, a basic initial image detection model adapted to the target scene is obtained through adaptive training on a large amount of online open source data carrying annotation information and a small number of annotated target sample images adapted to the target scene, so that only a small number of annotated sample images are needed to obtain the initial image detection model. The trained initial image detection model then pre-annotates a large number of training sample images in the target scene, realizing automatic acquisition of annotation information and yielding a large number of annotated sample images, and the target detection model is trained using these sample images and their annotation information. This greatly reduces the manpower, material resources, and cost consumed in annotating training sample images, effectively shortens the training time of the neural network, reduces the proportion of training time occupied by annotation, and effectively improves training efficiency. In addition, a hard-example mining training method is adopted to increase the coverage of difficult scenes among the training sample images, and image transformation preprocessing is performed on the annotated training sample images to expand the samples, so that the trained target image detection model is more robust and generalizes better.
An embodiment of the disclosure further provides an image detection method, please refer to fig. 3, and fig. 3 is a flowchart of the image detection method provided in the embodiment of the disclosure. As shown in fig. 3, the image detection method provided by the embodiment of the present disclosure includes steps S301 to S302, in which:
S301: acquiring an image to be processed and a target image detection model obtained by training according to the neural network training method;
S302: inputting the image to be processed into the target image detection model to obtain the detected annotation information of the target content in the image to be processed.
Further, in some possible embodiments, after step S302, the method further includes:
acquiring a target image recognition model obtained by training the neural network training method;
carrying out affine transformation processing on the content area image corresponding to the target content extracted from the image to be processed according to the marking information to obtain a processed content area image;
and identifying the target content in the content area image based on the target image identification model.
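The overall detection-then-recognition flow (steps S301 to S302 together with the recognition extension) can be sketched as a pipeline; `detect`, `crop`, `rectify`, and `recognize` are hypothetical stand-ins for the trained models and the affine transformation step, not interfaces defined by the disclosure.

```python
def detect_and_recognize(image, detect, crop, rectify, recognize):
    """Run the target image detection model, crop each labeled content
    region, rectify it by affine transformation, then run the target
    image recognition model on the rectified region."""
    return [recognize(rectify(crop(image, box))) for box in detect(image)]
```

A caller supplies the trained model callables, e.g. `detect_and_recognize(img, detector, cropper, affine_rectifier, recognizer)`.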
The image detection method provided by the embodiment of the disclosure has the advantages of high detection and identification accuracy, high detection efficiency and good robustness.
It will be appreciated by those skilled in the art that, in the methods of the specific embodiments described above, the order in which the steps are written does not imply a strict order of execution; the actual execution order should be determined by the function of each step and any inherent logic.
Based on the same inventive concept, the embodiments of the present disclosure further provide a neural network training device corresponding to the neural network training method and an image detection device corresponding to the image detection method, and since the principle of solving the problem by the device in the embodiments of the present disclosure is similar to that of the embodiments of the present disclosure, the implementation of the device may refer to the implementation of the method, and the repetition is omitted.
Referring to fig. 4 and fig. 5, fig. 4 is a first schematic diagram of a neural network training device according to an embodiment of the disclosure, and fig. 5 is a second schematic diagram of the neural network training device according to an embodiment of the disclosure. As shown in fig. 4, a neural network training device 400 provided in an embodiment of the present disclosure includes:
the sample image obtaining module 410 is configured to obtain an open source sample image with open source image labeling information, which is adapted to each application scene, a target sample image with target image labeling information, which is adapted to a target scene, and training sample images to be labeled in the target scene, where the number of the open source sample images is greater than the number of the target sample images;
The initial model training module 420 is configured to train to obtain an initial image detection model based on the open source sample image, the open source image annotation information, the target sample image and the target image annotation information;
the training sample labeling module 430 is configured to perform image labeling on the training sample image based on the initial image detection model, so as to obtain sample image labeling information of the training sample image;
the first model training module 440 is configured to train the target neural network based on the training sample image and the sample image labeling information, so as to obtain a target image detection model for detecting the target content in the image.
In a possible implementation manner, the initial model training module 420 is specifically configured to:
extracting a first sample image and a second sample image from the target sample image, and extracting first image annotation information of the first sample image and second image annotation information of the second sample image from the target image annotation information, wherein the number of the second sample images is larger than that of the first sample images;
training the constructed neural network based on the open source sample image and the open source image annotation information, and the first sample image and the first image annotation information, and taking the trained neural network as a candidate image detection model for detecting images;
And performing reinforcement training on the candidate image detection model based on the second sample image and the second image annotation information to obtain a trained initial image detection model.
In one possible implementation, the training sample labeling module 430 is specifically configured to:
inputting the training sample image into the initial image detection model to obtain the initial image annotation information of the training sample image output by the initial image detection model;
correcting the initial image annotation information based on the image content of the training sample image;
and taking the corrected initial image annotation information as sample image annotation information of the training sample image.
In an alternative embodiment, as shown in fig. 5, the neural network training device 400 further includes an image processing module 450, where the image processing module 450 is configured to:
performing image transformation preprocessing on the training sample image to obtain a processed training sample image, wherein the image transformation preprocessing comprises one or more of image content clipping, image angle rotation and image light condition transformation;
and determining new sample image annotation information for the processed training sample image based on the sample image annotation information and a processing mode for performing image transformation preprocessing on the training sample image.
In a possible implementation manner, the first model training module 440 is specifically configured to:
adding the processed training sample image into the training sample image before processing, and adding new sample image annotation information into the sample image annotation information obtained by annotation to form a training sample set after sample expansion;
and training the target neural network by using each training sample image and corresponding sample image annotation information in the training sample set to obtain a target image detection model.
In a possible implementation manner, the first model training module 440 is specifically configured to:
inputting the training sample image into a target neural network to obtain prediction annotation information of the training sample image, wherein the prediction annotation information comprises an annotation frame and the confidence coefficient of the annotation frame;
based on the prediction annotation information and the sample image annotation information, adjusting network parameters of the target neural network to complete one-time training;
adding the annotation frames whose confidence coefficient is greater than a preset hard-example threshold to the sample image annotation information to obtain the target annotation information after supplementary annotation;
and taking the target annotation information after supplementary annotation as the sample image annotation information, and returning to the step of inputting the training sample image into the target neural network to obtain the predicted annotation information of the training sample image, until the target neural network meets the training cut-off condition; the trained target neural network is then taken as the target image detection model for detecting the target content in an image.
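The supplementary-annotation step of the hard-example mining loop described above might look like the sketch below; the prediction format `(box, confidence)` and the threshold value are illustrative assumptions.

```python
def supplement_labels(pred_boxes, sample_labels, hard_threshold=0.9):
    """Add predicted annotation frames whose confidence exceeds the
    preset hard-example threshold to the sample annotation information,
    so that the next training round uses the supplemented labels."""
    supplemented = list(sample_labels)
    for box, conf in pred_boxes:
        if conf > hard_threshold:
            supplemented.append(box)
    return supplemented
```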
In a possible implementation manner, the first model training module 440 is specifically configured to:
and training the initial image detection model based on the training sample image and the sample image annotation information to obtain a target image detection model for detecting target content in an image.
In an alternative embodiment, as shown in fig. 5, the neural network training device 400 further includes a second model training module 460, where the second model training module 460 is configured to:
extracting a content image to be identified corresponding to the target content from the training sample image based on the position information of the target content indicated in the sample image annotation information;
carrying out affine transformation processing on the content image to be identified to obtain the processed content image to be identified;
And training to obtain a target image recognition model for recognizing target content information in an image based on the content information of the target content indicated in the sample image annotation information and the processed content image to be recognized.
According to the neural network training device provided by the embodiments of the disclosure, the initial image detection model is obtained through adaptive training on a large number of online open source sample images carrying annotation information and a small number of annotated target sample images adapted to the target scene. The initial image detection model pre-annotates the training sample images to obtain annotated training sample images, and the target detection model is then trained using the training sample images and their annotation information. Manual annotation of the training sample images is therefore not needed, automatic acquisition of annotation information is realized, and only a limited number of manually annotated sample images are required when training the initial image detection model. This greatly reduces the workload and time of manual annotation, allows a large number of annotated training samples to be obtained quickly, greatly shortens the time required to train the neural network, and reduces the resource consumption of training. Meanwhile, a hard-example mining method is adopted, which increases the tolerance of the target image detection model to difficult scenes and enhances its robustness and generalization capability.
Referring to fig. 6 and 7, fig. 6 is a schematic diagram of an image detection device according to an embodiment of the disclosure, and fig. 7 is a schematic diagram of a second image detection device according to an embodiment of the disclosure. As shown in fig. 6, an image detection apparatus 600 provided by an embodiment of the present disclosure includes:
the acquiring module 610 is configured to acquire an image to be processed and a target image detection model obtained by training according to the neural network training device;
the image detection module 620 is configured to input the image to be processed into the target image detection model, and obtain detected annotation information of target content in the image to be processed.
In an alternative embodiment, as shown in fig. 7, the image detection apparatus 600 further includes an image recognition module 630, where the image recognition module 630 is configured to:
acquiring a target image recognition model obtained by training the neural network training device;
carrying out affine transformation processing on the content area image corresponding to the target content extracted from the image to be processed according to the marking information to obtain a processed content area image;
and identifying the target content in the content area image based on the target image identification model.
The image detection apparatus provided by the embodiments of the disclosure achieves high detection and recognition accuracy, high detection efficiency, and good robustness when detecting and recognizing the image to be detected.
The process flow of each module in the apparatus and the interaction flow between the modules may be described with reference to the related descriptions in the above method embodiments, which are not described in detail herein.
The embodiment of the present disclosure further provides a computer device 800, as shown in fig. 8, which is a schematic structural diagram of the computer device 800 provided in the embodiment of the present disclosure, including: a processor 810, a memory 820, and a bus 830. The memory 820 stores machine-readable instructions executable by the processor 810. When the computer device 800 is running, the processor 810 and the memory 820 communicate via the bus 830, and the machine-readable instructions, when executed by the processor 810, perform the steps of the neural network training method or the image detection method described above.
The disclosed embodiments also provide a computer readable storage medium having a computer program stored thereon, which when executed by a processor performs the steps of the neural network training method or the image detection method described in the above method embodiments. Wherein the storage medium may be a volatile or nonvolatile computer readable storage medium.
The embodiments of the present disclosure further provide a computer program product, where the computer program product carries a program code, where instructions included in the program code may be used to perform the steps of the neural network training method or the image detection method described in the foregoing method embodiments, and specifically reference may be made to the foregoing method embodiments, which are not described herein.
Wherein the above-mentioned computer program product may be realized in particular by means of hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied as a computer storage medium, and in another alternative embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK), or the like.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described system and apparatus may refer to corresponding procedures in the foregoing method embodiments, which are not described herein again. In the several embodiments provided in the present disclosure, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. The above-described apparatus embodiments are merely illustrative, for example, the division of the units is merely a logical function division, and there may be other manners of division in actual implementation, and for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some communication interface, device or unit indirect coupling or communication connection, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present disclosure may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer readable storage medium executable by a processor. Based on such understanding, the technical solution of the present disclosure may be embodied in essence or a part contributing to the prior art or a part of the technical solution, or in the form of a software product stored in a storage medium, including several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method described in the embodiments of the present disclosure. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
Finally, it should be noted that the foregoing embodiments are merely specific implementations of the present disclosure, used to illustrate rather than limit its technical solutions, and the protection scope of the present disclosure is not limited thereto. Although the disclosure has been described in detail with reference to the foregoing embodiments, those skilled in the art will appreciate that, within the technical scope disclosed herein, anyone familiar with the art may still modify or readily conceive changes to the technical solutions described in the foregoing embodiments, or make equivalent substitutions for some of their technical features; such modifications, changes, or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the disclosure and are intended to be covered by the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (15)

1. A neural network training method, the method comprising:
acquiring an open source sample image with open source image annotation information, which is adapted to each application scene, a target sample image with target image annotation information, which is adapted to a target scene, and training sample images to be annotated in the target scene, wherein the number of the open source sample images is larger than that of the target sample images;
Training to obtain an initial image detection model based on the open source sample image, the open source image annotation information, the target sample image and the target image annotation information;
based on the initial image detection model, performing image annotation on the training sample image to obtain sample image annotation information of the training sample image;
and training the target neural network based on the training sample image and the sample image annotation information to obtain a target image detection model for detecting target content in the image.
2. The method according to claim 1, wherein training to obtain an initial image detection model based on the open source sample image, the open source image annotation information, the target sample image, and the target image annotation information comprises:
extracting a first sample image and a second sample image from the target sample image, and extracting first image annotation information of the first sample image and second image annotation information of the second sample image from the target image annotation information, wherein the number of the second sample images is larger than that of the first sample images;
Training the constructed neural network based on the open source sample image and the open source image annotation information, and the first sample image and the first image annotation information, and taking the trained neural network as a candidate image detection model for detecting images;
and performing reinforcement training on the candidate image detection model based on the second sample image and the second image annotation information to obtain a trained initial image detection model.
3. The method according to claim 1, wherein the performing image labeling on the training sample image based on the initial image detection model to obtain sample image labeling information of the training sample image comprises:
inputting the training sample image into the initial image detection model to obtain the initial image annotation information of the training sample image output by the initial image detection model;
correcting the initial image annotation information based on the image content of the training sample image;
and taking the corrected initial image annotation information as sample image annotation information of the training sample image.
4. The method of claim 1, wherein prior to training the target neural network based on the training sample image and the sample image annotation information to obtain a target image detection model, the method further comprises:
performing image transformation preprocessing on the training sample image to obtain a processed training sample image, wherein the image transformation preprocessing comprises one or more of image content clipping, image angle rotation and image light condition transformation;
and determining new sample image annotation information for the processed training sample image based on the sample image annotation information and a processing mode for performing image transformation preprocessing on the training sample image.
5. The method of claim 4, wherein training the target neural network based on the training sample image and the sample image annotation information to obtain a target image detection model comprises:
adding the processed training sample image into the training sample image before processing, and adding new sample image annotation information into the sample image annotation information obtained by annotation to form a training sample set after sample expansion;
And training the target neural network by using each training sample image and corresponding sample image annotation information in the training sample set to obtain a target image detection model.
6. The method according to claim 1, wherein training the target neural network based on the training sample image and the sample image labeling information to obtain a target image detection model for detecting target content in an image comprises:
inputting the training sample image into a target neural network to obtain prediction annotation information of the training sample image, wherein the prediction annotation information comprises an annotation frame and the confidence coefficient of the annotation frame;
based on the prediction annotation information and the sample image annotation information, adjusting network parameters of the target neural network to complete one-time training;
adding the annotation frames whose confidence coefficient is greater than a preset hard-example threshold to the sample image annotation information to obtain the target annotation information after supplementary annotation;
and taking the target annotation information after supplementary annotation as the sample image annotation information, and returning to the step of inputting the training sample image into the target neural network to obtain the predicted annotation information of the training sample image, until the target neural network meets the training cut-off condition; the trained target neural network is taken as the target image detection model for detecting the target content in an image.
7. The method according to claim 1, wherein training the target neural network based on the training sample image and the sample image labeling information to obtain a target image detection model for detecting target content in an image comprises:
and training the initial image detection model based on the training sample image and the sample image annotation information to obtain a target image detection model for detecting target content in an image.
8. The method according to claim 1, wherein after training the target neural network based on the training sample image and the sample image annotation information to obtain a target image detection model for detecting target content in an image, the method further comprises:
extracting a content image to be recognized corresponding to the target content from the training sample image, based on the position information of the target content indicated in the sample image annotation information;
performing affine transformation on the content image to be recognized to obtain a processed content image to be recognized;
and training, based on the content information of the target content indicated in the sample image annotation information and the processed content image to be recognized, a target image recognition model for recognizing target content information in an image.
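The affine rectification in claim 8 maps pixel coordinates of the cropped content region to a canonical view before recognition. A minimal sketch of that coordinate mapping, assuming the usual 2x3 affine matrix form (the matrix values and point format here are illustrative, not from the patent):

```python
def apply_affine(matrix, point):
    """Map an (x, y) point through a 2x3 affine matrix
    [[a, b, tx], [c, d, ty]], as done when warping a cropped
    content region to a canonical orientation."""
    x, y = point
    (a, b, tx), (c, d, ty) = matrix
    return (a * x + b * y + tx, c * x + d * y + ty)

# Identity rotation plus translation: shift the crop origin to (0, 0).
m = [(1.0, 0.0, -10.0), (0.0, 1.0, -20.0)]
print(apply_affine(m, (10, 20)))  # → (0.0, 0.0)
```

In practice the same matrix is applied to every pixel of the content image (e.g. via an image-warping routine), producing the processed content image used to train the recognition model.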
9. An image detection method, the method comprising:
acquiring an image to be processed and a target image detection model obtained by training according to the neural network training method of any one of claims 1 to 8;
and inputting the image to be processed into the target image detection model, to obtain annotation information of the target content detected in the image to be processed.
10. The method according to claim 9, wherein after the inputting the image to be processed into the target image detection model, the method further comprises:
acquiring a target image recognition model trained by the neural network training method according to any one of claims 1 to 8;
performing affine transformation on a content area image corresponding to the target content, extracted from the image to be processed according to the annotation information, to obtain a processed content area image;
and recognizing the target content in the content area image based on the target image recognition model.
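Claims 9 and 10 together describe a detect-then-recognize pipeline: the detection model produces annotation boxes, the corresponding regions are cropped and rectified, and the recognition model reads each region. A minimal sketch with stub models — the function interfaces, box values, and returned string are assumptions for illustration; the patent does not fix an API:

```python
def detect(image):
    """Stub detection model: returns annotation boxes for target content (claim 9)."""
    return [(2, 2, 6, 6)]

def recognize(region):
    """Stub recognition model: returns content information for a cropped region (claim 10)."""
    return "content"

def crop(image, box):
    """Extract the content area image for one (x1, y1, x2, y2) annotation box."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]

def pipeline(image):
    results = []
    for box in detect(image):
        region = crop(image, box)  # affine rectification would be applied here
        results.append((box, recognize(region)))
    return results

image = [[0] * 8 for _ in range(8)]
print(pipeline(image))  # → [((2, 2, 6, 6), 'content')]
```

The detection and recognition models are trained separately (claims 1-6 and claim 8 respectively) but composed at inference time as above.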
11. A neural network training device, the device comprising:
a sample image acquisition module, used for acquiring open-source sample images with open-source image annotation information matching various application scenarios, target sample images with target image annotation information matching a target scenario, and training sample images to be annotated in the target scenario, wherein the number of open-source sample images is greater than the number of target sample images;
an initial model training module, used for training an initial image detection model based on the open-source sample images, the open-source image annotation information, the target sample images and the target image annotation information;
a training sample annotation module, used for annotating the training sample image based on the initial image detection model to obtain sample image annotation information of the training sample image;
and a first model training module, used for training the target neural network based on the training sample image and the sample image annotation information to obtain a target image detection model for detecting target content in the image.
12. An image detection apparatus, the apparatus comprising:
an acquisition module, used for acquiring an image to be processed and a target image detection model obtained by training of the neural network training device according to claim 11;
and an image detection module, used for inputting the image to be processed into the target image detection model, to obtain annotation information of the target content detected in the image to be processed.
13. A computer device, comprising: a processor, a memory and a bus, the memory storing machine readable instructions executable by the processor, the processor and the memory in communication via the bus when the computer device is running, the machine readable instructions when executed by the processor performing the steps of the neural network training method of any one of claims 1 to 8 or the image detection method of any one of claims 9 to 10.
14. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when executed by a processor, performs the steps of the neural network training method according to any one of claims 1 to 8 or the image detection method according to any one of claims 9 to 10.
15. A computer program product comprising computer instructions which, when executed by a processor, perform the steps of the neural network training method of any one of claims 1 to 8 or the image detection method of any one of claims 9 to 10.
CN202210352034.1A 2022-04-02 2022-04-02 Neural network training method, image detection method, device, equipment and product Pending CN116935149A (en)

Publications (1)

Publication Number Publication Date
CN116935149A true CN116935149A (en) 2023-10-24

Similar Documents

Publication Publication Date Title
US10817741B2 (en) Word segmentation system, method and device
CN110569341B (en) Method and device for configuring chat robot, computer equipment and storage medium
CN110348441B (en) Value-added tax invoice identification method and device, computer equipment and storage medium
CN107944450B (en) License plate recognition method and device
CN109002824B (en) OpenCV-based building drawing label information detection method
CN106980856B (en) Formula identification method and system and symbolic reasoning calculation method and system
CN111860348A (en) Deep learning-based weak supervision power drawing OCR recognition method
CN110647956B (en) Invoice information extraction method combining two-dimension code recognition
CN113705554A (en) Training method, device and equipment of image recognition model and storage medium
CN111680753A (en) Data labeling method and device, electronic equipment and storage medium
CN111340796A (en) Defect detection method and device, electronic equipment and storage medium
CN112381092B (en) Tracking method, tracking device and computer readable storage medium
CN112766255A (en) Optical character recognition method, device, equipment and storage medium
CN111652117B (en) Method and medium for segmenting multiple document images
CN111027456A (en) Mechanical water meter reading identification method based on image identification
CN114429636B (en) Image scanning identification method and device and electronic equipment
CN112966676A (en) Document key information extraction method based on zero sample learning
CN115439850B (en) Method, device, equipment and storage medium for identifying image-text characters based on examination sheets
KR102026280B1 (en) Method and system for scene text detection using deep learning
CN116935149A (en) Neural network training method, image detection method, device, equipment and product
CN116311276A (en) Document image correction method, device, electronic equipment and readable medium
KR102436814B1 (en) Optical character recognition device and the control method thereof
CN115063405A (en) Method, system, electronic device and storage medium for detecting defects on surface of steel
CN112257720B (en) Method, device and medium for improving automatic detection recognition rate of Chinese characters of license plate
CN115063826A (en) Mobile terminal driver license identification method and system based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination