WO2023040629A1 - Neural network training method and apparatus, image processing method and apparatus, and device and storage medium - Google Patents

Neural network training method and apparatus, image processing method and apparatus, and device and storage medium

Info

Publication number
WO2023040629A1
Authority
WO
WIPO (PCT)
Prior art keywords
training
target
training samples
neural network
sampling ratio
Prior art date
Application number
PCT/CN2022/114983
Other languages
French (fr)
Chinese (zh)
Inventor
张正夫
梁鼎
吴一超
Original Assignee
上海商汤智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海商汤智能科技有限公司
Publication of WO2023040629A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/24: Classification techniques
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods

Definitions

  • The present disclosure relates to the technical field of character recognition, and in particular, to a method, apparatus, device and storage medium for neural network training and image processing.
  • Image text recognition is the process of converting text images into a series of symbols that can be represented and processed by a computer.
  • However, once trained, the current models for text image recognition are often difficult to apply across multiple usage scenarios, leaving the model with a single function that is difficult to adapt to diverse scenario requirements.
  • Embodiments of the present disclosure at least provide a method, apparatus, device and storage medium for neural network training and image processing.
  • an embodiment of the present disclosure provides a method for training a neural network, the method comprising:
  • a target neural network is trained, and the target neural network is used to identify different types of images to be identified.
  • the training data used for each training can be read from the different types of training samples obtained.
  • The target training samples are used to train the target neural network. Since the target sampling ratio between different types of training samples can well control the number of training samples of each type that are selected, this reduces, to a certain extent, the impact on feature learning of directly mixing training samples whose data volumes differ greatly, and improves the recognition accuracy of the target neural network.
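The batch-composition idea described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the source does not fix how the per-type draw is performed, so random sampling without replacement is assumed, and all names are illustrative.

```python
import random

def read_target_batch(samples_by_type, ratio, batch_size, rng=random):
    """Draw one training batch whose per-type counts follow `ratio`.

    samples_by_type: {type: list of samples}; ratio: {type: integer weight}.
    """
    total_weight = sum(ratio.values())
    batch = []
    for sample_type, pool in samples_by_type.items():
        # Number of samples of this type, per the target sampling ratio.
        n = batch_size * ratio[sample_type] // total_weight
        batch.extend(rng.sample(pool, n))
    return batch

samples = {
    "printed": [f"p{i}" for i in range(10)],
    "handwritten": [f"h{i}" for i in range(10)],
}
# A 3:1 target sampling ratio over a batch of 8 yields 6 printed
# and 2 handwritten samples.
batch = read_target_batch(samples, {"printed": 3, "handwritten": 1}, 8)
```

Because the per-type counts are fixed by the ratio rather than by the relative pool sizes, a minority type (here "handwritten") is never drowned out by a majority type.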
  • Reading the target training samples for each training includes:
  • the training samples are read from the acquired training samples of each type.
  • the number of samples corresponding to each type of training sample can be determined.
  • The smaller the target sampling ratio, the smaller the corresponding number of samples, so that the read training samples can meet the needs of each training.
  • the number of training samples required for each training is determined according to the following steps:
  • the number of training samples required for each training is determined.
  • the target sampling ratio between different types of training samples is determined according to the following steps:
  • the target sampling ratio is selected from the sampling ratio range.
  • The target sampling ratio in each training process can be realized based on the sampling ratio range configured in the training configuration file; that is, a single configuration covers the sample selection operation for the entire training process, improving efficiency while ensuring recognition accuracy.
  • selecting the target sampling ratio from the sampling ratio range includes:
  • the target sampling ratio used in the previous training is adjusted based on the preset adjustment step size to obtain the target sampling ratio used in the current training.
  • the acquiring at least two types of training samples includes:
  • the corresponding type of training sample is read from each storage file.
  • the acquisition of training samples can be realized based on the correspondence between the type of training samples and the storage file.
  • the acquisition of samples can be realized quickly, further improving the efficiency of network training.
  • the image to be recognized includes a text image
  • the target training sample includes a target image sample
  • the training of a target neural network based on the read target training sample includes:
  • The target image sample is used as the input of the neural network to be trained, the pre-labeled text of the target image sample is used as the expected output, and a target neural network for recognizing different types of text images is trained.
  • an embodiment of the present disclosure also provides an image processing method, the method comprising:
  • the image to be recognized is input into the target neural network trained by the method described in any one of the first aspect and its various modes, and an image processing result is output.
  • the embodiment of the present disclosure also provides a neural network training device, the device comprising:
  • An acquisition module configured to acquire at least two types of training samples
  • a reading module configured to read, based on the target sampling ratio between different types of training samples in the at least two types, the target training samples used for each of multiple trainings from the acquired training samples, wherein the number of target training samples of each type read each time conforms to the target sampling ratio;
  • the training module is configured to train a target neural network based on the read target training samples, and the target neural network is used to identify different types of images to be identified.
  • an embodiment of the present disclosure further provides an image processing device, the device comprising:
  • An acquisition module configured to acquire an image to be identified
  • the processing module is configured to input the image to be recognized into the target neural network trained by the method described in any one of the first aspect and its various modes, and output an image processing result.
  • An embodiment of the present disclosure further provides an electronic device, including a processor, a memory and a bus. The memory stores machine-readable instructions executable by the processor. When the electronic device is running, the processor communicates with the memory through the bus, and when the machine-readable instructions are executed by the processor, the steps of the neural network training method described in the first aspect and any of its implementations, or the steps of the image processing method described in the second aspect, are executed.
  • Embodiments of the present disclosure also provide a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the neural network training method described in the first aspect and any of its implementations, or the steps of the image processing method described in the second aspect, are executed.
  • FIG. 1 shows a flowchart of a method for neural network training provided by an embodiment of the present disclosure
  • FIG. 2 shows a schematic diagram of a neural network training device provided by an embodiment of the present disclosure
  • FIG. 3 shows a schematic diagram of an image processing device provided by an embodiment of the present disclosure
  • FIG. 4 shows a schematic diagram of an electronic device provided by an embodiment of the present disclosure
  • Fig. 5 shows a schematic diagram of another electronic device provided by an embodiment of the present disclosure.
  • text images can be divided into different types according to different visual features, for example, images with printed text, images with handwritten text, and images with natural scene text.
  • A text image (such as an answer sheet image) may contain both printed text and handwritten text. This requires the trained text recognition model to have high recognition accuracy for all of these types.
  • the present disclosure provides a neural network training and image processing method, device, device, and storage medium to improve recognition accuracy.
  • The execution subject of the neural network training method provided in the embodiments of the present disclosure is generally a computer device with certain computing capability.
  • The computer device includes, for example, a terminal device, a server, or other processing device.
  • The terminal device can be user equipment (User Equipment, UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (Personal Digital Assistant, PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, etc.
  • The neural network training method can be realized by a processor calling computer-readable instructions stored in a memory.
  • FIG. 1 it is a flowchart of a neural network training method provided by an embodiment of the present disclosure.
  • the method includes steps S101 to S103:
  • S101 Acquire at least two types of training samples.
  • S102 Based on the target sampling ratio between different types of training samples in the at least two types, read the target training samples used for each of multiple trainings from the obtained at least two types of training samples, wherein the number of target training samples of each type read each time conforms to the target sampling ratio.
  • S103 Based on the read target training samples, train the target neural network, where the target neural network is used to identify different types of images to be recognized.
  • the above-mentioned neural network training method can be mainly applied in application fields that need to complete mixed training among various types of training samples.
  • the corresponding application fields are different, and the training samples here are also different.
  • it can be applied to the field of text recognition, and the training samples correspond to text images, which can include printed text images, handwritten text images, natural scene text images, and the like.
  • The text image type (that is, the type of training sample referred to in S101) refers to the type of an image containing text. The text can be printed text, handwritten text, natural scene text, etc., and different kinds of text correspond to different types of text images; for example, printed text, handwritten text and natural scene text correspond to printed text images, handwritten text images and natural scene text images respectively.
  • the type of text image may refer to a method of generating text image or a channel for obtaining text image, etc., and the specific type and quantity of text image types are not limited here.
  • The recognition accuracy for text images is affected not only by the amount of data but also by the difficulty of image processing; for example, compared with printed text images, the presence of personalized handwriting makes handwritten text images more difficult to process. Recognition accuracy is also affected by various other factors, which will not be enumerated here.
  • the embodiment of the present disclosure provides a scheme of reading the target training samples for each training based on the target sampling ratio among different types of training samples, and then training the target neural network based on the read target training samples.
  • different types of training samples in the embodiments of the present disclosure correspond to at least two types of training samples, which can be read from different storage files in specific applications.
  • Different types of training samples can be stored in different storage files; for example, different file search paths are set for different storage files, training samples of the same type are recorded under the same file search path, and training samples of different types are recorded under different file search paths. In this way, when different types of training samples are required, the corresponding type of training samples can be read from the different storage files by a data reader.
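The type-to-storage-file dispatch just described can be sketched as below. The source does not specify file paths or formats, so each "storage file" is abstracted here as a callable reader; all names are illustrative.

```python
def read_by_type(readers, sample_type):
    """Look up the registered reader for `sample_type` and return its samples."""
    if sample_type not in readers:
        raise KeyError(f"no storage file registered for type {sample_type!r}")
    return list(readers[sample_type]())

# Hypothetical illustration: in-memory stand-ins for per-type storage files.
readers = {
    "printed": lambda: ["printed_0", "printed_1"],
    "handwritten": lambda: ["handwritten_0"],
}
printed_samples = read_by_type(readers, "printed")
```

In a real data pipeline each callable would be a file reader bound to that type's search path, which is what lets sample acquisition stay fast and type-specific.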
  • the relevant target sampling ratio may be preset based on different application scenarios, or may be preset for each training in a specific application scenario. In addition, it can also be automatically generated under the constraints of the sampling ratio range and the preset adjustment step size. Embodiments of the present disclosure do not specifically limit this.
  • the target sampling ratio can be used as a fixed parameter in the training task. For example, 1:1, 1:2, etc., which are pre-set sampling ratios based on application scenario requirements, can be used in each training process.
  • The target sampling ratio can also be used as a semi-automatic parameter in the training task. For example, based on factors such as the data volume that affect the number of trainings, the sampling ratio is preset to be adjusted every time a preset number of trainings is reached, e.g., a new sampling ratio is adopted every 10 trainings.
  • the target sampling ratio can also be used as a fully automatic parameter in the training task, for example, a new sampling ratio is adjusted every time training is performed.
  • the target training samples for each training may be read from the acquired training samples of different types based on the target sampling ratio.
  • the number of target training samples read here is in line with the target sampling ratio.
  • Assuming the target sampling ratio between the three types of text images is 1:1:1, the number of target training samples read for each of the three types can be 300; assuming the target sampling ratio between the three types of text images is 3:1:2, the numbers of the three types of target training samples read can be 300, 100 and 200 respectively.
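The per-type counts implied by a target sampling ratio can be computed as follows. The remainder handling (assigning leftover samples to the earlier types) is an assumption; the source only requires the counts to follow the ratio.

```python
def counts_from_ratio(ratio, total):
    """Split `total` samples across types according to an integer `ratio`."""
    weight_sum = sum(ratio)
    counts = [total * r // weight_sum for r in ratio]
    for i in range(total - sum(counts)):  # distribute any remainder
        counts[i] += 1
    return counts

# The example above: ratio 3:1:2 over 600 samples gives 300, 100 and 200.
print(counts_from_ratio([3, 1, 2], 600))  # [300, 100, 200]
```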
  • the target neural network for recognizing different types of images to be recognized can be trained.
  • What the target neural network learns here can be the correspondence between a text image and the text on the image.
  • The training samples can be pre-labeled, and the network parameter values of the target neural network are obtained by training on the above correspondence.
  • the image to be recognized can be input into the trained target neural network.
  • For example, when the image to be recognized includes both printed text and handwritten text, the printed text and the handwritten text can both be recognized with high precision at the same time.
  • Step 1 Determine the number of samples corresponding to each type of training sample based on the target sampling ratio between different types of training samples and the number of training samples required for each training;
  • Step 2 Read the training samples from the obtained training samples of each type according to the determined number of samples.
  • the number of training samples required for different training sessions may be the same or different.
  • the number of required training samples can increase proportionally.
  • each type of training sample corresponds to its own number of samples.
  • Take the text recognition of three types of text images as an example: 10,000 printed text images, 500 handwritten text images, and 3,000 natural scene text images. Assuming that the number of training samples required for the current training is 1,000 and the target sampling ratio between the three types of text images is 1:1:1, 333 target training samples of each of the three types can be selected.
  • The number of training samples required for each training can usually be the same, mainly to minimize the impact of an unbalanced number of training samples on performance evaluation between different rounds of training.
  • the total number of training samples and the total number of training times corresponding to at least two types of training samples can be determined first, and then based on the total number of training samples and the total number of training times, the number of training samples required for each training can be determined.
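The per-training sample count derived from the total number of samples and the total number of trainings can be sketched as below; the use of integer division, and the example figures, are assumptions for illustration.

```python
def samples_per_training(total_samples, total_trainings):
    """Equal-sized trainings, to keep per-round performance comparisons balanced."""
    return total_samples // total_trainings

# Hypothetical figures: 13,500 total samples over 27 trainings.
print(samples_per_training(13500, 27))  # 500
```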
  • the sampling ratio range can be used here to realize the search of the target sampling ratio, which can be achieved through the following steps:
  • Step 1 In the case of receiving the training task, read the sampling ratio range set for different types of training samples from the training configuration file;
  • Step 2 In each training, select the target sampling ratio from the sampling ratio range.
  • the training configuration file is automatically invoked as the training task proceeds.
  • the target sampling ratio selected for each training can be determined by reading the sampling ratio range set in the training configuration file.
  • The above-mentioned sampling ratio range can be set so as to determine the target sampling ratio for each training; alternatively, a minimum sampling ratio and a maximum sampling ratio can be set, and the target sampling ratio for each training can be randomly selected between them. In addition, the selection can also be combined with a preset adjustment step.
  • The corresponding sampling ratio range can be determined based on the minimum sampling ratio and the maximum sampling ratio, and the relevant sampling ratio range and preset adjustment step can be chosen in combination with different training requirements. For example, to achieve a high recognition rate for a specific type of sample, the sampling ratio range can be biased towards that type; to achieve a higher recognition rate over the samples as a whole, the sampling ratio range can be balanced across the various types of samples.
  • the range of sampling ratios that can be set is (0.6, 0.9).
  • the sampling ratio range that can be set is (0.4, 0.6).
  • The adjustment step can be set to 0.1; in the case of a sampling ratio range of (0.6, 0.9), the minimum sampling ratio of 0.6 is incremented by the step until the maximum sampling ratio is traversed. As another example, the adjustment step can be set to 0.01; the traversal process is similar to that described above and will not be repeated here.
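The traversal of a sampling ratio range by a preset adjustment step can be sketched as below. Inclusive endpoints are assumed, and the rounding guards against floating-point drift when stepping by 0.1.

```python
def ratio_schedule(lo, hi, step):
    """Enumerate target sampling ratios from `lo` to `hi` in increments of `step`."""
    n = int(round((hi - lo) / step))
    # Recompute each ratio from the base to avoid accumulating float error.
    return [round(lo + i * step, 10) for i in range(n + 1)]

# The (0.6, 0.9) range with step 0.1 described above.
print(ratio_schedule(0.6, 0.9, 0.1))  # [0.6, 0.7, 0.8, 0.9]
```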
  • the target sampling ratio used in the previous training can be adjusted based on the preset adjustment step within the range of the sampling ratio to obtain the target sampling ratio used in this training.
  • two types of training samples may be used as examples for illustration.
  • The minimum sampling ratio set in the training configuration file for the two types of training samples (such as 0.4) can be used as the target sampling ratio for the first training, and the target training samples for the first training can be read from the two acquired types of training samples.
  • Based on the read target training samples, the target neural network is trained for the first time. After the first training is completed, the minimum sampling ratio is adjusted by the preset adjustment step to obtain the adjusted sampling ratio, which serves as the target sampling ratio for the next training. The target training samples for the next training are then read from the two acquired types of training samples according to that ratio and used for the next round of training the target neural network, and so on until the maximum sampling ratio (such as 0.9) is reached.
  • the target neural network can be trained.
  • the image to be recognized is a text image
  • The target training sample can be a target image sample. For each type of target image sample, the target image sample is used as the input of the neural network to be trained, the pre-labeled text of the target image sample is used as the expected output, and the target neural network for recognizing different types of text images is trained.
  • The target neural network in the embodiment of the present disclosure is trained on the correspondence between the input image and the labeled text; based on this correspondence, the network parameters of the target neural network can be determined, thereby realizing high-precision recognition of different types of text images.
  • The embodiment of the present disclosure can input the acquired image to be recognized into the trained target neural network and output an image processing result, where the image processing result can be the text content recognized from the image to be recognized.
  • The writing order of the steps does not imply a strict execution order and does not constitute any limitation on the implementation process; the specific execution order of each step should be determined by its function and possible internal logic.
  • The embodiment of the present disclosure also provides a device corresponding to the method. Since the problem-solving principle of the device is similar to that of the above-mentioned method, the implementation of the device can refer to the implementation of the method, and repeated points will not be described again.
  • FIG. 2 it is a schematic diagram of a neural network training device provided by an embodiment of the present disclosure.
  • the device includes: an acquisition module 201, a reading module 202, and a training module 203; wherein,
  • An acquisition module 201 configured to acquire at least two types of training samples
  • the reading module 202 is configured to read, based on the target sampling ratio between different types of training samples in the at least two types, the target training samples used for each of multiple trainings from the obtained at least two types of training samples, wherein the number of target training samples of each type read each time conforms to the target sampling ratio;
  • the training module 203 is configured to train the target neural network based on the read target training samples, and the target neural network is used to identify different types of images to be recognized.
  • the training data used for each training can be read from the obtained different types of training samples.
  • The target training samples are used to train the target neural network. Since the target sampling ratio between different types of training samples can well control the number of training samples of each type that are selected, this reduces, to a certain extent, the impact on feature learning of directly mixing training samples whose data volumes differ greatly, and improves the recognition accuracy of the target neural network.
  • the reading module 202 is configured to read, according to the following steps and based on the target sampling ratio between different types of training samples in the at least two types, the target training samples for each of the multiple trainings from the obtained at least two types of training samples:
  • the training samples are read from the acquired training samples of each type.
  • the reading module 202 is configured to determine the number of training samples required for each training according to the following steps:
  • the reading module 202 is configured to determine the target sampling ratio between different types of training samples according to the following steps:
  • a target sampling ratio is chosen from a range of sampling ratios.
  • the reading module 202 is configured to select the target sampling ratio from the sampling ratio range in each training according to the following steps:
  • the target sampling ratio used in the previous training is adjusted based on the preset adjustment step to obtain the target sampling ratio used in the current training.
  • the obtaining module 201 is configured to obtain at least two types of training samples according to the following steps:
  • the image to be recognized includes a text image
  • the target training sample includes a target image sample
  • the training module 203 is configured to train the target neural network based on the read target training sample according to the following steps:
  • The target image sample is used as the input of the neural network to be trained, the pre-labeled text of the target image sample is used as the expected output, and a target neural network for recognizing different types of text images is trained.
  • FIG. 3 is a schematic diagram of an image processing device provided by an embodiment of the present disclosure
  • the device includes: an acquisition module 301 and a processing module 302; wherein,
  • An acquisition module 301 configured to acquire an image to be identified
  • the processing module 302 is configured to input the image to be recognized into the target neural network trained by the above neural network training method, and output the image processing result.
  • FIG. 4 is a schematic structural diagram of the electronic device provided by the embodiment of the present disclosure, including: a processor 401 , a memory 402 , and a bus 403 .
  • the memory 402 stores machine-readable instructions executable by the processor 401 (for example, execution instructions corresponding to the acquisition module 201, the reading module 202, and the training module 203 in the device in FIG. 2 ), and when the electronic device is running, the processor 401 communicates with the memory 402 through the bus 403, and when the machine-readable instructions are executed by the processor 401, the following processes are performed:
  • the target neural network is trained, and the target neural network is used to identify different types of images to be recognized.
  • the embodiment of the present disclosure also provides another electronic device, as shown in FIG. 5 , which is a schematic structural diagram of the electronic device provided by the embodiment of the present disclosure, including: a processor 501 , a memory 502 , and a bus 503 .
  • The memory 502 stores machine-readable instructions executable by the processor 501 (for example, execution instructions corresponding to the acquisition module 301 and the processing module 302 in the device in FIG. 3). When the electronic device is running, the processor 501 communicates with the memory 502 through the bus 503, and when the machine-readable instructions are executed by the processor 501, the following processing is performed:
  • Embodiments of the present disclosure further provide a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the steps of the methods described in the foregoing method embodiments are executed.
  • the storage medium may be a volatile or non-volatile computer-readable storage medium.
  • Embodiments of the present disclosure also provide a computer program product. The computer program product carries program code, and the instructions included in the program code can be used to execute the steps of the methods described in the above method embodiments; for details, please refer to the above method embodiments, which will not be repeated here.
  • the above-mentioned computer program product may be specifically implemented by means of hardware, software or a combination thereof.
  • In one optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK), etc.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present disclosure may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • If the functions are realized in the form of software functional units and sold or used as independent products, they can be stored in a non-volatile computer-readable storage medium executable by a processor.
  • The technical solution of the present disclosure, in essence, or the part contributing to the prior art, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present disclosure.
  • The aforementioned storage media include: USB flash drive, removable hard disk, read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), magnetic disk, optical disc, and other media that can store program code.


Abstract

A neural network training method and apparatus, an image processing method and apparatus, a device, and a storage medium. The neural network training method comprises: acquiring at least two types of training samples (S101); on the basis of a target sampling ratio between the different types of training samples among the at least two types, reading, from the acquired training samples, target training samples for each of multiple training iterations, wherein the number of target training samples of each type read each time conforms to the target sampling ratio (S102); and training a target neural network on the basis of the read target training samples, wherein the target neural network is used to recognize different types of images to be recognized (S103).

Description

Method, apparatus, device, and storage medium for neural network training and image processing
Cross-Reference to Related Applications
This application claims priority to Chinese patent application No. 202111098594.0, filed on September 18, 2021, the entire disclosure of which is incorporated herein by reference.
Technical Field
The present disclosure relates to the technical field of character recognition, and in particular to a method, apparatus, device, and storage medium for neural network training and image processing.
Background
With the rapid development of image technology and the gradual expansion of market demand, image character recognition technology has received extensive attention. Image character recognition is the process of converting text images into a series of symbols that can be represented and processed by a computer. However, once trained, current models for text image recognition are often difficult to apply to multiple usage scenarios, resulting in models with a single function that are difficult to adapt to diverse scenario requirements.
Summary
Embodiments of the present disclosure provide at least a method, apparatus, device, and storage medium for neural network training and image processing.
In a first aspect, an embodiment of the present disclosure provides a neural network training method, the method comprising:
acquiring at least two types of training samples;
based on a target sampling ratio between different types of training samples among the at least two types, reading, from the acquired training samples of the at least two types, target training samples for each of multiple training iterations, wherein the number of target training samples of each type read each time conforms to the target sampling ratio;
training a target neural network based on the read target training samples, the target neural network being used to recognize different types of images to be recognized.
With the above neural network training method, once different types of training samples have been acquired, the target training samples for each training iteration can be read from the acquired samples according to the target sampling ratio between the different types, and the target neural network is then trained on the read samples. Because the target sampling ratio effectively controls how many samples of each type are selected, it reduces, to a certain extent, the impact on feature learning of directly mixing training samples whose data volumes differ greatly, thereby improving the recognition accuracy of the target neural network.
In a possible implementation, reading, from the acquired training samples of the at least two types and based on the target sampling ratio between different types of training samples among the at least two types, the target training samples for each of multiple training iterations comprises:
determining a sampling quantity corresponding to each type of training sample based on the target sampling ratio between the different types of training samples and the number of training samples required for each training iteration;
reading training samples from the acquired training samples of each type according to the determined sampling quantity.
Here, the sampling quantity corresponding to each type of training sample can be determined from the target sampling ratio and the number of training samples required per training iteration: the larger a type's share of the target sampling ratio, the larger its sampling quantity; the smaller the share, the smaller the quantity. In this way, the samples read can meet the needs of each training iteration.
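The proportional allocation described above can be sketched as follows. This is purely illustrative and not part of the claimed embodiments; in particular, integer division is an assumption, since the disclosure does not specify how fractional quantities are rounded.

```python
def sampling_quantities(ratio, samples_per_iteration):
    """Split the per-iteration sample budget across types according to the
    target sampling ratio: a larger share of the ratio yields a larger quantity.
    Uses floor division, so the quantities may sum to slightly less than the budget."""
    total_parts = sum(ratio)
    return [samples_per_iteration * parts // total_parts for parts in ratio]
```

For example, a budget of 1,000 samples at a 1:1:1 ratio over three types yields 333 samples of each type, matching the worked example given later in the description.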
In a possible implementation, the number of training samples required for each training iteration is determined as follows:
determining the total number of training samples of the at least two types and the total number of training iterations;
determining the number of training samples required for each training iteration based on the total number of training samples and the total number of training iterations.
In a possible implementation, the target sampling ratio between different types of training samples is determined as follows:
upon receiving a training task, reading, from a training configuration file, a sampling ratio range set for the different types of training samples;
in each training iteration, selecting the target sampling ratio from the sampling ratio range.
Here, the target sampling ratio for each training iteration can be realized based on the sampling ratio range configured in the training configuration file; that is, a single configuration can complete the sample selection for the entire training process, improving recognition efficiency while ensuring recognition accuracy.
In a possible implementation, selecting the target sampling ratio from the sampling ratio range in each training iteration comprises:
after a training iteration is completed, adjusting, within the sampling ratio range, the target sampling ratio used in the previous iteration based on a preset adjustment step size, to obtain the target sampling ratio used in the current iteration.
In a possible implementation, acquiring the at least two types of training samples comprises:
reading training samples of each type from respective storage files based on a pre-configured correspondence between training sample types and storage files.
Here, training samples can be acquired based on the correspondence between sample types and storage files; when each type corresponds to one storage file, samples can be acquired quickly, further improving the efficiency of network training.
In a possible implementation, the images to be recognized include text images, and the target training samples include target image samples; training the target neural network based on the read target training samples comprises:
for each type of target image sample, taking the target image sample as the input of the neural network to be trained and the pre-annotated text for that target image sample as the output of the target neural network to be trained, and training a target neural network for recognizing different types of text images.
In a second aspect, an embodiment of the present disclosure further provides an image processing method, the method comprising:
acquiring an image to be recognized;
inputting the image to be recognized into a target neural network trained by the method described in the first aspect or any of its implementations, and outputting an image processing result.
In a third aspect, an embodiment of the present disclosure further provides a neural network training apparatus, the apparatus comprising:
an acquisition module, configured to acquire at least two types of training samples;
a reading module, configured to read, from the acquired training samples of the at least two types and based on a target sampling ratio between different types of training samples among the at least two types, target training samples for each of multiple training iterations, wherein the number of target training samples of each type read each time conforms to the target sampling ratio;
a training module, configured to train a target neural network based on the read target training samples, the target neural network being used to recognize different types of images to be recognized.
In a fourth aspect, an embodiment of the present disclosure further provides an image processing apparatus, the apparatus comprising:
an acquisition module, configured to acquire an image to be recognized;
a processing module, configured to input the image to be recognized into a target neural network trained by the method described in the first aspect or any of its implementations, and to output an image processing result.
In a fifth aspect, an embodiment of the present disclosure further provides an electronic device, comprising a processor, a memory, and a bus. The memory stores machine-readable instructions executable by the processor. When the electronic device runs, the processor communicates with the memory through the bus, and when the machine-readable instructions are executed by the processor, the steps of the neural network training method described in the first aspect or any of its implementations, or the steps of the image processing method described in the second aspect, are executed.
In a sixth aspect, an embodiment of the present disclosure further provides a computer-readable storage medium on which a computer program is stored. When the computer program is run by a processor, the steps of the neural network training method described in the first aspect or any of its implementations, or the steps of the image processing method described in the second aspect, are executed.
For descriptions of the effects of the above apparatus, electronic device, and computer-readable storage medium, refer to the description of the above method; details are not repeated here.
To make the above objects, features, and advantages of the present disclosure more comprehensible, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief Description of the Drawings
To explain the technical solutions of the embodiments of the present disclosure more clearly, the drawings required in the embodiments are briefly introduced below. The drawings here are incorporated into and constitute a part of the specification; they illustrate embodiments consistent with the present disclosure and, together with the specification, serve to explain its technical solutions. It should be understood that the following drawings show only some embodiments of the present disclosure and therefore should not be regarded as limiting its scope; those of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.
Fig. 1 shows a flowchart of a neural network training method provided by an embodiment of the present disclosure;
Fig. 2 shows a schematic diagram of a neural network training apparatus provided by an embodiment of the present disclosure;
Fig. 3 shows a schematic diagram of an image processing apparatus provided by an embodiment of the present disclosure;
Fig. 4 shows a schematic diagram of an electronic device provided by an embodiment of the present disclosure;
Fig. 5 shows a schematic diagram of another electronic device provided by an embodiment of the present disclosure.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. The components of the embodiments, as generally described and illustrated in the figures herein, may be arranged and designed in a variety of different configurations. Accordingly, the following detailed description of the embodiments provided in the accompanying drawings is not intended to limit the scope of the claimed disclosure but merely represents selected embodiments. Based on the embodiments of the present disclosure, all other embodiments obtained by those skilled in the art without creative effort fall within the protection scope of the present disclosure.
It should be noted that similar reference numerals and letters denote similar items in the following figures; therefore, once an item is defined in one figure, it does not require further definition or explanation in subsequent figures.
The term "and/or" herein merely describes an association relationship, indicating that three relationships may exist; for example, A and/or B may indicate: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the term "at least one" herein indicates any one of multiple items or any combination of at least two of them; for example, including at least one of A, B, and C may indicate including any one or more elements selected from the set consisting of A, B, and C.
Research has found that text images can be divided into different types according to their visual features, for example images with printed text, images with handwritten text, and images with natural-scene text. In a practical application scenario, a single text image (for example, an answer sheet image) may contain both printed and handwritten text. This requires the trained character recognition model to have high recognition accuracy for every type.
Building on existing character recognition schemes adapted to a single application scenario, various types of text images could be used as training samples and mixed together to train a character recognition model. However, because the data volumes of different types of text images are likely to differ greatly, direct mixed training would prevent the text image types with less data from learning features well, resulting in low recognition accuracy.
Based on the above research, the present disclosure provides a method, apparatus, device, and storage medium for neural network training and image processing, so as to improve recognition accuracy.
To facilitate understanding of this embodiment, the neural network training method disclosed in the embodiments of the present disclosure is first introduced in detail. The execution subject of the method is generally a computer device with certain computing power, for example a terminal device, a server, or another processing device. The terminal device may be user equipment (UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like. In some possible implementations, the neural network training method may be implemented by a processor calling computer-readable instructions stored in a memory.
Referring to Fig. 1, a flowchart of the neural network training method provided by an embodiment of the present disclosure, the method includes steps S101 to S103:
S101: Acquire at least two types of training samples.
S102: Based on a target sampling ratio between different types of training samples among the at least two types, read target training samples for each of multiple training iterations from the acquired training samples, where the number of target training samples of each type read each time conforms to the target sampling ratio.
S103: Train a target neural network based on the read target training samples, the target neural network being used to recognize different types of images to be recognized.
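As a purely illustrative sketch of how S101 to S103 might fit together (the sample types, the toy data, the ratio values, and the stubbed-out training step are all hypothetical, not part of the claimed embodiments):

```python
import random

def read_batch(samples_by_type, ratio, batch_size):
    """S102: read one batch whose per-type counts follow the target sampling ratio."""
    total_parts = sum(ratio.values())
    batch = []
    for sample_type, parts in ratio.items():
        count = batch_size * parts // total_parts  # share of the batch for this type
        batch.extend(random.sample(samples_by_type[sample_type], count))
    return batch

# S101: acquire at least two types of training samples (toy placeholders here).
samples_by_type = {
    "printed": [("printed_img_%d" % i, "label") for i in range(10000)],
    "handwritten": [("handwritten_img_%d" % i, "label") for i in range(500)],
    "scene": [("scene_img_%d" % i, "label") for i in range(3000)],
}
target_ratio = {"printed": 1, "handwritten": 1, "scene": 1}

# S103: train the target network on each batch (optimization step stubbed out).
for _ in range(10):
    batch = read_batch(samples_by_type, target_ratio, batch_size=300)
    # train_step(network, batch)  # hypothetical training step, omitted here
```

Note that even though the printed-text set is 20 times larger than the handwritten set, every batch contains the two types in the configured proportion, which is the point of the sampling-ratio control.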
To facilitate understanding of the neural network training method provided by the embodiments of the present disclosure, its application scenarios are first described in detail. The method is mainly applicable to fields that require mixed training across various types of training samples; the training samples differ with the application field. For example, in the field of character recognition, the training samples are text images, which may include printed text images, handwritten text images, natural-scene text images, and so on. The field of character recognition is mostly used for illustration below.
Considering the large differences between different types of text images, directly mixing the various types for training will very likely lead to different recognition accuracies among text image types with different data volumes. After multiple rounds of neural network training, types with large data volumes tend to achieve high recognition accuracy because richer image features can be learned, while types with small data volumes tend to have low recognition accuracy because their image features cannot be learned well.
Here, the text image type (i.e., the type of training sample referred to in S101) refers to the type of an image containing text. The text may be printed text, handwritten text, natural-scene text, etc., and different kinds of text may correspond to different text image types; for example, printed text, handwritten text, and natural-scene text may correspond to printed text images, handwritten text images, and natural-scene text images, respectively. Illustratively, the text image type may also refer to the way a text image is generated or the channel through which it is obtained; the specific kinds and number of text image types are not limited here.
It should be noted that the recognition accuracy of text images is affected not only by the amount of data but possibly also by the difficulty of image processing; for example, compared with printed text images, handwritten text images are harder to process because of personalized handwriting styles. In addition, recognition accuracy is affected by various other factors, which are not repeated here.
To achieve high recognition accuracy for all types of text images, the number of text images of each type used during training needs to be finely controlled. Therefore, the embodiments of the present disclosure provide a scheme of reading the target training samples for each training iteration based on a target sampling ratio between different types of training samples, and then training the target neural network based on the read samples.
In the embodiments of the present disclosure, the different types of training samples correspond to at least two types of training samples, which, in a specific application, can be read from different storage files.
In the embodiments of the present disclosure, different types of training samples may be stored in different storage files. For example, different file search paths are set for different storage files: training samples of the same type are recorded under the same file search path, and training samples of different types are recorded under different file search paths. In this way, when different types of training samples are required, a data reader can read samples of the corresponding type from the different storage files.
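One illustrative way to realize this type-to-storage-file correspondence is sketched below; the file paths and the one-sample-per-line file format are assumptions made for illustration only, not details from the disclosure.

```python
from pathlib import Path

# Hypothetical mapping from sample type to its dedicated storage file.
STORAGE_FILES = {
    "printed": "data/printed.txt",
    "handwritten": "data/handwritten.txt",
    "scene": "data/scene.txt",
}

def load_samples(sample_type, storage_files=STORAGE_FILES):
    """Read all samples of one type from the storage file configured for it.
    Assumes one sample record per line."""
    path = Path(storage_files[sample_type])
    with path.open(encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]
```

Because each type lives in its own file, the reader for one type never has to scan or filter the samples of another, which is what makes per-type acquisition fast.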
In a specific application, the target sampling ratio may be preset for different application scenarios, or preset for each training iteration within a particular application scenario. Alternatively, it may be generated automatically under the constraints of a sampling ratio range and a preset adjustment step size. The embodiments of the present disclosure impose no specific limitation on this.
In practical applications, the target sampling ratio may be a fixed parameter of the training task; for example, a sampling ratio such as 1:1 or 1:2, set in advance according to the application scenario requirements, may be used in every training iteration. The target sampling ratio may also be a semi-automatic parameter of the training task; for example, based on factors that affect the number of training iterations, such as the data volume, the sampling ratio is adjusted each time a preset number of iterations is reached, e.g., a new sampling ratio every 10 iterations. The target sampling ratio may also be a fully automatic parameter of the training task; for example, a new sampling ratio is set after every single iteration.
In the embodiments of the present disclosure, once the target sampling ratio between different types of training samples is determined, the target training samples for each training iteration can be read from the acquired training samples based on this ratio. The numbers of target training samples read here conform to the target sampling ratio.
Still taking character recognition as an example, suppose there are three types of training samples in total: 10,000 printed text images, 500 handwritten text images, and 3,000 natural-scene text images. If the target sampling ratio between the three types is 1:1:1, the numbers of target training samples of the three types read here may all be 300; if the target sampling ratio is 3:1:2, the numbers read may be 300, 100, and 200, respectively.
Based on the various types of target training samples read out, a target neural network for recognizing different types of images to be recognized can be trained.
What the target neural network learns here may be the correspondence between a text image and the text on the image. Before training the target neural network, the training samples can be annotated with text in advance, so that the network parameter values of the target neural network are obtained by training on the above correspondence. In this way, inputting an image to be recognized into the trained target neural network recognizes the image; for example, when the image to be recognized contains both printed and handwritten text, both can be recognized with high accuracy at the same time.
The neural network training method provided by the embodiments of the present disclosure may determine the target training samples according to the following steps:
Step 1: Determine the sampling quantity corresponding to each type of training sample based on the target sampling ratio between different types of training samples and the number of training samples required for each training iteration;
Step 2: Read training samples from the acquired training samples of each type according to the determined sampling quantity.
Here, the number of training samples required may be the same or different for different training iterations. In a specific application, the required number of training samples may increase proportionally as the number of iterations increases.
In the embodiments of the present disclosure, for each training iteration, once the number of training samples required for that iteration is determined, the sampling quantity corresponding to each type of training sample can be determined from the target sampling ratio between the different types. Again taking character recognition over three types of text images as an example, with 10,000 printed text images, 500 handwritten text images, and 3,000 natural-scene text images: if the number of training samples required for the current iteration is 1,000 and the target sampling ratio between the three types is 1:1:1, the selected quantities of the three types of target training samples may each be 333.
在实际应用中,各次训练所需的训练样本数量通常可以相同,这主要是为了尽可能的降低由于训练样本数量不均衡对不同轮次训练之间的性能评估所带来的不同影响。这里,可以首先确定至少两种类型的训练样本对应的训练样本总量以及训练总次数,进而基于训练样本总量以及训练总次数,确定每次训练所需的训练样本数量。In practical applications, the number of training samples required for each training can usually be the same, which is mainly to reduce as much as possible the different impacts of the unbalanced number of training samples on the performance evaluation between different rounds of training. Here, the total number of training samples and the total number of training times corresponding to at least two types of training samples can be determined first, and then based on the total number of training samples and the total number of training times, the number of training samples required for each training can be determined.
这里仍以文字识别为例，在确定共计13500张训练样本的情况下，若共计训练10次，则每次训练所需的训练样本数量为1350张。Still taking text recognition as an example, given a total of 13,500 training samples and 10 training rounds in total, the number of training samples required for each round is 1,350.
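上述"按训练总次数均分样本总量、再按目标采样比例拆分到各类型"的计算，可以用如下示意代码概括（函数名为说明而设，采用向下取整，与正文中每类333张的例子一致）。The arithmetic above — dividing the total sample count evenly over the rounds, then splitting one round's count across types by the target sampling ratio — can be sketched as follows (the function names are illustrative; floor rounding is used, matching the 333-per-type example in the text).

```python
def per_round_count(total_samples: int, total_rounds: int) -> int:
    # number of samples per training round = total samples / total rounds
    return total_samples // total_rounds

def counts_from_ratio(round_total: int, ratio: list) -> list:
    # split one round's sample count across types by the target sampling ratio
    s = sum(ratio)
    return [int(round_total * r / s) for r in ratio]

print(per_round_count(13500, 10))          # 1350
print(counts_from_ratio(1000, [1, 1, 1]))  # [333, 333, 333]
```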
为了获得更高的识别精度,这里可以采用采样比例范围实现目标采样比例的搜索,具体可以通过如下步骤实现:In order to obtain higher recognition accuracy, the sampling ratio range can be used here to realize the search of the target sampling ratio, which can be achieved through the following steps:
步骤一、在接收到训练任务的情况下,从训练配置文件中读取针对不同类型的训练样本设置的采样比例范围;Step 1. In the case of receiving the training task, read the sampling ratio range set for different types of training samples from the training configuration file;
步骤二、在每次训练中,从采样比例范围中选取目标采样比例。Step 2. In each training, select the target sampling ratio from the sampling ratio range.
本公开实施例中，训练配置文件随着训练任务的展开而自动调用，通过读取训练配置文件中设置的采样比例范围，可以确定每次训练选取的目标采样比例。In the embodiment of the present disclosure, the training configuration file is invoked automatically as the training task proceeds; by reading the sampling ratio range set in the training configuration file, the target sampling ratio selected for each training round can be determined.
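下面给出一个从配置文件读取采样比例范围并选取目标采样比例的最小示意（本文并未限定配置文件的具体格式，此处的JSON结构与键名均为假设）。A minimal sketch of reading the sampling ratio range from a training configuration file and picking a target sampling ratio is given below (the disclosure does not fix a concrete file format; the JSON schema and key names here are assumptions).

```python
import json
import random

# A hypothetical configuration snippet; the key names are assumptions.
config_text = '{"sampling_ratio_range": {"min": 0.4, "max": 0.6}}'

def read_ratio_range(text: str):
    # parse the range set for the training task's sample types
    rng = json.loads(text)["sampling_ratio_range"]
    return rng["min"], rng["max"]

lo, hi = read_ratio_range(config_text)
target_ratio = random.uniform(lo, hi)  # one way to pick a per-round ratio
print(lo, hi)  # 0.4 0.6
```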
在应用中，上述采样比例范围可以是与各次训练的目标采样比例对应设置的；还可以是设置有最小采样比例和最大采样比例，各次训练的目标采样比例可以是随机从上述最小采样比例和最大采样比例之间选取的；除此之外，还可以是结合预设调整步长设置完成的。In application, the above sampling ratio range may be set in correspondence with the target sampling ratio of each training round; alternatively, a minimum sampling ratio and a maximum sampling ratio may be set, and the target sampling ratio of each round may be selected at random between the minimum and maximum sampling ratios; besides this, the selection may also be accomplished in combination with a preset adjustment step size.
这里，基于最小采样比例和最大采样比例可以确定对应的采样比例范围，有关采样比例范围和预设调整步长可以是结合不同的训练需求来确定的。例如，在意图实现针对某一种特定类型样本的高识别率的情况下，可以将采样比例范围偏向这一特定类型样本；再如，在意图实现针对整体样本的较高识别率的情况下，可以将采样比例范围平均指向各个类型样本。Here, the corresponding sampling ratio range can be determined based on the minimum and maximum sampling ratios, and the sampling ratio range and the preset adjustment step size can be determined in combination with different training requirements. For example, when a high recognition rate for one specific type of sample is intended, the sampling ratio range can be biased towards that type; for another example, when a higher recognition rate over the samples as a whole is intended, the sampling ratio range can be distributed evenly across the various types of samples.
以具有两种类型的训练样本为例，在意图实现针对第一种类型的训练样本的高识别率的情况下，可以设置的采样比例范围为(0.6,0.9)；在意图实现整体样本的较高识别率的情况下，可以设置的采样比例范围为(0.4,0.6)。Taking two types of training samples as an example, when a high recognition rate for the first type of training samples is intended, the sampling ratio range may be set to (0.6, 0.9); when a higher recognition rate over the samples as a whole is intended, the sampling ratio range may be set to (0.4, 0.6).
有关预设调整步长越小，一定程度上可以使得所训练得到的神经网络的精度更高，然而会耗费更多的计算量，本公开实施例可以在兼顾精度和计算量的情况下，对调整步长进行设置。例如，可以设置调整步长为0.1，在采样比例范围为(0.6,0.9)的情况下，可以从0.6这一最小采样比例，依次按照步长递增，直至遍历到最大采样比例；再如，可以设置调整步长为0.01，具体的遍历过程与上述描述过程类似，在此不再赘述。A smaller preset adjustment step size can, to a certain extent, yield a more accurate trained neural network, but it consumes more computation; the embodiment of the present disclosure may therefore set the adjustment step size with both accuracy and computation cost in mind. For example, the adjustment step size may be set to 0.1: with a sampling ratio range of (0.6, 0.9), one starts from the minimum sampling ratio of 0.6 and increments by the step size until the maximum sampling ratio is reached. For another example, the adjustment step size may be set to 0.01; the traversal process is similar to that described above and is not repeated here.
在具体应用中,每完成一次训练后,可以在采样比例范围内,基于预设调整步长对上一次训练使用的目标采样比例进行调整,得到本次训练使用的目标采样比例。In a specific application, after each training is completed, the target sampling ratio used in the previous training can be adjusted based on the preset adjustment step within the range of the sampling ratio to obtain the target sampling ratio used in this training.
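结合预设调整步长在采样比例范围内遍历目标采样比例，可以示意如下（此处用整数步数计算以避免浮点累积误差；函数名为说明而设）。Traversing the target sampling ratio within the range by the preset adjustment step can be sketched as follows (computed from an integer step index to avoid accumulated floating-point error; the function name is illustrative).

```python
def ratio_schedule(min_ratio: float, max_ratio: float, step: float) -> list:
    # enumerate target sampling ratios from the minimum to the maximum,
    # one adjustment step after each completed training round
    n = int(round((max_ratio - min_ratio) / step))
    return [round(min_ratio + i * step, 10) for i in range(n + 1)]

print(ratio_schedule(0.6, 0.9, 0.1))  # [0.6, 0.7, 0.8, 0.9]
```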
为了便于理解上述结合预设调整步长实现目标采样比例的调整以进行目标神经网络训练的过程,接下来可以以两种类型的训练样本为例进行说明。In order to facilitate the understanding of the above-mentioned process of adjusting the target sampling ratio in combination with the preset adjustment step size for training the target neural network, two types of training samples may be used as examples for illustration.
这里，可以将训练配置文件中设置的针对两种类型的训练样本的最小采样比例（如0.4）作为首次训练对应的目标采样比例，从获取的两种类型的训练样本中读取首次训练用的目标训练样本，并基于读取的目标训练样本，进行首次训练目标神经网络的步骤；在完成首次训练的情况下，基于预设调整步长对最小采样比例进行调整，得到调整后的采样比例，并作为下一次训练对应的目标采样比例；然后再按照下一次训练对应的目标采样比例从获取的两种类型的训练样本中读取下一次训练用的目标训练样本，并基于读取的目标训练样本，进行下一次训练目标神经网络的步骤，以此类推直至达到最大采样比例（如0.9）。Here, the minimum sampling ratio set in the training configuration file for the two types of training samples (e.g. 0.4) can be used as the target sampling ratio for the first training round: the target training samples for the first round are read from the two acquired types of training samples, and the step of training the target neural network is performed for the first time based on the read target training samples. Once the first round is completed, the minimum sampling ratio is adjusted by the preset adjustment step size, and the adjusted sampling ratio serves as the target sampling ratio for the next round; the target training samples for the next round are then read from the two acquired types of training samples according to that ratio, and the step of training the target neural network is performed again based on them, and so on until the maximum sampling ratio (e.g. 0.9) is reached.
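上述从最小采样比例出发、逐轮按步长调整并训练的流程，可用如下示意代码概括（其中占位样本数据与train_one_round训练桩函数均为假设，仅作说明）。The flow above — starting from the minimum sampling ratio and adjusting by the step size after each round — can be summarized by the following illustrative sketch (the placeholder sample data and the train_one_round stub are assumptions for illustration only).

```python
import random

type_a = ["a%d" % i for i in range(10000)]  # e.g. printed-text images
type_b = ["b%d" % i for i in range(500)]    # e.g. handwritten-text images

def read_batch(ratio: float, batch_size: int) -> list:
    # `ratio` is the share of type-a samples in this round's batch
    n_a = int(round(batch_size * ratio))
    n_b = min(batch_size - n_a, len(type_b))
    return random.sample(type_a, n_a) + random.sample(type_b, n_b)

def train_one_round(samples: list) -> int:
    # stub: a real implementation would run forward/backward passes here
    return len(samples)

ratio, step, max_ratio = 0.6, 0.1, 0.9
while ratio <= max_ratio + 1e-9:
    train_one_round(read_batch(round(ratio, 10), 1000))
    ratio += step  # preset adjustment step gives the next round's target ratio
```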
这里，基于读取的不同类型的目标训练样本，可以训练得到目标神经网络。在待识别图像是文字图像的情况下，目标训练样本可以是目标图像样本，这样，针对每种类型的目标图像样本，将该目标图像样本作为待训练的神经网络的输入，将针对该目标图像样本的预先标注文字作为待训练的目标神经网络的输出，训练用于对不同类型的文字图像进行识别的目标神经网络。Here, the target neural network can be trained based on the read target training samples of different types. In the case where the image to be recognized is a text image, the target training samples may be target image samples; thus, for each type of target image sample, the target image sample serves as the input of the neural network to be trained, and the pre-labeled text for that target image sample serves as the output of the target neural network to be trained, so as to train a target neural network for recognizing different types of text images.
可知的是，本公开实施例中的目标神经网络训练的是输入的图像与标注文字之间的对应关系，基于这一对应关系可以确定出目标神经网络的网络参数，进而实现有关不同类型的文字图像的高精度识别。It can be seen that the target neural network in the embodiment of the present disclosure is trained on the correspondence between input images and labeled text; based on this correspondence, the network parameters of the target neural network can be determined, thereby enabling high-precision recognition of different types of text images.
在训练得到目标神经网络的情况下，本公开实施例可以将获取的待识别图像输入到训练得到的目标神经网络中，可以输出图像处理结果，这里的图像处理结果可以是从待识别图像中识别得到的文字内容。Once the target neural network has been trained, the embodiment of the present disclosure can input the acquired image to be recognized into the trained target neural network and output an image processing result, where the image processing result may be the text content recognized from the image to be recognized.
本领域技术人员可以理解，在具体实施方式的上述方法中，各步骤的撰写顺序并不意味着严格的执行顺序而对实施过程构成任何限定，各步骤的具体执行顺序应当以其功能和可能的内在逻辑确定。Those skilled in the art can understand that, in the above methods of the specific implementations, the order in which the steps are written does not imply a strict execution order or constitute any limitation on the implementation process; the specific execution order of the steps should be determined by their functions and possible internal logic.
基于同一发明构思,本公开实施例中还提供了与方法对应的装置,由于本公开实施例中的装置解决问题的原理与本公开实施例上述方法相似,因此装置的实施可以参见方法的实施,重复之处不再赘述。Based on the same inventive concept, the embodiment of the present disclosure also provides a device corresponding to the method. Since the problem-solving principle of the device in the embodiment of the present disclosure is similar to the above-mentioned method of the embodiment of the present disclosure, the implementation of the device can refer to the implementation of the method. Repeated points will not be repeated.
参照图2所示,为本公开实施例提供的一种神经网络训练的装置的示意图,装置包括:获取模块201、读取模块202、训练模块203;其中,Referring to FIG. 2 , it is a schematic diagram of a neural network training device provided by an embodiment of the present disclosure. The device includes: an acquisition module 201, a reading module 202, and a training module 203; wherein,
获取模块201,用于获取至少两种类型的训练样本;An acquisition module 201, configured to acquire at least two types of training samples;
读取模块202，用于基于至少两种类型中不同类型的训练样本之间的目标采样比例，从获取的至少两种类型的训练样本中读取多次训练中每次训练用的目标训练样本；其中，每次读取的不同类型的目标训练样本的数量符合目标采样比例；The reading module 202 is configured to read, based on the target sampling ratio between different types of training samples among the at least two types, the target training samples for each of multiple training rounds from the acquired at least two types of training samples, wherein the number of target training samples of different types read each time conforms to the target sampling ratio;
训练模块203,用于基于读取的目标训练样本,训练目标神经网络,目标神经网络用于对不同类型的待识别图像进行识别。The training module 203 is configured to train the target neural network based on the read target training samples, and the target neural network is used to identify different types of images to be recognized.
采用上述神经网络训练的装置，在获取到不同类型的训练样本的情况下，可以基于不同类型的训练样本之间的目标采样比例，从获取的不同类型的训练样本中读取每次训练用的目标训练样本，进而基于读取的目标训练样本，训练目标神经网络。由于不同类型的训练样本之间的目标采样比例可以很好的控制不同类型的训练样本的选取数量，这一定程度上可以降低直接混合数据量差距比较大的训练样本对特征学习的影响，提升了目标神经网络的识别精度。With the above neural network training apparatus, when different types of training samples are acquired, the target training samples for each training round can be read from the acquired training samples based on the target sampling ratio between the different types, and the target neural network can then be trained on the read target training samples. Since the target sampling ratio between different types of training samples can well control how many training samples of each type are selected, it reduces, to a certain extent, the impact on feature learning of directly mixing training samples whose data volumes differ greatly, and thus improves the recognition accuracy of the target neural network.
在一种可能的实施方式中，读取模块202，用于按照以下步骤基于至少两种类型中不同类型的训练样本之间的目标采样比例，从获取的至少两种类型的训练样本中读取多次训练中每次训练用的目标训练样本：In a possible implementation manner, the reading module 202 is configured to read the target training samples for each of multiple training rounds from the acquired at least two types of training samples, based on the target sampling ratio between different types of training samples among the at least two types, according to the following steps:
基于至少两种类型中不同类型的训练样本之间的目标采样比例,以及每次训练所需的训练样本数量,确定与每种类型的训练样本对应的采样数量;determining the number of samples corresponding to each type of training sample based on a target sampling ratio between different types of training samples of at least two types, and the number of training samples required for each training;
按照确定的采样数量,从获取的每种类型的训练样本中读取训练样本。According to the determined number of samples, the training samples are read from the acquired training samples of each type.
在一种可能的实施方式中,读取模块202,用于按照如下步骤确定每次训练所需的训练样本数量:In a possible implementation manner, the reading module 202 is configured to determine the number of training samples required for each training according to the following steps:
确定至少两种类型的训练样本对应的训练样本总量以及训练总次数;Determine the total amount of training samples and the total number of training times corresponding to at least two types of training samples;
基于训练样本总量以及训练总次数,确定每次训练所需的训练样本数量。Based on the total number of training samples and the total number of training times, determine the number of training samples required for each training.
在一种可能的实施方式中,读取模块202,用于按照如下步骤确定不同类型的训练样本之间的目标采样比例:In a possible implementation manner, the reading module 202 is configured to determine the target sampling ratio between different types of training samples according to the following steps:
在接收到训练任务的情况下,从训练配置文件中读取针对不同类型的训练样本设置的采样比例范围;In the case of receiving the training task, read the sampling ratio range set for different types of training samples from the training configuration file;
在每次训练中,从采样比例范围中选取目标采样比例。At each training session, a target sampling ratio is chosen from a range of sampling ratios.
在一种可能的实施方式中,读取模块202,用于按照以下步骤在每次训练中,从采样比例范围中选取目标采样比例:In a possible implementation manner, the reading module 202 is configured to select the target sampling ratio from the sampling ratio range in each training according to the following steps:
在完成一次训练后,在采样比例范围内,基于预设调整步长对上一次训练使用的目标采样比例进行调整,得到本次训练使用的目标采样比例。After a training session is completed, within the range of the sampling rate, the target sampling rate used in the previous training is adjusted based on the preset adjustment step to obtain the target sampling rate used in the current training.
在一种可能的实施方式中,获取模块201,用于按照以下步骤获取至少两种类型的训练样本:In a possible implementation manner, the obtaining module 201 is configured to obtain at least two types of training samples according to the following steps:
基于训练样本的类型与预先配置的各个存储文件之间的对应关系，从各个存储文件中读取对应类型的训练样本。Based on the corresponding relationship between the types of training samples and the respective pre-configured storage files, training samples of the corresponding type are read from the respective storage files.
在一种可能的实施方式中,待识别图像包括文字图像,目标训练样本包括目标图像样本;训练模块203,用于按照以下步骤基于读取的目标训练样本,训练目标神经网络:In a possible implementation manner, the image to be recognized includes a text image, and the target training sample includes a target image sample; the training module 203 is configured to train the target neural network based on the read target training sample according to the following steps:
针对每种类型的目标图像样本，将该目标图像样本作为待训练的神经网络的输入，将针对该目标图像样本的预先标注文字作为待训练的目标神经网络的输出，训练用于对不同类型的文字图像进行识别的目标神经网络。For each type of target image sample, the target image sample is used as the input of the neural network to be trained, and the pre-labeled text for that target image sample is used as the output of the target neural network to be trained, so as to train a target neural network for recognizing different types of text images.
参照图3所示,为本公开实施例提供的一种图像处理的装置的示意图,装置包括:获取模块301、处理模块302;其中,Referring to FIG. 3 , which is a schematic diagram of an image processing device provided by an embodiment of the present disclosure, the device includes: an acquisition module 301 and a processing module 302; wherein,
获取模块301,用于获取待识别图像;An acquisition module 301, configured to acquire an image to be identified;
处理模块302,用于将待识别图像输入到利用上述神经网络训练的方法训练得到的目标神经网络中,输出图像处理结果。The processing module 302 is configured to input the image to be recognized into the target neural network trained by the above neural network training method, and output the image processing result.
关于装置中的各模块的处理流程、以及各模块之间的交互流程的描述可以参照上述方法实施例中的相关说明,这里不再详述。For the description of the processing flow of each module in the device and the interaction flow between the modules, reference may be made to the relevant description in the above method embodiment, and details will not be described here.
本公开实施例还提供了一种电子设备,如图4所示,为本公开实施例提供的电子设备结构示意图,包括:处理器401、存储器402、和总线403。存储器402存储有处理器401可执行的机器可读指令(比如,图2中的装置中获取模块201、读取模块202、训练模块203对应的执行指令等),当电子设备运行时,处理器401与存储器402之间通过总线403通信,机器可读指令被处理器401执行时执行如下处理:An embodiment of the present disclosure also provides an electronic device, as shown in FIG. 4 , which is a schematic structural diagram of the electronic device provided by the embodiment of the present disclosure, including: a processor 401 , a memory 402 , and a bus 403 . The memory 402 stores machine-readable instructions executable by the processor 401 (for example, execution instructions corresponding to the acquisition module 201, the reading module 202, and the training module 203 in the device in FIG. 2 ), and when the electronic device is running, the processor 401 communicates with the memory 402 through the bus 403, and when the machine-readable instructions are executed by the processor 401, the following processes are performed:
获取至少两种类型的训练样本;Obtain at least two types of training samples;
基于至少两种类型中不同类型的训练样本之间的目标采样比例,从获取的至少两种类型的训练样本中读取多次训练中每次训练用的目标训练样本;其中,每次读取的不同类型的目标训练样本的数量符合目标采样比例;Based on the target sampling ratio between different types of training samples in at least two types, read the target training samples for each training in multiple trainings from the obtained at least two types of training samples; wherein, each read The number of different types of target training samples conforms to the target sampling ratio;
基于读取的目标训练样本,训练目标神经网络,目标神经网络用于对不同类型的待识别图像进行识别。Based on the read target training samples, the target neural network is trained, and the target neural network is used to identify different types of images to be recognized.
本公开实施例还提供了另一种电子设备，如图5所示，为本公开实施例提供的电子设备结构示意图，包括：处理器501、存储器502、和总线503。存储器502存储有处理器501可执行的机器可读指令（比如，图3中的装置中获取模块301、处理模块302对应的执行指令等），当电子设备运行时，处理器501与存储器502之间通过总线503通信，机器可读指令被处理器501执行时执行如下处理：The embodiment of the present disclosure also provides another electronic device. As shown in FIG. 5, which is a schematic structural diagram of the electronic device provided by the embodiment of the present disclosure, it includes: a processor 501, a memory 502, and a bus 503. The memory 502 stores machine-readable instructions executable by the processor 501 (for example, execution instructions corresponding to the acquisition module 301 and the processing module 302 in the device in FIG. 3). When the electronic device is running, the processor 501 communicates with the memory 502 through the bus 503, and when the machine-readable instructions are executed by the processor 501, the following processing is performed:
获取待识别图像;Obtain the image to be recognized;
将待识别图像输入到利用上述神经网络训练的方法训练得到的目标神经网络中,输出图像处理结果。Input the image to be recognized into the target neural network trained by the above neural network training method, and output the image processing result.
本公开实施例还提供一种计算机可读存储介质,该计算机可读存储介质上存储有计算机程序,该计算机程序被处理器运行时执行上述方法实施例中所述的方法的步骤。其中,该存储介质可以是易失性或非易失的计算机可读取存储介质。Embodiments of the present disclosure further provide a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the steps of the methods described in the foregoing method embodiments are executed. Wherein, the storage medium may be a volatile or non-volatile computer-readable storage medium.
本公开实施例还提供一种计算机程序产品，该计算机程序产品承载有程序代码，所述程序代码包括的指令可用于执行上述方法实施例中所述的方法的步骤，具体可参见上述方法实施例，在此不再赘述。Embodiments of the present disclosure also provide a computer program product carrying program code, and the instructions included in the program code can be used to execute the steps of the methods described in the above method embodiments; for details, please refer to the above method embodiments, which are not repeated here.
其中，上述计算机程序产品可以具体通过硬件、软件或其结合的方式实现。在一个可选实施例中，所述计算机程序产品具体体现为计算机存储介质，在另一个可选实施例中，计算机程序产品具体体现为软件产品，例如软件开发包（Software Development Kit，SDK）等等。The above computer program product may be specifically implemented by hardware, software, or a combination thereof. In an optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, the computer program product is embodied as a software product, such as a Software Development Kit (SDK), and so on.
所属领域的技术人员可以清楚地了解到，为描述的方便和简洁，上述描述的系统和装置的具体工作过程，可以参考前述方法实施例中的对应过程，在此不再赘述。在本公开所提供的几个实施例中，应该理解到，所揭露的系统、装置和方法，可以通过其它的方式实现。以上所描述的装置实施例仅仅是示意性的，例如，所述单元的划分，仅仅为一种逻辑功能划分，实际实现时可以有另外的划分方式，又例如，多个单元或组件可以结合或者可以集成到另一个系统，或一些特征可以忽略，或不执行。另一点，所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些通信接口，装置或单元的间接耦合或通信连接，可以是电性，机械或其它的形式。Those skilled in the art can clearly understand that, for convenience and brevity of description, reference may be made to the corresponding processes in the foregoing method embodiments for the specific working processes of the systems and apparatuses described above, which are not repeated here. In the several embodiments provided in the present disclosure, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. The apparatus embodiments described above are merely illustrative; for example, the division into units is only a division by logical function, and other divisions are possible in actual implementation; for another example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some communication interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
另外,在本公开各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。In addition, each functional unit in each embodiment of the present disclosure may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时，可以存储在一个处理器可执行的非易失的计算机可读取存储介质中。基于这样的理解，本公开的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来，该计算机软件产品存储在一个存储介质中，包括若干指令用以使得一台计算机设备（可以是个人计算机，服务器，或者网络设备等）执行本公开各个实施例所述方法的全部或部分步骤。而前述的存储介质包括：U盘、移动硬盘、只读存储器（Read-Only Memory，ROM）、随机存取存储器（Random Access Memory，RAM）、磁碟或者光盘等各种可以存储程序代码的介质。If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a processor-executable non-volatile computer-readable storage medium. Based on this understanding, the technical solution of the present disclosure in essence, or the part contributing to the prior art, or part of the technical solution, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present disclosure. The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
最后应说明的是：以上所述实施例，仅为本公开的具体实施方式，用以说明本公开的技术方案，而非对其限制，本公开的保护范围并不局限于此，尽管参照前述实施例对本公开进行了详细的说明，本领域的普通技术人员应当理解：任何熟悉本技术领域的技术人员在本公开揭露的技术范围内，其依然可以对前述实施例所记载的技术方案进行修改或可轻易想到变化，或者对其中部分技术特征进行等同替换；而这些修改、变化或者替换，并不使相应技术方案的本质脱离本公开实施例技术方案的精神和范围，都应涵盖在本公开的保护范围之内。因此，本公开的保护范围应以权利要求的保护范围为准。Finally, it should be noted that the above embodiments are merely specific implementations of the present disclosure, intended to illustrate rather than limit its technical solutions, and the protection scope of the present disclosure is not limited thereto. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that any person familiar with this technical field may, within the technical scope disclosed herein, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes, or make equivalent replacements of some of the technical features; such modifications, changes, or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present disclosure, and shall all fall within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (12)

  1. 一种神经网络训练的方法,其特征在于,所述方法包括:A method for neural network training, characterized in that the method comprises:
    获取至少两种类型的训练样本;Obtain at least two types of training samples;
    基于所述至少两种类型中不同类型的训练样本之间的目标采样比例,从获取的所述至少两种类型的训练样本中读取多次训练中每次训练用的目标训练样本;其中,每次读取的所述不同类型的目标训练样本的数量符合所述目标采样比例;Based on the target sampling ratio between different types of training samples in the at least two types, read the target training samples for each training in multiple trainings from the acquired training samples of the at least two types; wherein, The number of target training samples of different types read each time complies with the target sampling ratio;
    基于读取的所述目标训练样本,训练目标神经网络,所述目标神经网络用于对不同类型的待识别图像进行识别。Based on the read target training samples, a target neural network is trained, and the target neural network is used to identify different types of images to be identified.
  2. 根据权利要求1所述的方法，其特征在于，所述基于所述至少两种类型中不同类型的训练样本之间的目标采样比例，从获取的所述至少两种类型的训练样本中读取多次训练中每次训练用的目标训练样本，包括：The method according to claim 1, wherein the reading, based on the target sampling ratio between different types of training samples among the at least two types, of target training samples for each of multiple training rounds from the acquired at least two types of training samples comprises:
    基于所述至少两种类型中不同类型的训练样本之间的目标采样比例,以及每次训练所需的训练样本数量,确定与每种类型的训练样本对应的采样数量;determining the number of samples corresponding to each type of training sample based on the target sampling ratio between different types of training samples of the at least two types and the number of training samples required for each training;
    按照确定的所述采样数量,从获取的每种类型的训练样本中读取训练样本。According to the determined number of samples, the training samples are read from the acquired training samples of each type.
  3. 根据权利要求2所述的方法,其特征在于,按照如下步骤确定每次训练所需的训练样本数量:The method according to claim 2, wherein the number of training samples required for each training is determined according to the following steps:
    确定所述至少两种类型的训练样本对应的训练样本总量以及训练总次数;determining the total amount of training samples and the total number of training times corresponding to the at least two types of training samples;
    基于所述训练样本总量以及训练总次数,确定每次训练所需的训练样本数量。Based on the total amount of training samples and the total number of training times, the number of training samples required for each training is determined.
  4. 根据权利要求1所述的方法,其特征在于,按照如下步骤确定不同类型的训练样本之间的目标采样比例:The method according to claim 1, wherein the target sampling ratio between different types of training samples is determined according to the following steps:
    在接收到训练任务的情况下,从训练配置文件中读取针对不同类型的训练样本设置的采样比例范围;In the case of receiving the training task, read the sampling ratio range set for different types of training samples from the training configuration file;
    在每次训练中,从所述采样比例范围中选取所述目标采样比例。In each training, the target sampling ratio is selected from the sampling ratio range.
  5. 根据权利要求4所述的方法,其特征在于,所述在每次训练中,从所述采样比例范围中选取所述目标采样比例,包括:The method according to claim 4, wherein, in each training, selecting the target sampling ratio from the sampling ratio range includes:
    在完成一次训练后,在所述采样比例范围内,基于预设调整步长对上一次训练使用的目标采样比例进行调整,得到本次训练使用的目标采样比例。After a training session is completed, within the range of the sampling ratio, the target sampling ratio used in the previous training is adjusted based on the preset adjustment step size to obtain the target sampling ratio used in the current training.
  6. 根据权利要求1至5任一所述的方法,其特征在于,所述获取至少两种类型的训练样本,包括:The method according to any one of claims 1 to 5, wherein said obtaining at least two types of training samples comprises:
    基于训练样本的类型与预先配置的各个存储文件之间的对应关系,从各个存储文件中读取对应类型的训练样本。Based on the corresponding relationship between the type of the training sample and each pre-configured storage file, the corresponding type of training sample is read from each storage file.
  7. 根据权利要求1至6任一所述的方法,其特征在于,所述待识别图像包括文字图像;所述基于读取的所述目标训练样本,训练目标神经网络,包括:The method according to any one of claims 1 to 6, wherein the image to be recognized includes a text image; and the training of the target neural network based on the read target training sample includes:
    针对每种类型的文字图像，将该文字图像作为待训练的神经网络的输入，将针对该文字图像的预先标注文字作为待训练的目标神经网络的输出，训练用于对不同类型的文字图像进行识别的目标神经网络。For each type of text image, the text image is used as the input of the neural network to be trained, and the pre-labeled text for the text image is used as the output of the target neural network to be trained, so as to train a target neural network for recognizing different types of text images.
  8. 一种图像处理的方法,其特征在于,所述方法包括:A method for image processing, characterized in that the method comprises:
    获取待识别图像;Obtain the image to be recognized;
    将所述待识别图像输入到利用权利要求1至7任一所述的方法训练得到的目标神经网络中,输出图像处理结果。The image to be recognized is input into the target neural network trained by the method described in any one of claims 1 to 7, and the image processing result is output.
  9. 一种神经网络训练的装置,其特征在于,所述装置包括:A device for neural network training, characterized in that the device comprises:
    获取模块,用于获取至少两种类型的训练样本;An acquisition module, configured to acquire at least two types of training samples;
    读取模块，用于基于所述至少两种类型中不同类型的训练样本之间的目标采样比例，从获取的所述至少两种类型的训练样本中读取多次训练中每次训练用的目标训练样本；其中，每次读取的所述不同类型的目标训练样本的数量符合所述目标采样比例；a reading module, configured to read, based on the target sampling ratio between different types of training samples among the at least two types, target training samples for each of multiple training rounds from the acquired at least two types of training samples, wherein the number of the target training samples of the different types read each time conforms to the target sampling ratio;
    训练模块,用于基于读取的所述目标训练样本,训练目标神经网络,所述目标神经网络用于对不同类型的待识别图像进行识别。The training module is configured to train a target neural network based on the read target training samples, and the target neural network is used to identify different types of images to be identified.
  10. 一种图像处理的装置,其特征在于,所述装置包括:An image processing device, characterized in that the device comprises:
    获取模块,用于获取待识别图像;An acquisition module, configured to acquire an image to be identified;
    处理模块，用于将所述待识别图像输入到利用权利要求1至7任一所述的方法训练得到的目标神经网络中，输出图像处理结果。The processing module is configured to input the image to be recognized into the target neural network trained by the method according to any one of claims 1 to 7, and output an image processing result.
  11. 一种电子设备，其特征在于，包括：处理器、存储器和总线，所述存储器存储有所述处理器可执行的机器可读指令，当电子设备运行时，所述处理器与所述存储器之间通过总线通信，所述机器可读指令被所述处理器执行时执行如权利要求1至7任一所述的神经网络训练的方法的步骤或者如权利要求8所述的图像处理的方法的步骤。An electronic device, comprising: a processor, a memory, and a bus, wherein the memory stores machine-readable instructions executable by the processor; when the electronic device is running, the processor communicates with the memory through the bus, and when the machine-readable instructions are executed by the processor, the steps of the neural network training method according to any one of claims 1 to 7 or the steps of the image processing method according to claim 8 are executed.
  12. 一种计算机可读存储介质，其特征在于，该计算机可读存储介质上存储有计算机程序，该计算机程序被处理器运行时执行如权利要求1至7任一所述的神经网络训练的方法的步骤或者如权利要求8所述的图像处理的方法的步骤。A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and when the computer program is run by a processor, the steps of the neural network training method according to any one of claims 1 to 7 or the steps of the image processing method according to claim 8 are executed.
PCT/CN2022/114983 2021-09-18 2022-08-26 Neural network training method and apparatus, image processing method and apparatus, and device and storage medium WO2023040629A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111098594.0A CN113792734A (en) 2021-09-18 2021-09-18 Neural network training and image processing method, device, equipment and storage medium
CN202111098594.0 2021-09-18

Publications (1)

Publication Number Publication Date
WO2023040629A1 true WO2023040629A1 (en) 2023-03-23

Family

ID=78878972

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/114983 WO2023040629A1 (en) 2021-09-18 2022-08-26 Neural network training method and apparatus, image processing method and apparatus, and device and storage medium

Country Status (2)

Country Link
CN (1) CN113792734A (en)
WO (1) WO2023040629A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113792734A (en) * 2021-09-18 2021-12-14 深圳市商汤科技有限公司 Neural network training and image processing method, device, equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106530284A (en) * 2016-10-21 2017-03-22 广州视源电子科技股份有限公司 Solder joint type detection method and apparatus based on image identification
CN109472345A (en) * 2018-09-28 2019-03-15 深圳百诺名医汇网络技术有限公司 Weight update method and device, computer equipment and storage medium
US20190096385A1 (en) * 2017-09-28 2019-03-28 Baidu Online Network Technology (Beijing) Co., Ltd Method and apparatus for generating speech synthesis model
CN112419303A (en) * 2020-12-09 2021-02-26 上海联影医疗科技股份有限公司 Neural network training method, system, readable storage medium and device
CN113313110A (en) * 2021-05-25 2021-08-27 北京易华录信息技术股份有限公司 License plate type recognition model construction and license plate type recognition method
CN113792734A (en) * 2021-09-18 2021-12-14 深圳市商汤科技有限公司 Neural network training and image processing method, device, equipment and storage medium


Also Published As

Publication number Publication date
CN113792734A (en) 2021-12-14

Similar Documents

Publication Publication Date Title
CN109961780B (en) Man-machine interaction method, device, server and storage medium
CN107566914B (en) Bullet screen display control method, electronic equipment and storage medium
CN110490721B (en) Financial voucher generating method and related product
CN107229559B (en) Detection method and device for testing integrity of service system
CN114170468B (en) Text recognition method, storage medium and computer terminal
WO2023040629A1 (en) Neural network training method and apparatus, image processing method and apparatus, and device and storage medium
CN111753744B (en) Method, apparatus, device and readable storage medium for bill image classification
CN113836885A (en) Text matching model training method, text matching device and electronic equipment
CN107844728A (en) Method and device for recognizing a Quick Response code, computer apparatus, and computer-readable storage medium
CN110139149A (en) Video optimization method, apparatus, and electronic device
CN111340640A (en) Insurance claim settlement material auditing method, device and equipment
CN111680761B (en) Information feedback method and device and electronic equipment
CN111383651A (en) Voice noise reduction method and device and terminal equipment
CN111783415A (en) Template configuration method and device
KR102003221B1 (en) System for generating note data and method for generating note data using the system
CN105912510A (en) Method, device and server for judging answers to test questions
CN114078471A (en) Network model processing method, device, equipment and computer readable storage medium
CN111049735B (en) Group head portrait display method, device, equipment and storage medium
CN115221037A (en) Interactive page testing method and device, computer equipment and program product
CN111782792A (en) Method and apparatus for information processing
CN116776839A (en) Teaching-oriented handwritten medical record accurate feedback method, system and storage device
CN116343221A (en) Certificate information automatic input method and device, electronic equipment and storage medium
CN110633457B (en) Content replacement method and device, electronic equipment and readable storage medium
US11232161B1 (en) Methods and apparatuses for electronically stamping document
CN107844549A (en) Information saving method and device, computer apparatus, and computer-readable storage medium

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE