WO2023040629A1 - Neural network training method and apparatus, image processing method and apparatus, and device and storage medium - Google Patents

Neural network training method and apparatus, image processing method and apparatus, and device and storage medium

Info

Publication number
WO2023040629A1
Authority
WO
WIPO (PCT)
Prior art keywords
training
target
training samples
neural network
sampling ratio
Prior art date
Application number
PCT/CN2022/114983
Other languages
French (fr)
Chinese (zh)
Inventor
张正夫
梁鼎
吴一超
Original Assignee
上海商汤智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海商汤智能科技有限公司
Publication of WO2023040629A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/24: Classification techniques
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods

Definitions

  • The present disclosure relates to the technical field of character recognition, and in particular, to a method, apparatus, device and storage medium for neural network training and image processing.
  • Image text recognition is the process of converting text images into a series of symbols that can be represented and processed by a computer.
  • However, once trained, the current models for text image recognition are often difficult to apply across multiple usage scenarios, leaving the model with a single function that is difficult to adapt to diverse scenario requirements.
  • Embodiments of the present disclosure at least provide a method, apparatus, device and storage medium for neural network training and image processing.
  • an embodiment of the present disclosure provides a method for training a neural network, the method comprising:
  • a target neural network is trained, and the target neural network is used to identify different types of images to be identified.
  • the training data used for each training can be read from the different types of training samples obtained.
  • The target training samples are used to train the target neural network. Since the target sampling ratio between different types of training samples can well control the number of training samples of each type that are selected, this reduces, to a certain extent, the impact on feature learning of directly mixing training samples whose data volumes differ greatly, and improves the recognition accuracy of the target neural network.
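The batch-composition idea described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the source does not fix how the per-type draw is performed, so random sampling without replacement is assumed, and all names are illustrative.

```python
import random

def read_target_batch(samples_by_type, ratio, batch_size, rng=random):
    """Draw one training batch whose per-type counts follow `ratio`.

    samples_by_type: {type: list of samples}; ratio: {type: integer weight}.
    """
    total_weight = sum(ratio.values())
    batch = []
    for sample_type, pool in samples_by_type.items():
        # Number of samples of this type, per the target sampling ratio.
        n = batch_size * ratio[sample_type] // total_weight
        batch.extend(rng.sample(pool, n))
    return batch

samples = {
    "printed": [f"p{i}" for i in range(10)],
    "handwritten": [f"h{i}" for i in range(10)],
}
# A 3:1 target sampling ratio over a batch of 8 yields 6 printed
# and 2 handwritten samples.
batch = read_target_batch(samples, {"printed": 3, "handwritten": 1}, 8)
```

Because the per-type counts are fixed by the ratio rather than by the relative pool sizes, a minority type (here "handwritten") is never drowned out by a majority type.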
  • Reading the target training samples for each training includes:
  • the training samples are read from the acquired training samples of each type.
  • the number of samples corresponding to each type of training sample can be determined.
  • The smaller the target sampling ratio, the smaller the corresponding number of samples, so that the read training samples can meet the needs of each training.
  • the number of training samples required for each training is determined according to the following steps:
  • the number of training samples required for each training is determined.
  • the target sampling ratio between different types of training samples is determined according to the following steps:
  • the target sampling ratio is selected from the sampling ratio range.
  • The target sampling ratio in each training process can be realized based on the sampling ratio range configured in the training configuration file; that is, a single configuration covers the sample selection operation for the entire training process, improving efficiency while ensuring recognition accuracy.
  • selecting the target sampling ratio from the sampling ratio range includes:
  • the target sampling ratio used in the previous training is adjusted based on the preset adjustment step size to obtain the target sampling ratio used in the current training.
  • the acquiring at least two types of training samples includes:
  • the corresponding type of training sample is read from each storage file.
  • the acquisition of training samples can be realized based on the correspondence between the type of training samples and the storage file.
  • the acquisition of samples can be realized quickly, further improving the efficiency of network training.
  • the image to be recognized includes a text image
  • the target training sample includes a target image sample
  • the training of a target neural network based on the read target training sample includes:
  • The target image sample is used as the input of the neural network to be trained, the pre-labeled text of the target image sample is used as the expected output, and a target neural network for recognizing different types of text images is trained.
  • an embodiment of the present disclosure also provides an image processing method, the method comprising:
  • the image to be recognized is input into the target neural network trained by the method described in any one of the first aspect and its various modes, and an image processing result is output.
  • the embodiment of the present disclosure also provides a neural network training device, the device comprising:
  • An acquisition module configured to acquire at least two types of training samples
  • a reading module configured to read, based on the target sampling ratio between different types of training samples in the at least two types, the target training samples used for each of multiple trainings from the acquired training samples, wherein the number of target training samples of each type read each time conforms to the target sampling ratio;
  • the training module is configured to train a target neural network based on the read target training samples, and the target neural network is used to identify different types of images to be identified.
  • an embodiment of the present disclosure further provides an image processing device, the device comprising:
  • An acquisition module configured to acquire an image to be identified
  • the processing module is configured to input the image to be recognized into the target neural network trained by the method described in any one of the first aspect and its various modes, and output an image processing result.
  • An embodiment of the present disclosure further provides an electronic device, including a processor, a memory and a bus. The memory stores machine-readable instructions executable by the processor. When the electronic device is running, the processor communicates with the memory through the bus, and when the machine-readable instructions are executed by the processor, the steps of the neural network training method described in the first aspect and any of its implementations, or the steps of the image processing method described in the second aspect, are executed.
  • Embodiments of the present disclosure also provide a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the neural network training method described in the first aspect and any of its implementations, or the steps of the image processing method described in the second aspect, are executed.
  • FIG. 1 shows a flowchart of a method for neural network training provided by an embodiment of the present disclosure
  • FIG. 2 shows a schematic diagram of a neural network training device provided by an embodiment of the present disclosure
  • FIG. 3 shows a schematic diagram of an image processing device provided by an embodiment of the present disclosure
  • FIG. 4 shows a schematic diagram of an electronic device provided by an embodiment of the present disclosure
  • Fig. 5 shows a schematic diagram of another electronic device provided by an embodiment of the present disclosure.
  • text images can be divided into different types according to different visual features, for example, images with printed text, images with handwritten text, and images with natural scene text.
  • A text image (such as an answer sheet image) may contain both printed text and handwritten text. This requires the trained text recognition model to have high recognition accuracy for all of these types.
  • the present disclosure provides a neural network training and image processing method, device, device, and storage medium to improve recognition accuracy.
  • The execution subject of the neural network training method provided in the embodiments of the present disclosure is generally a computer device with certain computing capability.
  • The computer device includes, for example, a terminal device, a server, or other processing device.
  • The terminal device can be user equipment (User Equipment, UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (Personal Digital Assistant, PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, etc.
  • The neural network training method can be realized by a processor calling computer-readable instructions stored in a memory.
  • FIG. 1 it is a flowchart of a neural network training method provided by an embodiment of the present disclosure.
  • the method includes steps S101 to S103:
  • S101 Acquire at least two types of training samples.
  • S102 Based on the target sampling ratio between different types of training samples in the at least two types, read the target training samples used for each of multiple trainings from the obtained at least two types of training samples, wherein the number of target training samples of each type read each time conforms to the target sampling ratio.
  • S103 Based on the read target training samples, train the target neural network, where the target neural network is used to identify different types of images to be recognized.
  • the above-mentioned neural network training method can be mainly applied in application fields that need to complete mixed training among various types of training samples.
  • the corresponding application fields are different, and the training samples here are also different.
  • it can be applied to the field of text recognition, and the training samples correspond to text images, which can include printed text images, handwritten text images, natural scene text images, and the like.
  • The text image type (that is, the type of training sample referred to in S101) refers to the type of an image containing text. The text can be printed text, handwritten text, natural scene text, etc., and different kinds of text correspond to different types of text images; for example, printed text, handwritten text and natural scene text correspond to printed text images, handwritten text images and natural scene text images respectively.
  • the type of text image may refer to a method of generating text image or a channel for obtaining text image, etc., and the specific type and quantity of text image types are not limited here.
  • The recognition accuracy for text images is affected not only by the amount of data but also by the difficulty of image processing; for example, compared with printed text images, the presence of personalized handwriting makes handwritten text images more difficult to process. Recognition accuracy is also affected by various other factors, which will not be enumerated here.
  • the embodiment of the present disclosure provides a scheme of reading the target training samples for each training based on the target sampling ratio among different types of training samples, and then training the target neural network based on the read target training samples.
  • different types of training samples in the embodiments of the present disclosure correspond to at least two types of training samples, which can be read from different storage files in specific applications.
  • Different types of training samples can be stored in different storage files; for example, different file search paths are set for different storage files, training samples of the same type are recorded under the same file search path, and training samples of different types are recorded under different file search paths. In this way, when different types of training samples are required, the corresponding type of training samples can be read from the different storage files by a data reader.
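The type-to-storage-file dispatch just described can be sketched as below. The source does not specify file paths or formats, so each "storage file" is abstracted here as a callable reader; all names are illustrative.

```python
def read_by_type(readers, sample_type):
    """Look up the registered reader for `sample_type` and return its samples."""
    if sample_type not in readers:
        raise KeyError(f"no storage file registered for type {sample_type!r}")
    return list(readers[sample_type]())

# Hypothetical illustration: in-memory stand-ins for per-type storage files.
readers = {
    "printed": lambda: ["printed_0", "printed_1"],
    "handwritten": lambda: ["handwritten_0"],
}
printed_samples = read_by_type(readers, "printed")
```

In a real data pipeline each callable would be a file reader bound to that type's search path, which is what lets sample acquisition stay fast and type-specific.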
  • the relevant target sampling ratio may be preset based on different application scenarios, or may be preset for each training in a specific application scenario. In addition, it can also be automatically generated under the constraints of the sampling ratio range and the preset adjustment step size. Embodiments of the present disclosure do not specifically limit this.
  • the target sampling ratio can be used as a fixed parameter in the training task. For example, 1:1, 1:2, etc., which are pre-set sampling ratios based on application scenario requirements, can be used in each training process.
  • The target sampling ratio can also be used as a semi-automatic parameter in the training task. For example, based on factors such as the data volume that affect the number of trainings, the sampling ratio is preset to be adjusted every time a preset number of trainings is reached, e.g., a new sampling ratio is adopted every 10 trainings.
  • the target sampling ratio can also be used as a fully automatic parameter in the training task, for example, a new sampling ratio is adjusted every time training is performed.
  • the target training samples for each training may be read from the acquired training samples of different types based on the target sampling ratio.
  • the number of target training samples read here is in line with the target sampling ratio.
  • Assuming the target sampling ratio between the three types of text images is 1:1:1, the number of target training samples read for each of the three types can be 300; assuming the target sampling ratio between the three types of text images is 3:1:2, the numbers of the three types of target training samples read can be 300, 100 and 200 respectively.
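The per-type counts implied by a target sampling ratio can be computed as follows. The remainder handling (assigning leftover samples to the earlier types) is an assumption; the source only requires the counts to follow the ratio.

```python
def counts_from_ratio(ratio, total):
    """Split `total` samples across types according to an integer `ratio`."""
    weight_sum = sum(ratio)
    counts = [total * r // weight_sum for r in ratio]
    for i in range(total - sum(counts)):  # distribute any remainder
        counts[i] += 1
    return counts

# The example above: ratio 3:1:2 over 600 samples gives 300, 100 and 200.
print(counts_from_ratio([3, 1, 2], 600))  # [300, 100, 200]
```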
  • the target neural network for recognizing different types of images to be recognized can be trained.
  • What the target neural network learns here can be the correspondence between a text image and the text on the image.
  • The training samples can be pre-labeled, and the network parameter values of the target neural network are obtained by training on the above correspondence.
  • the image to be recognized can be input into the trained target neural network.
  • For example, when the image to be recognized includes both printed text and handwritten text, the printed text and the handwritten text can both be recognized with high precision at the same time.
  • Step 1 Determine the number of samples corresponding to each type of training sample based on the target sampling ratio between different types of training samples and the number of training samples required for each training;
  • Step 2 Read the training samples from the obtained training samples of each type according to the determined number of samples.
  • the number of training samples required for different training sessions may be the same or different.
  • the number of required training samples can increase proportionally.
  • each type of training sample corresponds to its own number of samples.
  • Take the text recognition of three types of text images as an example: 10,000 printed text images, 500 handwritten text images, and 3,000 natural scene text images. Assuming that the number of training samples required for the current training is 1,000 and the target sampling ratio between the three types of text images is 1:1:1, 333 target training samples of each of the three types can be selected.
  • The number of training samples required for each training can usually be the same, mainly to minimize the impact of an unbalanced number of training samples on performance evaluation between different rounds of training.
  • the total number of training samples and the total number of training times corresponding to at least two types of training samples can be determined first, and then based on the total number of training samples and the total number of training times, the number of training samples required for each training can be determined.
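The per-training sample count derived from the total number of samples and the total number of trainings can be sketched as below; the use of integer division, and the example figures, are assumptions for illustration.

```python
def samples_per_training(total_samples, total_trainings):
    """Equal-sized trainings, to keep per-round performance comparisons balanced."""
    return total_samples // total_trainings

# Hypothetical figures: 13,500 total samples over 27 trainings.
print(samples_per_training(13500, 27))  # 500
```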
  • the sampling ratio range can be used here to realize the search of the target sampling ratio, which can be achieved through the following steps:
  • Step 1 In the case of receiving the training task, read the sampling ratio range set for different types of training samples from the training configuration file;
  • Step 2 In each training, select the target sampling ratio from the sampling ratio range.
  • the training configuration file is automatically invoked as the training task proceeds.
  • the target sampling ratio selected for each training can be determined by reading the sampling ratio range set in the training configuration file.
  • The above-mentioned sampling ratio range can be set so as to determine the target sampling ratio for each training; alternatively, a minimum sampling ratio and a maximum sampling ratio can be set, and the target sampling ratio for each training can be randomly selected between them. In addition, the selection can also be combined with a preset adjustment step.
  • The corresponding sampling ratio range can be determined based on the minimum sampling ratio and the maximum sampling ratio, and the relevant sampling ratio range and preset adjustment step can be chosen in combination with different training requirements. For example, to achieve a high recognition rate for a specific type of sample, the sampling ratio range can be biased towards that type; to achieve a higher recognition rate over the samples as a whole, the sampling ratio range can be balanced across the various types of samples.
  • the range of sampling ratios that can be set is (0.6, 0.9).
  • the sampling ratio range that can be set is (0.4, 0.6).
  • The adjustment step can be set to 0.1; in the case of a sampling ratio range of (0.6, 0.9), the minimum sampling ratio of 0.6 is incremented by the step until the maximum sampling ratio is traversed. As another example, the adjustment step can be set to 0.01; the traversal process is similar to that described above and will not be repeated here.
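The traversal of a sampling ratio range by a preset adjustment step can be sketched as below. Inclusive endpoints are assumed, and the rounding guards against floating-point drift when stepping by 0.1.

```python
def ratio_schedule(lo, hi, step):
    """Enumerate target sampling ratios from `lo` to `hi` in increments of `step`."""
    n = int(round((hi - lo) / step))
    # Recompute each ratio from the base to avoid accumulating float error.
    return [round(lo + i * step, 10) for i in range(n + 1)]

# The (0.6, 0.9) range with step 0.1 described above.
print(ratio_schedule(0.6, 0.9, 0.1))  # [0.6, 0.7, 0.8, 0.9]
```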
  • the target sampling ratio used in the previous training can be adjusted based on the preset adjustment step within the range of the sampling ratio to obtain the target sampling ratio used in this training.
  • two types of training samples may be used as examples for illustration.
  • The minimum sampling ratio set in the training configuration file for the two types of training samples (such as 0.4) can be used as the target sampling ratio for the first training, and the target training samples for the first training can be read from the two acquired types of training samples.
  • Based on the read target training samples, the target neural network is trained for the first time. After the first training is completed, the minimum sampling ratio is adjusted by the preset adjustment step to obtain the adjusted sampling ratio, which serves as the target sampling ratio for the next training. The target training samples for the next training are then read from the two acquired types of training samples according to that ratio and used for the next round of training the target neural network, and so on until the maximum sampling ratio (such as 0.9) is reached.
  • the target neural network can be trained.
  • the image to be recognized is a text image
  • The target training sample can be a target image sample. For each type of target image sample, the target image sample is used as the input of the neural network to be trained, the pre-labeled text of the target image sample is used as the expected output, and the target neural network for recognizing different types of text images is trained.
  • The target neural network in the embodiment of the present disclosure is trained on the correspondence between the input image and the labeled text; based on this correspondence, the network parameters of the target neural network can be determined, thereby realizing high-precision recognition of different types of text images.
  • The embodiment of the present disclosure can input the acquired image to be recognized into the trained target neural network and output an image processing result, where the image processing result can be the text content recognized from the image to be recognized.
  • The writing order of the steps does not imply a strict execution order and does not constitute any limitation on the implementation process; the specific execution order of each step should be determined by its function and possible internal logic.
  • The embodiment of the present disclosure also provides a device corresponding to the method. Since the problem-solving principle of the device is similar to that of the above-mentioned method, the implementation of the device can refer to the implementation of the method, and repeated points will not be described again.
  • FIG. 2 it is a schematic diagram of a neural network training device provided by an embodiment of the present disclosure.
  • the device includes: an acquisition module 201, a reading module 202, and a training module 203; wherein,
  • An acquisition module 201 configured to acquire at least two types of training samples
  • the reading module 202 is configured to read, based on the target sampling ratio between different types of training samples in the at least two types, the target training samples used for each of multiple trainings from the obtained at least two types of training samples, wherein the number of target training samples of each type read each time conforms to the target sampling ratio;
  • the training module 203 is configured to train the target neural network based on the read target training samples, and the target neural network is used to identify different types of images to be recognized.
  • the training data used for each training can be read from the obtained different types of training samples.
  • The target training samples are used to train the target neural network. Since the target sampling ratio between different types of training samples can well control the number of training samples of each type that are selected, this reduces, to a certain extent, the impact on feature learning of directly mixing training samples whose data volumes differ greatly, and improves the recognition accuracy of the target neural network.
  • the reading module 202 is configured to read, according to the following steps and based on the target sampling ratio between different types of training samples in the at least two types, the target training samples for each of the multiple trainings from the obtained at least two types of training samples:
  • the training samples are read from the acquired training samples of each type.
  • the reading module 202 is configured to determine the number of training samples required for each training according to the following steps:
  • the reading module 202 is configured to determine the target sampling ratio between different types of training samples according to the following steps:
  • a target sampling ratio is chosen from a range of sampling ratios.
  • the reading module 202 is configured to select the target sampling ratio from the sampling ratio range in each training according to the following steps:
  • the target sampling ratio used in the previous training is adjusted based on the preset adjustment step to obtain the target sampling ratio used in the current training.
  • the obtaining module 201 is configured to obtain at least two types of training samples according to the following steps:
  • the image to be recognized includes a text image
  • the target training sample includes a target image sample
  • the training module 203 is configured to train the target neural network based on the read target training sample according to the following steps:
  • The target image sample is used as the input of the neural network to be trained, the pre-labeled text of the target image sample is used as the expected output, and a target neural network for recognizing different types of text images is trained.
  • FIG. 3 is a schematic diagram of an image processing device provided by an embodiment of the present disclosure
  • the device includes: an acquisition module 301 and a processing module 302; wherein,
  • An acquisition module 301 configured to acquire an image to be identified
  • the processing module 302 is configured to input the image to be recognized into the target neural network trained by the above neural network training method, and output the image processing result.
  • FIG. 4 is a schematic structural diagram of the electronic device provided by the embodiment of the present disclosure, including: a processor 401 , a memory 402 , and a bus 403 .
  • the memory 402 stores machine-readable instructions executable by the processor 401 (for example, execution instructions corresponding to the acquisition module 201, the reading module 202, and the training module 203 in the device in FIG. 2 ), and when the electronic device is running, the processor 401 communicates with the memory 402 through the bus 403, and when the machine-readable instructions are executed by the processor 401, the following processes are performed:
  • the target neural network is trained, and the target neural network is used to identify different types of images to be recognized.
  • the embodiment of the present disclosure also provides another electronic device, as shown in FIG. 5 , which is a schematic structural diagram of the electronic device provided by the embodiment of the present disclosure, including: a processor 501 , a memory 502 , and a bus 503 .
  • The memory 502 stores machine-readable instructions executable by the processor 501 (for example, execution instructions corresponding to the acquisition module 301 and the processing module 302 in the device in FIG. 3). When the electronic device is running, the processor 501 communicates with the memory 502 through the bus 503, and when the machine-readable instructions are executed by the processor 501, the following processing is performed:
  • Embodiments of the present disclosure further provide a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the steps of the methods described in the foregoing method embodiments are executed.
  • the storage medium may be a volatile or non-volatile computer-readable storage medium.
  • Embodiments of the present disclosure also provide a computer program product. The computer program product carries program code, and the instructions included in the program code can be used to execute the steps of the methods described in the above method embodiments; for details, please refer to the above method embodiments, which will not be repeated here.
  • the above-mentioned computer program product may be specifically implemented by means of hardware, software or a combination thereof.
  • In one optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK), etc.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present disclosure may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • If the functions are realized in the form of software functional units and sold or used as independent products, they can be stored in a non-volatile computer-readable storage medium executable by a processor.
  • The technical solution of the present disclosure, in essence, or the part contributing to the prior art, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present disclosure.
  • The aforementioned storage media include: USB flash drive, removable hard disk, read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), magnetic disk, optical disc, and other media that can store program code.


Abstract

A neural network training method and apparatus, an image processing method and apparatus, a device, and a storage medium. The neural network training method comprises: acquiring at least two types of training samples (S101); on the basis of a target sampling ratio between the different types of training samples among the at least two types, reading, from the acquired training samples, target training samples for each of multiple training iterations, wherein the number of target training samples of each type read each time conforms to the target sampling ratio (S102); and training a target neural network on the basis of the read target training samples, wherein the target neural network is used to recognize different types of images to be recognized (S103).

Description

Method, apparatus, device, and storage medium for neural network training and image processing
Cross-Reference to Related Applications
This application claims priority to Chinese patent application No. 202111098594.0, filed on September 18, 2021, the entire disclosure of which is incorporated herein by reference.
Technical Field
The present disclosure relates to the technical field of character recognition, and in particular to a method, apparatus, device, and storage medium for neural network training and image processing.
Background
With the rapid development of image technology and the gradual expansion of market demand, image character recognition technology has received extensive attention. Image character recognition is the process of converting text images into a series of symbols that can be represented and processed by a computer. However, once trained, current models for text image recognition are often difficult to apply to multiple usage scenarios, resulting in models with a single function that are difficult to adapt to diverse scenario requirements.
Summary
Embodiments of the present disclosure provide at least a method, apparatus, device, and storage medium for neural network training and image processing.
In a first aspect, an embodiment of the present disclosure provides a neural network training method, the method comprising:
acquiring at least two types of training samples;
based on a target sampling ratio between different types of training samples among the at least two types, reading, from the acquired training samples of the at least two types, target training samples for each of multiple training iterations, wherein the number of target training samples of each type read each time conforms to the target sampling ratio;
training a target neural network based on the read target training samples, the target neural network being used to recognize different types of images to be recognized.
With the above neural network training method, once different types of training samples have been acquired, the target training samples for each training iteration can be read from the acquired samples according to the target sampling ratio between the different types, and the target neural network is then trained on the read samples. Because the target sampling ratio effectively controls how many samples of each type are selected, it reduces, to a certain extent, the impact on feature learning of directly mixing training samples whose data volumes differ greatly, thereby improving the recognition accuracy of the target neural network.
In a possible implementation, reading, from the acquired training samples of the at least two types and based on the target sampling ratio between different types of training samples among the at least two types, the target training samples for each of multiple training iterations comprises:
determining a sampling quantity corresponding to each type of training sample based on the target sampling ratio between the different types of training samples and the number of training samples required for each training iteration;
reading training samples from the acquired training samples of each type according to the determined sampling quantity.
Here, the sampling quantity corresponding to each type of training sample can be determined from the target sampling ratio and the number of training samples required per training iteration: the larger a type's share of the target sampling ratio, the larger its sampling quantity; the smaller the share, the smaller the quantity. In this way, the samples read can meet the needs of each training iteration.
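The proportional allocation described above can be sketched as follows. This is purely illustrative and not part of the claimed embodiments; in particular, integer division is an assumption, since the disclosure does not specify how fractional quantities are rounded.

```python
def sampling_quantities(ratio, samples_per_iteration):
    """Split the per-iteration sample budget across types according to the
    target sampling ratio: a larger share of the ratio yields a larger quantity.
    Uses floor division, so the quantities may sum to slightly less than the budget."""
    total_parts = sum(ratio)
    return [samples_per_iteration * parts // total_parts for parts in ratio]
```

For example, a budget of 1,000 samples at a 1:1:1 ratio over three types yields 333 samples of each type, matching the worked example given later in the description.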
In a possible implementation, the number of training samples required for each training iteration is determined as follows:
determining the total number of training samples of the at least two types and the total number of training iterations;
determining the number of training samples required for each training iteration based on the total number of training samples and the total number of training iterations.
In a possible implementation, the target sampling ratio between different types of training samples is determined as follows:
upon receiving a training task, reading, from a training configuration file, a sampling ratio range set for the different types of training samples;
in each training iteration, selecting the target sampling ratio from the sampling ratio range.
Here, the target sampling ratio for each training iteration can be realized based on the sampling ratio range configured in the training configuration file; that is, a single configuration can complete the sample selection for the entire training process, improving recognition efficiency while ensuring recognition accuracy.
In a possible implementation, selecting the target sampling ratio from the sampling ratio range in each training iteration comprises:
after a training iteration is completed, adjusting, within the sampling ratio range, the target sampling ratio used in the previous iteration based on a preset adjustment step size, to obtain the target sampling ratio used in the current iteration.
In a possible implementation, acquiring the at least two types of training samples comprises:
reading training samples of each type from respective storage files based on a pre-configured correspondence between training sample types and storage files.
Here, training samples can be acquired based on the correspondence between sample types and storage files; when each type corresponds to one storage file, samples can be acquired quickly, further improving the efficiency of network training.
In a possible implementation, the images to be recognized include text images, and the target training samples include target image samples; training the target neural network based on the read target training samples comprises:
for each type of target image sample, taking the target image sample as the input of the neural network to be trained and the pre-annotated text for that target image sample as the output of the target neural network to be trained, and training a target neural network for recognizing different types of text images.
In a second aspect, an embodiment of the present disclosure further provides an image processing method, the method comprising:
acquiring an image to be recognized;
inputting the image to be recognized into a target neural network trained by the method described in the first aspect or any of its implementations, and outputting an image processing result.
In a third aspect, an embodiment of the present disclosure further provides a neural network training apparatus, the apparatus comprising:
an acquisition module, configured to acquire at least two types of training samples;
a reading module, configured to read, from the acquired training samples of the at least two types and based on a target sampling ratio between different types of training samples among the at least two types, target training samples for each of multiple training iterations, wherein the number of target training samples of each type read each time conforms to the target sampling ratio;
a training module, configured to train a target neural network based on the read target training samples, the target neural network being used to recognize different types of images to be recognized.
In a fourth aspect, an embodiment of the present disclosure further provides an image processing apparatus, the apparatus comprising:
an acquisition module, configured to acquire an image to be recognized;
a processing module, configured to input the image to be recognized into a target neural network trained by the method described in the first aspect or any of its implementations, and to output an image processing result.
In a fifth aspect, an embodiment of the present disclosure further provides an electronic device, comprising a processor, a memory, and a bus. The memory stores machine-readable instructions executable by the processor. When the electronic device runs, the processor communicates with the memory through the bus, and when the machine-readable instructions are executed by the processor, the steps of the neural network training method described in the first aspect or any of its implementations, or the steps of the image processing method described in the second aspect, are executed.
In a sixth aspect, an embodiment of the present disclosure further provides a computer-readable storage medium on which a computer program is stored. When the computer program is run by a processor, the steps of the neural network training method described in the first aspect or any of its implementations, or the steps of the image processing method described in the second aspect, are executed.
For descriptions of the effects of the above apparatus, electronic device, and computer-readable storage medium, refer to the description of the above method; details are not repeated here.
To make the above objects, features, and advantages of the present disclosure more comprehensible, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief Description of the Drawings
To explain the technical solutions of the embodiments of the present disclosure more clearly, the drawings required in the embodiments are briefly introduced below. The drawings here are incorporated into and constitute a part of the specification; they illustrate embodiments consistent with the present disclosure and, together with the specification, serve to explain its technical solutions. It should be understood that the following drawings show only some embodiments of the present disclosure and therefore should not be regarded as limiting its scope; those of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.
Fig. 1 shows a flowchart of a neural network training method provided by an embodiment of the present disclosure;
Fig. 2 shows a schematic diagram of a neural network training apparatus provided by an embodiment of the present disclosure;
Fig. 3 shows a schematic diagram of an image processing apparatus provided by an embodiment of the present disclosure;
Fig. 4 shows a schematic diagram of an electronic device provided by an embodiment of the present disclosure;
Fig. 5 shows a schematic diagram of another electronic device provided by an embodiment of the present disclosure.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. The components of the embodiments, as generally described and illustrated in the figures herein, may be arranged and designed in a variety of different configurations. Accordingly, the following detailed description of the embodiments provided in the accompanying drawings is not intended to limit the scope of the claimed disclosure but merely represents selected embodiments. Based on the embodiments of the present disclosure, all other embodiments obtained by those skilled in the art without creative effort fall within the protection scope of the present disclosure.
It should be noted that similar reference numerals and letters denote similar items in the following figures; therefore, once an item is defined in one figure, it does not require further definition or explanation in subsequent figures.
The term "and/or" herein merely describes an association relationship, indicating that three relationships may exist; for example, A and/or B may indicate: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the term "at least one" herein indicates any one of multiple items or any combination of at least two of them; for example, including at least one of A, B, and C may indicate including any one or more elements selected from the set consisting of A, B, and C.
Research has found that text images can be divided into different types according to their visual features, for example images with printed text, images with handwritten text, and images with natural-scene text. In a practical application scenario, a single text image (for example, an answer sheet image) may contain both printed and handwritten text. This requires the trained character recognition model to have high recognition accuracy for every type.
Building on existing character recognition schemes adapted to a single application scenario, various types of text images could be used as training samples and mixed together to train a character recognition model. However, because the data volumes of different types of text images are likely to differ greatly, direct mixed training would prevent the text image types with less data from learning features well, resulting in low recognition accuracy.
Based on the above research, the present disclosure provides a method, apparatus, device, and storage medium for neural network training and image processing, so as to improve recognition accuracy.
To facilitate understanding of this embodiment, the neural network training method disclosed in the embodiments of the present disclosure is first introduced in detail. The execution subject of the method is generally a computer device with certain computing power, for example a terminal device, a server, or another processing device. The terminal device may be user equipment (UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like. In some possible implementations, the neural network training method may be implemented by a processor calling computer-readable instructions stored in a memory.
Referring to Fig. 1, a flowchart of the neural network training method provided by an embodiment of the present disclosure, the method includes steps S101 to S103:
S101: Acquire at least two types of training samples.
S102: Based on a target sampling ratio between different types of training samples among the at least two types, read target training samples for each of multiple training iterations from the acquired training samples, where the number of target training samples of each type read each time conforms to the target sampling ratio.
S103: Train a target neural network based on the read target training samples, the target neural network being used to recognize different types of images to be recognized.
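As a purely illustrative sketch of how S101 to S103 might fit together (the sample types, the toy data, the ratio values, and the stubbed-out training step are all hypothetical, not part of the claimed embodiments):

```python
import random

def read_batch(samples_by_type, ratio, batch_size):
    """S102: read one batch whose per-type counts follow the target sampling ratio."""
    total_parts = sum(ratio.values())
    batch = []
    for sample_type, parts in ratio.items():
        count = batch_size * parts // total_parts  # share of the batch for this type
        batch.extend(random.sample(samples_by_type[sample_type], count))
    return batch

# S101: acquire at least two types of training samples (toy placeholders here).
samples_by_type = {
    "printed": [("printed_img_%d" % i, "label") for i in range(10000)],
    "handwritten": [("handwritten_img_%d" % i, "label") for i in range(500)],
    "scene": [("scene_img_%d" % i, "label") for i in range(3000)],
}
target_ratio = {"printed": 1, "handwritten": 1, "scene": 1}

# S103: train the target network on each batch (optimization step stubbed out).
for _ in range(10):
    batch = read_batch(samples_by_type, target_ratio, batch_size=300)
    # train_step(network, batch)  # hypothetical training step, omitted here
```

Note that even though the printed-text set is 20 times larger than the handwritten set, every batch contains the two types in the configured proportion, which is the point of the sampling-ratio control.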
To facilitate understanding of the neural network training method provided by the embodiments of the present disclosure, its application scenarios are first described in detail. The method is mainly applicable to fields that require mixed training across various types of training samples; the training samples differ with the application field. For example, in the field of character recognition, the training samples are text images, which may include printed text images, handwritten text images, natural-scene text images, and so on. The field of character recognition is mostly used for illustration below.
Considering the large differences between different types of text images, directly mixing the various types for training will very likely lead to different recognition accuracies among text image types with different data volumes. After multiple rounds of neural network training, types with large data volumes tend to achieve high recognition accuracy because richer image features can be learned, while types with small data volumes tend to have low recognition accuracy because their image features cannot be learned well.
Here, the text image type (i.e., the type of training sample referred to in S101) refers to the type of an image containing text. The text may be printed text, handwritten text, natural-scene text, etc., and different kinds of text may correspond to different text image types; for example, printed text, handwritten text, and natural-scene text may correspond to printed text images, handwritten text images, and natural-scene text images, respectively. Illustratively, the text image type may also refer to the way a text image is generated or the channel through which it is obtained; the specific kinds and number of text image types are not limited here.
It should be noted that the recognition accuracy of text images is affected not only by the amount of data but possibly also by the difficulty of image processing; for example, compared with printed text images, handwritten text images are harder to process because of personalized handwriting styles. In addition, recognition accuracy is affected by various other factors, which are not repeated here.
To achieve high recognition accuracy for all types of text images, the number of text images of each type used during training needs to be finely controlled. Therefore, the embodiments of the present disclosure provide a scheme of reading the target training samples for each training iteration based on a target sampling ratio between different types of training samples, and then training the target neural network based on the read samples.
In the embodiments of the present disclosure, the different types of training samples correspond to at least two types of training samples, which, in a specific application, can be read from different storage files.
In the embodiments of the present disclosure, different types of training samples may be stored in different storage files. For example, different file search paths are set for different storage files: training samples of the same type are recorded under the same file search path, and training samples of different types are recorded under different file search paths. In this way, when different types of training samples are required, a data reader can read samples of the corresponding type from the different storage files.
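One illustrative way to realize this type-to-storage-file correspondence is sketched below; the file paths and the one-sample-per-line file format are assumptions made for illustration only, not details from the disclosure.

```python
from pathlib import Path

# Hypothetical mapping from sample type to its dedicated storage file.
STORAGE_FILES = {
    "printed": "data/printed.txt",
    "handwritten": "data/handwritten.txt",
    "scene": "data/scene.txt",
}

def load_samples(sample_type, storage_files=STORAGE_FILES):
    """Read all samples of one type from the storage file configured for it.
    Assumes one sample record per line."""
    path = Path(storage_files[sample_type])
    with path.open(encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]
```

Because each type lives in its own file, the reader for one type never has to scan or filter the samples of another, which is what makes per-type acquisition fast.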
In a specific application, the target sampling ratio may be preset for different application scenarios, or preset for each training iteration within a particular application scenario. Alternatively, it may be generated automatically under the constraints of a sampling ratio range and a preset adjustment step size. The embodiments of the present disclosure impose no specific limitation on this.
In practical applications, the target sampling ratio may be a fixed parameter of the training task; for example, a sampling ratio such as 1:1 or 1:2, set in advance according to the application scenario requirements, may be used in every training iteration. The target sampling ratio may also be a semi-automatic parameter of the training task; for example, based on factors that affect the number of training iterations, such as the data volume, the sampling ratio is adjusted each time a preset number of iterations is reached, e.g., a new sampling ratio every 10 iterations. The target sampling ratio may also be a fully automatic parameter of the training task; for example, a new sampling ratio is set after every single iteration.
In the embodiments of the present disclosure, once the target sampling ratio between different types of training samples is determined, the target training samples for each training iteration can be read from the acquired training samples based on this ratio. The numbers of target training samples read here conform to the target sampling ratio.
Still taking character recognition as an example, suppose there are three types of training samples in total: 10,000 printed text images, 500 handwritten text images, and 3,000 natural-scene text images. If the target sampling ratio between the three types is 1:1:1, the numbers of target training samples of the three types read here may all be 300; if the target sampling ratio is 3:1:2, the numbers read may be 300, 100, and 200, respectively.
Based on the various types of target training samples read out, a target neural network for recognizing different types of images to be recognized can be trained.
What the target neural network learns here may be the correspondence between a text image and the text on the image. Before training the target neural network, the training samples can be annotated with text in advance, so that the network parameter values of the target neural network are obtained by training on the above correspondence. In this way, inputting an image to be recognized into the trained target neural network recognizes the image; for example, when the image to be recognized contains both printed and handwritten text, both can be recognized with high accuracy at the same time.
The neural network training method provided by the embodiments of the present disclosure may determine the target training samples according to the following steps:
Step 1: Determine the sampling quantity corresponding to each type of training sample based on the target sampling ratio between different types of training samples and the number of training samples required for each training iteration;
Step 2: Read training samples from the acquired training samples of each type according to the determined sampling quantity.
Here, the number of training samples required may be the same or different for different training iterations. In a specific application, the required number of training samples may increase proportionally as the number of iterations increases.
In the embodiments of the present disclosure, for each training iteration, once the number of training samples required for that iteration is determined, the sampling quantity corresponding to each type of training sample can be determined from the target sampling ratio between the different types. Again taking character recognition over three types of text images as an example, with 10,000 printed text images, 500 handwritten text images, and 3,000 natural-scene text images: if the number of training samples required for the current iteration is 1,000 and the target sampling ratio between the three types is 1:1:1, the selected quantities of the three types of target training samples may each be 333.
在实际应用中,各次训练所需的训练样本数量通常可以相同,这主要是为了尽可能的降低由于训练样本数量不均衡对不同轮次训练之间的性能评估所带来的不同影响。这里,可以首先确定至少两种类型的训练样本对应的训练样本总量以及训练总次数,进而基于训练样本总量以及训练总次数,确定每次训练所需的训练样本数量。In practical applications, the number of training samples required for each training can usually be the same, which is mainly to reduce as much as possible the different impacts of the unbalanced number of training samples on the performance evaluation between different rounds of training. Here, the total number of training samples and the total number of training times corresponding to at least two types of training samples can be determined first, and then based on the total number of training samples and the total number of training times, the number of training samples required for each training can be determined.
这里仍以文字识别为例，在确定共计13500张训练样本的情况下，若共计训练10次，则每次训练所需的训练样本数量为1350张。Still taking text recognition as an example, given a total of 13,500 training samples and 10 training rounds in total, the number of training samples required for each round is 1,350.
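上述"按训练总次数均分样本总量、再按目标采样比例拆分到各类型"的计算，可以用如下示意代码概括（函数名为说明而设，采用向下取整，与正文中每类333张的例子一致）。The arithmetic above — dividing the total sample count evenly over the rounds, then splitting one round's count across types by the target sampling ratio — can be sketched as follows (the function names are illustrative; floor rounding is used, matching the 333-per-type example in the text).

```python
def per_round_count(total_samples: int, total_rounds: int) -> int:
    # number of samples per training round = total samples / total rounds
    return total_samples // total_rounds

def counts_from_ratio(round_total: int, ratio: list) -> list:
    # split one round's sample count across types by the target sampling ratio
    s = sum(ratio)
    return [int(round_total * r / s) for r in ratio]

print(per_round_count(13500, 10))          # 1350
print(counts_from_ratio(1000, [1, 1, 1]))  # [333, 333, 333]
```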
为了获得更高的识别精度,这里可以采用采样比例范围实现目标采样比例的搜索,具体可以通过如下步骤实现:In order to obtain higher recognition accuracy, the sampling ratio range can be used here to realize the search of the target sampling ratio, which can be achieved through the following steps:
步骤一、在接收到训练任务的情况下,从训练配置文件中读取针对不同类型的训练样本设置的采样比例范围;Step 1. In the case of receiving the training task, read the sampling ratio range set for different types of training samples from the training configuration file;
步骤二、在每次训练中,从采样比例范围中选取目标采样比例。Step 2. In each training, select the target sampling ratio from the sampling ratio range.
本公开实施例中，训练配置文件随着训练任务的展开而自动调用，通过读取训练配置文件中设置的采样比例范围，可以确定每次训练选取的目标采样比例。In the embodiment of the present disclosure, the training configuration file is invoked automatically as the training task proceeds; by reading the sampling ratio range set in the training configuration file, the target sampling ratio selected for each training round can be determined.
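下面给出一个从配置文件读取采样比例范围并选取目标采样比例的最小示意（本文并未限定配置文件的具体格式，此处的JSON结构与键名均为假设）。A minimal sketch of reading the sampling ratio range from a training configuration file and picking a target sampling ratio is given below (the disclosure does not fix a concrete file format; the JSON schema and key names here are assumptions).

```python
import json
import random

# A hypothetical configuration snippet; the key names are assumptions.
config_text = '{"sampling_ratio_range": {"min": 0.4, "max": 0.6}}'

def read_ratio_range(text: str):
    # parse the range set for the training task's sample types
    rng = json.loads(text)["sampling_ratio_range"]
    return rng["min"], rng["max"]

lo, hi = read_ratio_range(config_text)
target_ratio = random.uniform(lo, hi)  # one way to pick a per-round ratio
print(lo, hi)  # 0.4 0.6
```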
在应用中，上述采样比例范围可以是与各次训练的目标采样比例对应设置的；还可以是设置有最小采样比例和最大采样比例，各次训练的目标采样比例可以是随机从上述最小采样比例和最大采样比例之间选取的；除此之外，还可以是结合预设调整步长设置完成的。In application, the above sampling ratio range may be set in correspondence with the target sampling ratio of each training round; alternatively, a minimum sampling ratio and a maximum sampling ratio may be set, and the target sampling ratio of each round may be selected at random between the minimum and maximum sampling ratios; besides this, the selection may also be accomplished in combination with a preset adjustment step size.
这里，基于最小采样比例和最大采样比例可以确定对应的采样比例范围，有关采样比例范围和预设调整步长可以是结合不同的训练需求来确定的。例如，在意图实现针对某一种特定类型样本的高识别率的情况下，可以将采样比例范围偏向这一特定类型样本；再如，在意图实现针对整体样本的较高识别率的情况下，可以将采样比例范围平均指向各个类型样本。Here, the corresponding sampling ratio range can be determined based on the minimum and maximum sampling ratios, and the sampling ratio range and the preset adjustment step size can be determined in combination with different training requirements. For example, when a high recognition rate for one specific type of sample is intended, the sampling ratio range can be biased towards that type; for another example, when a higher recognition rate over the samples as a whole is intended, the sampling ratio range can be distributed evenly across the various types of samples.
以具有两种类型的训练样本为例，在意图实现针对第一种类型的训练样本的高识别率的情况下，可以设置的采样比例范围为(0.6,0.9)；在意图实现整体样本的较高识别率的情况下，可以设置的采样比例范围为(0.4,0.6)。Taking two types of training samples as an example, when a high recognition rate for the first type of training samples is intended, the sampling ratio range may be set to (0.6, 0.9); when a higher recognition rate over the samples as a whole is intended, the sampling ratio range may be set to (0.4, 0.6).
有关预设调整步长越小，一定程度上可以使得所训练得到的神经网络的精度更高，然而会耗费更多的计算量，本公开实施例可以在兼顾精度和计算量的情况下，对调整步长进行设置。例如，可以设置调整步长为0.1，在采样比例范围为(0.6,0.9)的情况下，可以从0.6这一最小采样比例，依次按照步长递增，直至遍历到最大采样比例；再如，可以设置调整步长为0.01，具体的遍历过程与上述描述过程类似，在此不再赘述。A smaller preset adjustment step size can, to a certain extent, yield a more accurate trained neural network, but it consumes more computation; the embodiment of the present disclosure may therefore set the adjustment step size with both accuracy and computation cost in mind. For example, the adjustment step size may be set to 0.1: with a sampling ratio range of (0.6, 0.9), one starts from the minimum sampling ratio of 0.6 and increments by the step size until the maximum sampling ratio is reached. For another example, the adjustment step size may be set to 0.01; the traversal process is similar to that described above and is not repeated here.
在具体应用中,每完成一次训练后,可以在采样比例范围内,基于预设调整步长对上一次训练使用的目标采样比例进行调整,得到本次训练使用的目标采样比例。In a specific application, after each training is completed, the target sampling ratio used in the previous training can be adjusted based on the preset adjustment step within the range of the sampling ratio to obtain the target sampling ratio used in this training.
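结合预设调整步长在采样比例范围内遍历目标采样比例，可以示意如下（此处用整数步数计算以避免浮点累积误差；函数名为说明而设）。Traversing the target sampling ratio within the range by the preset adjustment step can be sketched as follows (computed from an integer step index to avoid accumulated floating-point error; the function name is illustrative).

```python
def ratio_schedule(min_ratio: float, max_ratio: float, step: float) -> list:
    # enumerate target sampling ratios from the minimum to the maximum,
    # one adjustment step after each completed training round
    n = int(round((max_ratio - min_ratio) / step))
    return [round(min_ratio + i * step, 10) for i in range(n + 1)]

print(ratio_schedule(0.6, 0.9, 0.1))  # [0.6, 0.7, 0.8, 0.9]
```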
为了便于理解上述结合预设调整步长实现目标采样比例的调整以进行目标神经网络训练的过程,接下来可以以两种类型的训练样本为例进行说明。In order to facilitate the understanding of the above-mentioned process of adjusting the target sampling ratio in combination with the preset adjustment step size for training the target neural network, two types of training samples may be used as examples for illustration.
这里，可以将训练配置文件中设置的针对两种类型的训练样本的最小采样比例（如0.4）作为首次训练对应的目标采样比例，从获取的两种类型的训练样本中读取首次训练用的目标训练样本，并基于读取的目标训练样本，进行首次训练目标神经网络的步骤；在完成首次训练的情况下，基于预设调整步长对最小采样比例进行调整，得到调整后的采样比例，并作为下一次训练对应的目标采样比例；然后再按照下一次训练对应的目标采样比例从获取的两种类型的训练样本中读取下一次训练用的目标训练样本，并基于读取的目标训练样本，进行下一次训练目标神经网络的步骤，以此类推直至达到最大采样比例（如0.9）。Here, the minimum sampling ratio set in the training configuration file for the two types of training samples (e.g. 0.4) can be used as the target sampling ratio for the first training round: the target training samples for the first round are read from the two acquired types of training samples, and the step of training the target neural network is performed for the first time based on the read target training samples. Once the first round is completed, the minimum sampling ratio is adjusted by the preset adjustment step size, and the adjusted sampling ratio serves as the target sampling ratio for the next round; the target training samples for the next round are then read from the two acquired types of training samples according to that ratio, and the step of training the target neural network is performed again based on them, and so on until the maximum sampling ratio (e.g. 0.9) is reached.
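上述从最小采样比例出发、逐轮按步长调整并训练的流程，可用如下示意代码概括（其中占位样本数据与train_one_round训练桩函数均为假设，仅作说明）。The flow above — starting from the minimum sampling ratio and adjusting by the step size after each round — can be summarized by the following illustrative sketch (the placeholder sample data and the train_one_round stub are assumptions for illustration only).

```python
import random

type_a = ["a%d" % i for i in range(10000)]  # e.g. printed-text images
type_b = ["b%d" % i for i in range(500)]    # e.g. handwritten-text images

def read_batch(ratio: float, batch_size: int) -> list:
    # `ratio` is the share of type-a samples in this round's batch
    n_a = int(round(batch_size * ratio))
    n_b = min(batch_size - n_a, len(type_b))
    return random.sample(type_a, n_a) + random.sample(type_b, n_b)

def train_one_round(samples: list) -> int:
    # stub: a real implementation would run forward/backward passes here
    return len(samples)

ratio, step, max_ratio = 0.6, 0.1, 0.9
while ratio <= max_ratio + 1e-9:
    train_one_round(read_batch(round(ratio, 10), 1000))
    ratio += step  # preset adjustment step gives the next round's target ratio
```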
这里，基于读取的不同类型的目标训练样本，可以训练得到目标神经网络。在待识别图像是文字图像的情况下，目标训练样本可以是目标图像样本，这样，针对每种类型的目标图像样本，将该目标图像样本作为待训练的神经网络的输入，将针对该目标图像样本的预先标注文字作为待训练的目标神经网络的输出，训练用于对不同类型的文字图像进行识别的目标神经网络。Here, the target neural network can be trained based on the read target training samples of different types. In the case where the image to be recognized is a text image, the target training samples may be target image samples; thus, for each type of target image sample, the target image sample serves as the input of the neural network to be trained, and the pre-labeled text for that target image sample serves as the output of the target neural network to be trained, so as to train a target neural network for recognizing different types of text images.
可知的是，本公开实施例中的目标神经网络训练的是输入的图像与标注文字之间的对应关系，基于这一对应关系可以确定出目标神经网络的网络参数，进而实现有关不同类型的文字图像的高精度识别。It can be seen that the target neural network in the embodiment of the present disclosure is trained on the correspondence between input images and labeled text; based on this correspondence, the network parameters of the target neural network can be determined, thereby enabling high-precision recognition of different types of text images.
在训练得到目标神经网络的情况下，本公开实施例可以将获取的待识别图像输入到训练得到的目标神经网络中，可以输出图像处理结果，这里的图像处理结果可以是从待识别图像中识别得到的文字内容。Once the target neural network has been trained, the embodiment of the present disclosure can input the acquired image to be recognized into the trained target neural network and output an image processing result, where the image processing result may be the text content recognized from the image to be recognized.
本领域技术人员可以理解，在具体实施方式的上述方法中，各步骤的撰写顺序并不意味着严格的执行顺序而对实施过程构成任何限定，各步骤的具体执行顺序应当以其功能和可能的内在逻辑确定。Those skilled in the art can understand that, in the above methods of the specific implementations, the order in which the steps are written does not imply a strict execution order or constitute any limitation on the implementation process; the specific execution order of the steps should be determined by their functions and possible internal logic.
基于同一发明构思,本公开实施例中还提供了与方法对应的装置,由于本公开实施例中的装置解决问题的原理与本公开实施例上述方法相似,因此装置的实施可以参见方法的实施,重复之处不再赘述。Based on the same inventive concept, the embodiment of the present disclosure also provides a device corresponding to the method. Since the problem-solving principle of the device in the embodiment of the present disclosure is similar to the above-mentioned method of the embodiment of the present disclosure, the implementation of the device can refer to the implementation of the method. Repeated points will not be repeated.
参照图2所示,为本公开实施例提供的一种神经网络训练的装置的示意图,装置包括:获取模块201、读取模块202、训练模块203;其中,Referring to FIG. 2 , it is a schematic diagram of a neural network training device provided by an embodiment of the present disclosure. The device includes: an acquisition module 201, a reading module 202, and a training module 203; wherein,
获取模块201,用于获取至少两种类型的训练样本;An acquisition module 201, configured to acquire at least two types of training samples;
读取模块202，用于基于至少两种类型中不同类型的训练样本之间的目标采样比例，从获取的至少两种类型的训练样本中读取多次训练中每次训练用的目标训练样本；其中，每次读取的不同类型的目标训练样本的数量符合目标采样比例；The reading module 202 is configured to read, based on the target sampling ratio between different types of training samples among the at least two types, the target training samples for each of multiple training rounds from the acquired at least two types of training samples, wherein the number of target training samples of different types read each time conforms to the target sampling ratio;
训练模块203,用于基于读取的目标训练样本,训练目标神经网络,目标神经网络用于对不同类型的待识别图像进行识别。The training module 203 is configured to train the target neural network based on the read target training samples, and the target neural network is used to identify different types of images to be recognized.
采用上述神经网络训练的装置，在获取到不同类型的训练样本的情况下，可以基于不同类型的训练样本之间的目标采样比例，从获取的不同类型的训练样本中读取每次训练用的目标训练样本，进而基于读取的目标训练样本，训练目标神经网络。由于不同类型的训练样本之间的目标采样比例可以很好的控制不同类型的训练样本的选取数量，这一定程度上可以降低直接混合数据量差距比较大的训练样本对特征学习的影响，提升了目标神经网络的识别精度。With the above neural network training apparatus, when different types of training samples are acquired, the target training samples for each training round can be read from the acquired training samples based on the target sampling ratio between the different types, and the target neural network can then be trained on the read target training samples. Since the target sampling ratio between different types of training samples can well control how many training samples of each type are selected, it reduces, to a certain extent, the impact on feature learning of directly mixing training samples whose data volumes differ greatly, and thus improves the recognition accuracy of the target neural network.
在一种可能的实施方式中，读取模块202，用于按照以下步骤基于至少两种类型中不同类型的训练样本之间的目标采样比例，从获取的至少两种类型的训练样本中读取多次训练中每次训练用的目标训练样本：In a possible implementation manner, the reading module 202 is configured to read the target training samples for each of multiple training rounds from the acquired at least two types of training samples, based on the target sampling ratio between different types of training samples among the at least two types, according to the following steps:
基于至少两种类型中不同类型的训练样本之间的目标采样比例,以及每次训练所需的训练样本数量,确定与每种类型的训练样本对应的采样数量;determining the number of samples corresponding to each type of training sample based on a target sampling ratio between different types of training samples of at least two types, and the number of training samples required for each training;
按照确定的采样数量,从获取的每种类型的训练样本中读取训练样本。According to the determined number of samples, the training samples are read from the acquired training samples of each type.
在一种可能的实施方式中,读取模块202,用于按照如下步骤确定每次训练所需的训练样本数量:In a possible implementation manner, the reading module 202 is configured to determine the number of training samples required for each training according to the following steps:
确定至少两种类型的训练样本对应的训练样本总量以及训练总次数;Determine the total amount of training samples and the total number of training times corresponding to at least two types of training samples;
基于训练样本总量以及训练总次数,确定每次训练所需的训练样本数量。Based on the total number of training samples and the total number of training times, determine the number of training samples required for each training.
在一种可能的实施方式中,读取模块202,用于按照如下步骤确定不同类型的训练样本之间的目标采样比例:In a possible implementation manner, the reading module 202 is configured to determine the target sampling ratio between different types of training samples according to the following steps:
在接收到训练任务的情况下,从训练配置文件中读取针对不同类型的训练样本设置的采样比例范围;In the case of receiving the training task, read the sampling ratio range set for different types of training samples from the training configuration file;
在每次训练中,从采样比例范围中选取目标采样比例。At each training session, a target sampling ratio is chosen from a range of sampling ratios.
在一种可能的实施方式中,读取模块202,用于按照以下步骤在每次训练中,从采样比例范围中选取目标采样比例:In a possible implementation manner, the reading module 202 is configured to select the target sampling ratio from the sampling ratio range in each training according to the following steps:
在完成一次训练后,在采样比例范围内,基于预设调整步长对上一次训练使用的目标采样比例进行调整,得到本次训练使用的目标采样比例。After a training session is completed, within the range of the sampling rate, the target sampling rate used in the previous training is adjusted based on the preset adjustment step to obtain the target sampling rate used in the current training.
在一种可能的实施方式中,获取模块201,用于按照以下步骤获取至少两种类型的训练样本:In a possible implementation manner, the obtaining module 201 is configured to obtain at least two types of training samples according to the following steps:
基于训练样本的类型与预先配置的各个存储文件之间的对应关系，从各个存储文件中读取对应类型的训练样本。Based on the corresponding relationship between the types of training samples and the respective pre-configured storage files, training samples of the corresponding type are read from the respective storage files.
在一种可能的实施方式中,待识别图像包括文字图像,目标训练样本包括目标图像样本;训练模块203,用于按照以下步骤基于读取的目标训练样本,训练目标神经网络:In a possible implementation manner, the image to be recognized includes a text image, and the target training sample includes a target image sample; the training module 203 is configured to train the target neural network based on the read target training sample according to the following steps:
针对每种类型的目标图像样本，将该目标图像样本作为待训练的神经网络的输入，将针对该目标图像样本的预先标注文字作为待训练的目标神经网络的输出，训练用于对不同类型的文字图像进行识别的目标神经网络。For each type of target image sample, the target image sample is used as the input of the neural network to be trained, and the pre-labeled text for that target image sample is used as the output of the target neural network to be trained, so as to train a target neural network for recognizing different types of text images.
参照图3所示,为本公开实施例提供的一种图像处理的装置的示意图,装置包括:获取模块301、处理模块302;其中,Referring to FIG. 3 , which is a schematic diagram of an image processing device provided by an embodiment of the present disclosure, the device includes: an acquisition module 301 and a processing module 302; wherein,
获取模块301,用于获取待识别图像;An acquisition module 301, configured to acquire an image to be identified;
处理模块302,用于将待识别图像输入到利用上述神经网络训练的方法训练得到的目标神经网络中,输出图像处理结果。The processing module 302 is configured to input the image to be recognized into the target neural network trained by the above neural network training method, and output the image processing result.
关于装置中的各模块的处理流程、以及各模块之间的交互流程的描述可以参照上述方法实施例中的相关说明,这里不再详述。For the description of the processing flow of each module in the device and the interaction flow between the modules, reference may be made to the relevant description in the above method embodiment, and details will not be described here.
本公开实施例还提供了一种电子设备,如图4所示,为本公开实施例提供的电子设备结构示意图,包括:处理器401、存储器402、和总线403。存储器402存储有处理器401可执行的机器可读指令(比如,图2中的装置中获取模块201、读取模块202、训练模块203对应的执行指令等),当电子设备运行时,处理器401与存储器402之间通过总线403通信,机器可读指令被处理器401执行时执行如下处理:An embodiment of the present disclosure also provides an electronic device, as shown in FIG. 4 , which is a schematic structural diagram of the electronic device provided by the embodiment of the present disclosure, including: a processor 401 , a memory 402 , and a bus 403 . The memory 402 stores machine-readable instructions executable by the processor 401 (for example, execution instructions corresponding to the acquisition module 201, the reading module 202, and the training module 203 in the device in FIG. 2 ), and when the electronic device is running, the processor 401 communicates with the memory 402 through the bus 403, and when the machine-readable instructions are executed by the processor 401, the following processes are performed:
获取至少两种类型的训练样本;Obtain at least two types of training samples;
基于至少两种类型中不同类型的训练样本之间的目标采样比例,从获取的至少两种类型的训练样本中读取多次训练中每次训练用的目标训练样本;其中,每次读取的不同类型的目标训练样本的数量符合目标采样比例;Based on the target sampling ratio between different types of training samples in at least two types, read the target training samples for each training in multiple trainings from the obtained at least two types of training samples; wherein, each read The number of different types of target training samples conforms to the target sampling ratio;
基于读取的目标训练样本,训练目标神经网络,目标神经网络用于对不同类型的待识别图像进行识别。Based on the read target training samples, the target neural network is trained, and the target neural network is used to identify different types of images to be recognized.
本公开实施例还提供了另一种电子设备，如图5所示，为本公开实施例提供的电子设备结构示意图，包括：处理器501、存储器502、和总线503。存储器502存储有处理器501可执行的机器可读指令（比如，图3中的装置中获取模块301、处理模块302对应的执行指令等），当电子设备运行时，处理器501与存储器502之间通过总线503通信，机器可读指令被处理器501执行时执行如下处理：The embodiment of the present disclosure also provides another electronic device. As shown in FIG. 5, which is a schematic structural diagram of the electronic device provided by the embodiment of the present disclosure, it includes: a processor 501, a memory 502, and a bus 503. The memory 502 stores machine-readable instructions executable by the processor 501 (for example, execution instructions corresponding to the acquisition module 301 and the processing module 302 in the device in FIG. 3). When the electronic device is running, the processor 501 communicates with the memory 502 through the bus 503, and when the machine-readable instructions are executed by the processor 501, the following processing is performed:
获取待识别图像;Obtain the image to be recognized;
将待识别图像输入到利用上述神经网络训练的方法训练得到的目标神经网络中,输出图像处理结果。Input the image to be recognized into the target neural network trained by the above neural network training method, and output the image processing result.
本公开实施例还提供一种计算机可读存储介质,该计算机可读存储介质上存储有计算机程序,该计算机程序被处理器运行时执行上述方法实施例中所述的方法的步骤。其中,该存储介质可以是易失性或非易失的计算机可读取存储介质。Embodiments of the present disclosure further provide a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the steps of the methods described in the foregoing method embodiments are executed. Wherein, the storage medium may be a volatile or non-volatile computer-readable storage medium.
本公开实施例还提供一种计算机程序产品，该计算机程序产品承载有程序代码，所述程序代码包括的指令可用于执行上述方法实施例中所述的方法的步骤，具体可参见上述方法实施例，在此不再赘述。Embodiments of the present disclosure also provide a computer program product carrying program code, and the instructions included in the program code can be used to execute the steps of the methods described in the above method embodiments; for details, please refer to the above method embodiments, which are not repeated here.
其中，上述计算机程序产品可以具体通过硬件、软件或其结合的方式实现。在一个可选实施例中，所述计算机程序产品具体体现为计算机存储介质，在另一个可选实施例中，计算机程序产品具体体现为软件产品，例如软件开发包（Software Development Kit，SDK）等等。The above computer program product may be specifically implemented by hardware, software, or a combination thereof. In an optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, the computer program product is embodied as a software product, such as a Software Development Kit (SDK), and so on.
所属领域的技术人员可以清楚地了解到，为描述的方便和简洁，上述描述的系统和装置的具体工作过程，可以参考前述方法实施例中的对应过程，在此不再赘述。在本公开所提供的几个实施例中，应该理解到，所揭露的系统、装置和方法，可以通过其它的方式实现。以上所描述的装置实施例仅仅是示意性的，例如，所述单元的划分，仅仅为一种逻辑功能划分，实际实现时可以有另外的划分方式，又例如，多个单元或组件可以结合或者可以集成到另一个系统，或一些特征可以忽略，或不执行。另一点，所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些通信接口，装置或单元的间接耦合或通信连接，可以是电性，机械或其它的形式。Those skilled in the art can clearly understand that, for convenience and brevity of description, reference may be made to the corresponding processes in the foregoing method embodiments for the specific working processes of the systems and apparatuses described above, which are not repeated here. In the several embodiments provided in the present disclosure, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. The apparatus embodiments described above are merely illustrative; for example, the division into units is only a division by logical function, and other divisions are possible in actual implementation; for another example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some communication interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
另外,在本公开各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。In addition, each functional unit in each embodiment of the present disclosure may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时，可以存储在一个处理器可执行的非易失的计算机可读取存储介质中。基于这样的理解，本公开的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来，该计算机软件产品存储在一个存储介质中，包括若干指令用以使得一台计算机设备（可以是个人计算机，服务器，或者网络设备等）执行本公开各个实施例所述方法的全部或部分步骤。而前述的存储介质包括：U盘、移动硬盘、只读存储器（Read-Only Memory，ROM）、随机存取存储器（Random Access Memory，RAM）、磁碟或者光盘等各种可以存储程序代码的介质。If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a processor-executable non-volatile computer-readable storage medium. Based on this understanding, the technical solution of the present disclosure in essence, or the part contributing to the prior art, or part of the technical solution, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present disclosure. The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
最后应说明的是：以上所述实施例，仅为本公开的具体实施方式，用以说明本公开的技术方案，而非对其限制，本公开的保护范围并不局限于此，尽管参照前述实施例对本公开进行了详细的说明，本领域的普通技术人员应当理解：任何熟悉本技术领域的技术人员在本公开揭露的技术范围内，其依然可以对前述实施例所记载的技术方案进行修改或可轻易想到变化，或者对其中部分技术特征进行等同替换；而这些修改、变化或者替换，并不使相应技术方案的本质脱离本公开实施例技术方案的精神和范围，都应涵盖在本公开的保护范围之内。因此，本公开的保护范围应以权利要求的保护范围为准。Finally, it should be noted that the above embodiments are merely specific implementations of the present disclosure, intended to illustrate rather than limit its technical solutions, and the protection scope of the present disclosure is not limited thereto. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that any person familiar with this technical field may, within the technical scope disclosed herein, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes, or make equivalent replacements of some of the technical features; such modifications, changes, or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present disclosure, and shall all fall within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (12)

  1. 一种神经网络训练的方法,其特征在于,所述方法包括:A method for neural network training, characterized in that the method comprises:
    获取至少两种类型的训练样本;Obtain at least two types of training samples;
    基于所述至少两种类型中不同类型的训练样本之间的目标采样比例,从获取的所述至少两种类型的训练样本中读取多次训练中每次训练用的目标训练样本;其中,每次读取的所述不同类型的目标训练样本的数量符合所述目标采样比例;Based on the target sampling ratio between different types of training samples in the at least two types, read the target training samples for each training in multiple trainings from the acquired training samples of the at least two types; wherein, The number of target training samples of different types read each time complies with the target sampling ratio;
    基于读取的所述目标训练样本,训练目标神经网络,所述目标神经网络用于对不同类型的待识别图像进行识别。Based on the read target training samples, a target neural network is trained, and the target neural network is used to identify different types of images to be identified.
  2. 根据权利要求1所述的方法，其特征在于，所述基于所述至少两种类型中不同类型的训练样本之间的目标采样比例，从获取的所述至少两种类型的训练样本中读取多次训练中每次训练用的目标训练样本，包括：The method according to claim 1, wherein the reading, based on the target sampling ratio between different types of training samples among the at least two types, of target training samples for each of multiple training rounds from the acquired at least two types of training samples comprises:
    基于所述至少两种类型中不同类型的训练样本之间的目标采样比例,以及每次训练所需的训练样本数量,确定与每种类型的训练样本对应的采样数量;determining the number of samples corresponding to each type of training sample based on the target sampling ratio between different types of training samples of the at least two types and the number of training samples required for each training;
    按照确定的所述采样数量,从获取的每种类型的训练样本中读取训练样本。According to the determined number of samples, the training samples are read from the acquired training samples of each type.
  3. 根据权利要求2所述的方法,其特征在于,按照如下步骤确定每次训练所需的训练样本数量:The method according to claim 2, wherein the number of training samples required for each training is determined according to the following steps:
    确定所述至少两种类型的训练样本对应的训练样本总量以及训练总次数;determining the total amount of training samples and the total number of training times corresponding to the at least two types of training samples;
    基于所述训练样本总量以及训练总次数,确定每次训练所需的训练样本数量。Based on the total amount of training samples and the total number of training times, the number of training samples required for each training is determined.
  4. 根据权利要求1所述的方法,其特征在于,按照如下步骤确定不同类型的训练样本之间的目标采样比例:The method according to claim 1, wherein the target sampling ratio between different types of training samples is determined according to the following steps:
    在接收到训练任务的情况下,从训练配置文件中读取针对不同类型的训练样本设置的采样比例范围;In the case of receiving the training task, read the sampling ratio range set for different types of training samples from the training configuration file;
    在每次训练中,从所述采样比例范围中选取所述目标采样比例。In each training, the target sampling ratio is selected from the sampling ratio range.
  5. 根据权利要求4所述的方法,其特征在于,所述在每次训练中,从所述采样比例范围中选取所述目标采样比例,包括:The method according to claim 4, wherein, in each training, selecting the target sampling ratio from the sampling ratio range includes:
    在完成一次训练后,在所述采样比例范围内,基于预设调整步长对上一次训练使用的目标采样比例进行调整,得到本次训练使用的目标采样比例。After a training session is completed, within the range of the sampling ratio, the target sampling ratio used in the previous training is adjusted based on the preset adjustment step size to obtain the target sampling ratio used in the current training.
  6. 根据权利要求1至5任一所述的方法,其特征在于,所述获取至少两种类型的训练样本,包括:The method according to any one of claims 1 to 5, wherein said obtaining at least two types of training samples comprises:
    基于训练样本的类型与预先配置的各个存储文件之间的对应关系,从各个存储文件中读取对应类型的训练样本。Based on the corresponding relationship between the type of the training sample and each pre-configured storage file, the corresponding type of training sample is read from each storage file.
  7. 根据权利要求1至6任一所述的方法,其特征在于,所述待识别图像包括文字图像;所述基于读取的所述目标训练样本,训练目标神经网络,包括:The method according to any one of claims 1 to 6, wherein the image to be recognized includes a text image; and the training of the target neural network based on the read target training sample includes:
    针对每种类型的文字图像，将该文字图像作为待训练的神经网络的输入，将针对该文字图像的预先标注文字作为待训练的目标神经网络的输出，训练用于对不同类型的文字图像进行识别的目标神经网络。For each type of text image, the text image is used as the input of the neural network to be trained, and the pre-labeled text for the text image is used as the output of the target neural network to be trained, so as to train a target neural network for recognizing different types of text images.
  8. 一种图像处理的方法,其特征在于,所述方法包括:A method for image processing, characterized in that the method comprises:
    获取待识别图像;Obtain the image to be recognized;
    将所述待识别图像输入到利用权利要求1至7任一所述的方法训练得到的目标神经网络中,输出图像处理结果。The image to be recognized is input into the target neural network trained by the method described in any one of claims 1 to 7, and the image processing result is output.
  9. 一种神经网络训练的装置,其特征在于,所述装置包括:A device for neural network training, characterized in that the device comprises:
    获取模块,用于获取至少两种类型的训练样本;An acquisition module, configured to acquire at least two types of training samples;
    读取模块，用于基于所述至少两种类型中不同类型的训练样本之间的目标采样比例，从获取的所述至少两种类型的训练样本中读取多次训练中每次训练用的目标训练样本；其中，每次读取的所述不同类型的目标训练样本的数量符合所述目标采样比例；a reading module, configured to read, based on the target sampling ratio between different types of training samples among the at least two types, target training samples for each of multiple training rounds from the acquired at least two types of training samples, wherein the number of the target training samples of the different types read each time conforms to the target sampling ratio;
    训练模块,用于基于读取的所述目标训练样本,训练目标神经网络,所述目标神经网络用于对不同类型的待识别图像进行识别。The training module is configured to train a target neural network based on the read target training samples, and the target neural network is used to identify different types of images to be identified.
  10. 一种图像处理的装置,其特征在于,所述装置包括:An image processing device, characterized in that the device comprises:
    获取模块,用于获取待识别图像;An acquisition module, configured to acquire an image to be identified;
    处理模块，用于将所述待识别图像输入到利用权利要求1至7任一所述的方法训练得到的目标神经网络中，输出图像处理结果。The processing module is configured to input the image to be recognized into the target neural network trained by the method according to any one of claims 1 to 7, and output an image processing result.
  11. 一种电子设备，其特征在于，包括：处理器、存储器和总线，所述存储器存储有所述处理器可执行的机器可读指令，当电子设备运行时，所述处理器与所述存储器之间通过总线通信，所述机器可读指令被所述处理器执行时执行如权利要求1至7任一所述的神经网络训练的方法的步骤或者如权利要求8所述的图像处理的方法的步骤。An electronic device, comprising: a processor, a memory, and a bus, wherein the memory stores machine-readable instructions executable by the processor; when the electronic device is running, the processor communicates with the memory through the bus, and when the machine-readable instructions are executed by the processor, the steps of the neural network training method according to any one of claims 1 to 7 or the steps of the image processing method according to claim 8 are executed.
  12. 一种计算机可读存储介质，其特征在于，该计算机可读存储介质上存储有计算机程序，该计算机程序被处理器运行时执行如权利要求1至7任一所述的神经网络训练的方法的步骤或者如权利要求8所述的图像处理的方法的步骤。A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and when the computer program is run by a processor, the steps of the neural network training method according to any one of claims 1 to 7 or the steps of the image processing method according to claim 8 are executed.
PCT/CN2022/114983 2021-09-18 2022-08-26 Neural network training method and apparatus, image processing method and apparatus, and device and storage medium WO2023040629A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111098594.0A CN113792734A (en) 2021-09-18 2021-09-18 Neural network training and image processing method, device, equipment and storage medium
CN202111098594.0 2021-09-18

Publications (1)

Publication Number Publication Date
WO2023040629A1 true WO2023040629A1 (en) 2023-03-23

Family

ID=78878972

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/114983 WO2023040629A1 (en) 2021-09-18 2022-08-26 Neural network training method and apparatus, image processing method and apparatus, and device and storage medium

Country Status (2)

Country Link
CN (1) CN113792734A (en)
WO (1) WO2023040629A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113792734A (en) * 2021-09-18 2021-12-14 深圳市商汤科技有限公司 Neural network training and image processing method, device, equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106530284A (en) * 2016-10-21 2017-03-22 广州视源电子科技股份有限公司 Solder joint type detection method and apparatus based on image identification
CN109472345A (en) * 2018-09-28 2019-03-15 深圳百诺名医汇网络技术有限公司 Weight update method and device, computer equipment and storage medium
US20190096385A1 (en) * 2017-09-28 2019-03-28 Baidu Online Network Technology (Beijing) Co., Ltd Method and apparatus for generating speech synthesis model
CN112419303A (en) * 2020-12-09 2021-02-26 上海联影医疗科技股份有限公司 Neural network training method, system, readable storage medium and device
CN113313110A (en) * 2021-05-25 2021-08-27 北京易华录信息技术股份有限公司 License plate type recognition model construction and license plate type recognition method
CN113792734A (en) * 2021-09-18 2021-12-14 深圳市商汤科技有限公司 Neural network training and image processing method, device, equipment and storage medium


Also Published As

Publication number Publication date
CN113792734A (en) 2021-12-14

Similar Documents

Publication Publication Date Title
CN109961780B (en) Man-machine interaction method, device, server and storage medium
CN107566914B (en) Bullet screen display control method, electronic equipment and storage medium
CN110490721B (en) Financial voucher generating method and related product
CN107229559B (en) Detection method and device for testing integrity of service system
CN114170468B (en) Text recognition method, storage medium and computer terminal
WO2023040629A1 (en) Neural network training method and apparatus, image processing method and apparatus, and device and storage medium
CN111753744B (en) Method, apparatus, device and readable storage medium for bill image classification
CN113836885A (en) Text matching model training method, text matching device and electronic equipment
CN107844728A (en) Method and device for recognizing a Quick Response code, computer apparatus, and computer-readable storage medium
CN110139149A (en) Video optimization method, apparatus, and electronic device
CN111340640A (en) Insurance claim settlement material auditing method, device and equipment
CN111680761B (en) Information feedback method and device and electronic equipment
CN111383651A (en) Voice noise reduction method and device and terminal equipment
CN111783415A (en) Template configuration method and device
KR102003221B1 (en) System for generating note data and method for generating note data using the system
CN105912510A (en) Method, device and server for judging answers to test questions
CN114078471A (en) Network model processing method, device, equipment and computer readable storage medium
CN111049735B (en) Group head portrait display method, device, equipment and storage medium
CN115221037A (en) Interactive page testing method and device, computer equipment and program product
CN111782792A (en) Method and apparatus for information processing
CN116776839A (en) Teaching-oriented handwritten medical record accurate feedback method, system and storage device
CN116343221A (en) Certificate information automatic input method and device, electronic equipment and storage medium
CN110633457B (en) Content replacement method and device, electronic equipment and readable storage medium
US11232161B1 (en) Methods and apparatuses for electronically stamping document
CN107844549A (en) Information saving method and device, computer apparatus, and computer-readable storage medium

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE