WO2020108368A1 - Neural network model training method and electronic device

Neural network model training method and electronic device

Info

Publication number
WO2020108368A1
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
network model
recognition result
electronic device
text recognition
Prior art date
Application number
PCT/CN2019/119815
Other languages
French (fr)
Chinese (zh)
Inventor
谢淼
施烈航
姚恒志
勾军委
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date
Filing date
Publication date
Priority claimed from CN201910139681.2A (external priority; related publication CN111242273B)
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Publication of WO2020108368A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 99/00 Subject matter not provided for in other groups of this subclass

Definitions

  • This application relates to the field of machine learning technology, and in particular to a neural network model training method and an electronic device.
  • Deep neural networks have been a hot research direction in recent years. From a bionics perspective, they simulate the multi-layered computing architecture of the human brain, are the direction closest to artificial intelligence, and can better characterize the most essential invariant features of a signal. In recent years, deep learning has achieved good results in speech processing, visual processing, and image processing. Optical character recognition (OCR) is a difficult and important problem in the field of visual processing because its characters are diverse (for example, Chinese has nearly 7,000 commonly used characters), its scenes are complex, and its semantic information is rich.
  • The neural network models currently used for OCR involve a large amount of computation and place high demands on device hardware, so they are generally integrated on the server side; for example, Baidu's neural network model for OCR is integrated in Baidu's cloud servers. Limited by the processing power of the dedicated artificial intelligence chips on the terminal side, the OCR neural network models currently integrated on the server side are not suitable for running directly on the terminal side. The terminal side therefore needs a newly built OCR neural network model suited to it, but building such a model from scratch is currently difficult, because the construction depends on the labeled results of a large number of training samples from different scenarios, and labeling a training sample involves manually marking text areas and text content. The manual labeling workload is therefore very large, which makes neural network model training very costly and inefficient.
  • The present application provides a neural network model training method and an electronic device, so as to provide a way to quickly train a neural network model suitable for the terminal side.
  • In a first aspect, an embodiment of the present application provides a method for training a neural network model.
  • The method includes: an electronic device inputs n training samples to a reference neural network model for processing to obtain a first text recognition result corresponding to each training sample; the electronic device then inputs the n training samples to an initial neural network model for processing to obtain a second text recognition result corresponding to each training sample; finally, the electronic device adjusts the parameters in the initial neural network model according to the first text recognition results, the second text recognition results, and the manual labeling results of a first subset of the n training samples, to obtain a target neural network model.
  • In the embodiment of the present application, the electronic device can obtain recognition results for the training samples by calling the reference neural network model, so only some of the training samples need to be manually labeled, which reduces the manual labeling workload to a certain extent and saves training costs.
  • When the electronic device integrates the target neural network model, the user can quickly recognize text areas and text content in an image through the electronic device without networking, which protects the user's privacy to a certain extent and offers high security.
  • In one possible design, for any sample in the first subset of training samples, that is, for a first training sample, the electronic device may obtain a first loss function value according to the second text recognition result of the first training sample and the manual labeling result of the first training sample, and obtain a second loss function value according to the first text recognition result of the first training sample and the second text recognition result of the first training sample; it then performs smoothing or weighting on the first loss function value and the second loss function value to obtain a processed first loss function value and a processed second loss function value, and adjusts the parameters in the initial neural network model according to the processed first loss function value and the processed second loss function value.
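  • As a minimal sketch of this two-loss design (assuming PyTorch and a placeholder mean-squared-error loss; the application does not fix the loss function or the weighting scheme here), the two loss values can be combined by weighting as follows:

```python
import torch

def combined_loss(pred, manual_label, reference_output, alpha=0.5):
    """Hypothetical weighting of the two loss values described above.

    pred             -- second text recognition result (initial model output)
    manual_label     -- manual labeling result of the first training sample
    reference_output -- first text recognition result (reference model output)
    alpha            -- assumed weighting hyperparameter, not given in the source
    """
    loss_fn = torch.nn.MSELoss()  # placeholder loss; any differentiable loss works here
    first_loss = loss_fn(pred, manual_label)       # first loss function value
    second_loss = loss_fn(pred, reference_output)  # second loss function value
    # "processing" the two values: a simple weighted combination
    return alpha * first_loss + (1.0 - alpha) * second_loss
```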
  • In the embodiment of the present application, the electronic device performs model training based on both the manual labeling results and the output results of the reference neural network model, which can, to a certain extent, alleviate the long-text blanking problem and the low recall rate of the neural network models currently used for OCR.
  • In one possible design, for any of the n training samples outside the first subset, that is, for a second training sample, the parameters in the initial neural network model are adjusted using a stochastic gradient descent algorithm according to the first text recognition result of the second training sample and the second text recognition result of the second training sample.
  • In the embodiment of the present application, by relying on the manual labeling results and the output results of the reference neural network model, the electronic device can address the long-text blanking problem and the low-recall-rate problem.
  • In one possible design, the electronic device inputs the n training samples to the initial neural network model to learn multiple tasks, where the multiple tasks may include a classification task and a regression task.
  • The electronic device obtains the second text recognition result of each training sample according to the learning results of the multiple tasks, so that the second text recognition result includes the position of the text border corresponding to the regression task and, for the classification task, whether each pixel is text.
  • In the embodiment of the present application, through multi-task integrated learning, the electronic device balances the output of the initial neural network model between the output results of the reference neural network model and the manual labeling results.
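  • A sketch of such a two-task network (a hypothetical PyTorch module; the application does not prescribe layer counts or channel widths) pairs a shared backbone with a per-pixel classification head and a border-regression head:

```python
import torch.nn as nn

class MultiTaskTextNet(nn.Module):
    """Illustrative two-task text detector: per-pixel text classification
    plus text-border regression. All layer sizes are assumptions."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        # classification task: two channels per pixel (text / non-text)
        self.cls_head = nn.Conv2d(64, 2, 1)
        # regression task: eight channels per pixel, i.e. x/y offsets to the
        # four corner points of the text border
        self.reg_head = nn.Conv2d(64, 8, 1)

    def forward(self, x):
        feat = self.backbone(x)
        return self.cls_head(feat), self.reg_head(feat)
```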
  • In one possible design, the electronic device calls an open interface of the reference neural network model, inputs the n training samples into the reference neural network model, and receives the first text recognition result corresponding to each training sample output by the reference neural network model.
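  • A call to such an open interface might look like the sketch below; the endpoint URL, request fields, and response format are all hypothetical, since each OCR service defines its own API:

```python
import base64
import requests

def query_reference_model(image_path, api_url, api_key):
    """Send one training sample to a hypothetical reference-model endpoint
    and return its first text recognition result."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("ascii")
    resp = requests.post(
        api_url,  # assumed endpoint of the reference neural network model
        json={"image": image_b64},
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=30,
    )
    resp.raise_for_status()
    # assumed response shape: text areas plus text content per area
    return resp.json()
```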
  • In the embodiment of the present application, by using the interfaces of reference neural network models that already exist in the cloud, a neural network model suitable for the terminal side can be built quickly.
  • In one possible design, the electronic device determines the initial neural network model according to the configuration information of the current operating environment of the electronic device and the hardware information of the artificial intelligence chip, where the structure, parameters, and size of the initial neural network model can be supported by the electronic device.
  • In the embodiment of the present application, a neural network model suitable for the terminal side can be built quickly, and the user can quickly recognize text areas and text content in an image through the electronic device alone, without a network connection, which protects the user's privacy to a certain extent and offers high security.
  • In a second aspect, an embodiment of the present application provides an electronic device, including a processor and a memory.
  • The memory is used to store one or more computer programs; when the one or more computer programs stored in the memory are executed by the processor, the electronic device can implement any possible design of any of the above aspects.
  • In a third aspect, an embodiment of the present application further provides an apparatus, the apparatus including modules/units for executing any possible design of any of the above aspects.
  • These modules/units can be implemented by hardware, or by hardware executing corresponding software.
  • In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium.
  • The computer-readable storage medium includes a computer program; when the computer program runs on an electronic device, the electronic device performs any possible design of any of the above aspects.
  • In a fifth aspect, an embodiment of the present application further provides a computer program product that, when run on a terminal, causes the electronic device to perform any possible design of any of the above aspects.
  • In a sixth aspect, an embodiment of the present application further provides a chip, coupled to a memory, for executing a computer program stored in the memory, so that the electronic device performs any possible design of any of the above aspects.
  • FIG. 1 is a schematic diagram of an application scenario provided by an embodiment of the present application.
  • FIG. 2 is a schematic structural diagram of a neural network model provided by an embodiment of the present application.
  • FIG. 3 is a schematic flowchart of a neural network model training method provided by an embodiment of the present application.
  • FIG. 4 is a schematic flowchart of another neural network model training method provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a training sample and a text recognition result of a training sample provided by an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of a mobile phone provided by an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of a neural network model training device provided by an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
  • The embodiment of the present application provides a neural network model training method that can use a current commercial server-side neural network model as the reference neural network model. That is, the electronic device may first call the commercial neural network model on the server side to process n training samples and obtain the first text recognition result corresponding to each training sample, then input the n training samples into the initial neural network model to be trained for processing and obtain the second text recognition result corresponding to each training sample, and finally adjust the parameters in the initial neural network model according to the first text recognition results, the second text recognition results, and the manual labeling results of some of the n training samples, to obtain the target neural network model.
  • It can be seen that, in the above training process, because recognition results for the training samples can be obtained by calling the reference neural network model, only some of the training samples need to be manually labeled, which reduces the manual labeling workload to a certain extent and saves training costs.
  • When the electronic device integrates the target neural network model, the user can quickly recognize text areas and text content in an image through the electronic device alone, without a network connection, which protects the user's privacy to a certain extent and offers high security.
  • the neural network model training method provided in the embodiments of the present application can be applied to the application scenario shown in FIG. 1, where the application scenario includes a development server 101, a server 102, and a terminal device 103.
  • the development server 101 is used to determine the initial neural network model according to the configuration information of the current operating environment of the terminal device 103 and the hardware information of the artificial intelligence chip.
  • For example, the terminal device 103 is a Mate 10 mobile phone equipped with a Kirin 970 chip, which supports only operators such as convolution and ReLU activation functions, so the development server 101 outputs an initial neural network model suitable for this phone.
  • For example, the backbone structure of the initial neural network model is a 15-layer fully convolutional network, ending in a feature map at 1/8 of the input resolution.
  • In addition, the development server 101 is pre-loaded with n training samples, and is also used to call the interface of the reference neural network model in the server 102, transfer the n training samples to the reference neural network model of the server 102, and obtain the first text recognition result of the reference neural network model for each training sample. There may be multiple reference neural network models, in which case the development server obtains the first text recognition results of each training sample returned by multiple servers 102.
  • Further, the development server 101 is used to input the n training samples into the initial neural network model for processing to obtain the second text recognition result corresponding to each training sample, and then to adjust the parameters in the initial neural network model according to the first text recognition results, the second text recognition results, and the manual labeling results of some of the n training samples, to obtain the target neural network model.
  • Normally, the parameters set in the initial neural network model before the training process starts are hyperparameters, that is, parameter data not obtained through training.
  • The training process optimizes these settings; in essence, training selects a set of optimal parameters for the model to improve the performance and effect of learning.
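  • For instance, a hypothetical set of such hyperparameters (names and values are illustrative only, not taken from the application) might be fixed before training, while the network weights are the parameters that training then optimizes:

```python
# Illustrative hyperparameters, set before training and not learned:
hyperparams = {
    "learning_rate": 0.01,  # step size for stochastic gradient descent
    "batch_size": 32,       # samples per update
    "alpha": 0.5,           # assumed weight between manual-label and reference losses
    "max_iters": 100,       # iteration threshold for terminating training
}
# By contrast, the learned parameters are the network weights themselves,
# e.g. list(model.parameters()) in the PyTorch sketches in this document.
```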
  • Finally, the development server 101 installs the generated target neural network model into the terminal device 103, and the user can quickly recognize text areas and text content in an image through the terminal device 103 alone, without a network connection.
  • For example, the terminal device 103 has an application installed that integrates the target neural network model. When text needs to be recognized, the user imports an image containing the text to be recognized into the application, where the image may be a photo of an ID card, a PDF document, a screenshot of product purchase information, or the like; the terminal device 103 can then directly recognize the text in the image.
  • Exemplarily, the recognition result for the photo of the ID card is: Name: Wang Li, Gender: Female, Ethnicity: Han, Birth: 1988-08-11, Address: Room 1, Building 1, Lane 353, Lujiazui Road, Shanghai, ID number: 370405198805XXXXXX.
  • The development server 101 inputs the n training samples to the reference neural network model and also inputs the n training samples to the initial neural network model; there is no fixed order between these two steps. The n training samples may be input to the reference neural network model first, input to the initial neural network model first, or input to both simultaneously.
  • The development server 101 and the server 102 are connected through a wireless network.
  • the terminal device 103 is a terminal device capable of network communication.
  • the terminal device may be a smart phone, a tablet computer, or a portable personal computer.
  • the server 102 may be a server, or a server cluster or cloud computing center composed of several servers.
  • FIG. 2 shows the structure of an initial neural network model, which includes a convolutional layer, a pooling layer, and an upsampling layer.
  • The convolutional layer is used to extract image features from the input image.
  • the pooling layer is used for feature dimensionality reduction, compressing the number of data and parameters, reducing overfitting, and improving the model's fault tolerance.
  • the purpose of the upsampling layer is to enlarge the original image so that it can be displayed on a higher resolution display device.
  • The initial neural network model performs classification-task and regression-task learning on the n training samples and obtains a second text recognition result for each training sample according to the learning results of the multiple tasks; the second text recognition result includes the position of the text border corresponding to the regression task and, for the classification task, whether each pixel is text.
  • the convolutional layer in the initial neural network model can be one or more layers, and the pooling layer and the upsampling layer can also be one or more layers.
  • An embodiment of the present application provides a neural network model training method flow. As shown in FIG. 3, the flow may be executed by an electronic device, and the method includes the following steps:
  • Step 301: The electronic device inputs n training samples to the reference neural network model for processing, to obtain a first text recognition result corresponding to each training sample, where n is a positive integer.
  • Here the electronic device may be the development server 101 in FIG. 1. Specifically, a developer may log in to the development platform of the reference neural network model on the development server 101 in advance and obtain permission to call the interface of the reference neural network model; by calling the interface, the development server 101 inputs the n training samples into the reference neural network model and receives the first text recognition result corresponding to each training sample output by the reference neural network model.
  • The reference neural network model may be a neural network model currently in commercial use for OCR, such as those of Hanwang or Baidu.
  • There may be multiple reference neural network models, and the electronic device generates a data set of the first text recognition results of all reference neural network models corresponding to each training sample.
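  • The resulting data set could be organized per training sample and keyed by reference model, for example as below (a hypothetical layout; the application does not fix a storage format):

```python
# Illustrative structure of the collected first text recognition results.
first_results = {
    "sample_0001.jpg": {
        "reference_model_a": {"boxes": [[10, 12, 200, 48]], "text": ["example"]},
        "reference_model_b": {"boxes": [[11, 13, 199, 47]], "text": ["example"]},
    },
    # ... one entry per training sample, one sub-entry per reference model
}
```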
  • Step 302: The electronic device inputs the n training samples to the initial neural network model for processing, and obtains a second text recognition result corresponding to each training sample.
  • Before this, the electronic device needs to generate the initial neural network model to be trained according to the configuration information of the current operating environment and the hardware information of the artificial intelligence chip; for example, the development server 101 generates the initial neural network model to be trained according to the configuration information of the current operating environment of the terminal device 103 and the hardware information of its artificial intelligence chip.
  • The development server 101 inputs the n training samples into the initial neural network model for processing; for example, it can input the n training samples into the neural network model shown in FIG. 2, perform the classification task and the regression task, and obtain the second text recognition result corresponding to each training sample.
  • Step 303: The electronic device adjusts the parameters in the initial neural network model according to the first text recognition results, the second text recognition results, and the manual labeling results of some of the n training samples, to obtain the target neural network model.
  • Before performing step 303, the developer also manually labels some of the n training samples in advance to obtain the manual labeling results.
  • The manual labeling result may include the position of the text area and whether each pixel is text.
  • Specifically, for a first training sample, the electronic device obtains the first loss function value according to the second text recognition result of the first training sample and its manual labeling result; at the same time, the electronic device obtains the second loss function value according to the first text recognition result of the first training sample and the second text recognition result of the first training sample. The electronic device then smooths or weights the first loss function value and the second loss function value to obtain the processed first loss function value and the processed second loss function value, and adjusts the parameters in the initial neural network model according to the processed first loss function value and the processed second loss function value.
  • The adjusted parameters make the second text recognition result output by the neural network model as similar as possible to both the first text recognition result and the manual labeling result; for example, the first similarity between the second recognition result and the first text recognition result, and the second similarity between the second recognition result and the manual labeling result, are both greater than a set threshold. The purpose is to effectively improve the prediction accuracy of the target neural network model.
  • For example, suppose there are three training samples. Each training sample is input to the initial neural network model, and the initial neural network model outputs the predicted value corresponding to each training sample (that is, the second text recognition result); before starting to train the initial neural network model, the electronic device also obtained the reference value corresponding to each training sample (the first text recognition result) and the true values of these three training samples (the manual labeling results).
  • When the electronic device inputs the predicted value and the true value of sample 1 to the loss function, it obtains the first loss function value; when it inputs the predicted value and the reference value of sample 1 to the loss function, it obtains the second loss function value. It then obtains the processed first loss function value and the processed second loss function value through the smoothing or weighting described above, and adjusts the parameters of the neural network model based on them. Similarly, the electronic device calculates the processed first and second loss function values corresponding to sample 2 and adjusts the parameters of the neural network model based on them, and so on until the last training sample.
  • For a second training sample, that is, a sample without a manual labeling result, the electronic device obtains the loss function value based on the second text recognition result of the second training sample and the first text recognition result of the second training sample.
  • For the classification task, the electronic device can use a Softmax-style loss function; for the regression task, it can use a Smooth L1 loss, which can be combined with KL divergence (also known as relative entropy). Finally, a gradient descent algorithm updates the parameters in the initial neural network model so as to minimize the difference between the two text recognition results.
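  • Putting these choices together, one update step might look like the following sketch, reusing the two-headed model sketched earlier; the exact loss combination, the weighting, and the tensor shapes are assumptions, not the application's prescribed formulation:

```python
import torch.nn.functional as F

def training_step(model, optimizer, image, ref_cls, ref_reg,
                  gt_cls=None, gt_reg=None, alpha=0.5):
    """One hypothetical update: softmax cross-entropy for classification,
    Smooth L1 for regression, and KL divergence against the reference
    model's class distribution, followed by a (stochastic) gradient step."""
    cls_logits, reg_out = model(image)
    # loss against the reference model's output (available for every sample)
    ref_loss = F.kl_div(F.log_softmax(cls_logits, dim=1),
                        F.softmax(ref_cls, dim=1),
                        reduction="batchmean")
    ref_loss = ref_loss + F.smooth_l1_loss(reg_out, ref_reg)
    if gt_cls is not None:
        # sample has a manual labeling result: combine both loss values
        gt_loss = (F.cross_entropy(cls_logits, gt_cls)
                   + F.smooth_l1_loss(reg_out, gt_reg))
        loss = alpha * gt_loss + (1.0 - alpha) * ref_loss
    else:
        # unlabeled sample: learn from the reference model's output only
        loss = ref_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()  # e.g. torch.optim.SGD for stochastic gradient descent
    return loss.item()
```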
  • The following embodiments of the present application further elaborate the specific process of the above neural network model training method in conjunction with FIG. 1 and FIG. 4. The specific process of the method may include the following steps:
  • Step 401: The development server 101 inputs the n training samples into the reference neural network model in each server 102 by calling the interface of each reference neural network model, and receives the output result of each server 102, namely the first text recognition result corresponding to each training sample; the development server 101 then generates a data set of the first text recognition results of all reference neural network models corresponding to each training sample.
  • The n training samples are all images that include text to be recognized; suppose there are 2,000 images.
  • The first text recognition result includes the text area on the image and the text content.
  • For example, the electronic device inputs the image shown in FIG. 5a into the reference neural network model, which outputs the text area information and text content shown in FIG. 5b.
  • Step 402: The electronic device determines the applicable initial neural network model according to the configuration information of its own operating environment and the hardware information of its artificial intelligence chip.
  • For example, the developer inputs the Mate 10 system information and the Kirin 970 chip information into a network builder, and the network builder automatically outputs the initial neural network model based on the input information.
  • For example, the builder outputs a 15-layer fully convolutional network ending in a feature map at 1/8 of the input resolution, in which branches are extracted from the fc3 layer and the fc4 layer at the 1/4 and 1/8 network scales respectively, and spliced back into the network backbone by eltwise operators after the deconv45 layer and the deconv5 layer respectively. The branch structures at the 1/4 and 1/8 scales enable the network to detect small and medium-sized text, and the feature layer at the 1/16 scale ensures that the neural network detects large-sized text.
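  • The multi-scale splicing described above could be sketched as follows (a hypothetical PyTorch layout with assumed channel widths; branch4/branch8 and the deconvolution layers only loosely mirror the fc3/fc4 and deconv45/deconv5 layers named above, and the input height and width are assumed divisible by 16):

```python
import torch.nn as nn

class BackboneWithBranches(nn.Module):
    """Sketch of the multi-scale idea: branches taken at 1/4 and 1/8 scale
    are merged back into the backbone by element-wise ('eltwise') addition,
    and the 1/16-scale stage preserves sensitivity to large text."""

    def __init__(self):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(3, 32, 3, 2, 1), nn.ReLU())     # 1/2 scale
        self.stage2 = nn.Sequential(nn.Conv2d(32, 64, 3, 2, 1), nn.ReLU())    # 1/4 scale
        self.stage3 = nn.Sequential(nn.Conv2d(64, 128, 3, 2, 1), nn.ReLU())   # 1/8 scale
        self.stage4 = nn.Sequential(nn.Conv2d(128, 128, 3, 2, 1), nn.ReLU())  # 1/16 scale
        self.branch4 = nn.Conv2d(64, 128, 1)   # branch at 1/4 scale (cf. fc3)
        self.branch8 = nn.Conv2d(128, 128, 1)  # branch at 1/8 scale (cf. fc4)
        self.deconv16to8 = nn.ConvTranspose2d(128, 128, 2, stride=2)  # cf. deconv5
        self.down4to8 = nn.Conv2d(128, 128, 3, 2, 1)  # fold the 1/4 branch in at 1/8

    def forward(self, x):
        f2 = self.stage2(self.stage1(x))  # 1/4-scale features (small text)
        f3 = self.stage3(f2)              # 1/8-scale features (medium text)
        f4 = self.stage4(f3)              # 1/16-scale features (large text)
        up8 = self.deconv16to8(f4) + self.branch8(f3)  # eltwise merge at 1/8
        return up8 + self.down4to8(self.branch4(f2))   # final 1/8-scale feature map
```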
  • The classification task determines whether an area in the original input image mapped by the feature layer is text; the regression task determines the four corner points of the text boundary of that area in the original input image mapped by the feature layer.
  • Step 403: The electronic device inputs the n training samples into the initial neural network model to obtain a second text recognition result corresponding to each training sample.
  • Step 404: For training samples with manual labeling results, the parameters in the initial neural network model are adjusted according to the first text recognition result, the second text recognition result, and the manual labeling result; for training samples without manual labeling results, the parameters in the initial neural network model are adjusted according to the first text recognition result and the second text recognition result.
  • Step 405: The electronic device performs the previous step iteratively until a set condition is met, that is, until the effect of the target neural network model output in step 404 on the verification data set reaches the standard, or the specified iteration-count threshold is reached; training then terminates and the target neural network model is output.
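  • Steps 403 through 405 can be summarized in a short loop, reusing the training_step sketch above; the validation function and the sample dictionary format are placeholders for illustration:

```python
def train(model, optimizer, samples, max_iters, validate, target_metric):
    """Iterate over the training samples until the model's effect on the
    verification data set reaches the standard or the specified iteration
    threshold is reached (step 405), then return the target model."""
    for _ in range(max_iters):
        for s in samples:
            training_step(model, optimizer, s["image"], s["ref_cls"],
                          s["ref_reg"], s.get("gt_cls"), s.get("gt_reg"))
        if validate(model) >= target_metric:  # effect reaches the standard
            break
    return model  # the target neural network model
```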
  • The electronic device may be a portable electronic device that also includes other functions such as personal digital assistant and/or music player functions, for example a mobile phone, a tablet computer, or a wearable device with a wireless communication function (such as a smart watch).
  • Portable electronic devices include, but are not limited to, portable electronic devices running various operating systems.
  • The above portable electronic device may also be another portable electronic device, such as a laptop with a touch-sensitive surface (for example, a touch panel). It should also be understood that, in some other embodiments of the present application, the electronic device may not be a portable electronic device but a desktop computer with a touch-sensitive surface (such as a touch panel).
  • FIG. 6 shows a schematic structural diagram of the mobile phone 100.
  • The mobile phone 100 may include a processor 110, an external memory interface 120, an internal memory 121, a USB interface 130, a charge management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headphone jack 170D, a sensor module 180, a key 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a SIM card interface 195, and so on.
  • the structure illustrated in the embodiment of the present application does not constitute a specific limitation on the mobile phone 100.
  • The mobile phone 100 may include more or fewer components than shown, combine some components, split some components, or use a different arrangement of components.
  • the illustrated components can be implemented in hardware, software, or a combination of software and hardware.
  • The processor 110 may include one or more processing units; for example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), among others. Different processing units may be independent devices or may be integrated in one or more processors.
  • The controller may be the nerve center and command center of the mobile phone 100. The controller can generate operation control signals according to the instruction operation codes and timing signals to complete the control of fetching and executing instructions.
  • the processor 110 may also be provided with a memory for storing instructions and data.
  • the memory in the processor 110 is a cache memory.
  • The memory may store instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs to use the instructions or data again, it can call them directly from this memory, which avoids repeated accesses, reduces the waiting time of the processor 110, and thus improves system efficiency.
  • The processor 110 may run the neural network model training method provided in the embodiments of the present application, using the first text recognition results output by the reference neural network model, the second text recognition results output by the initial neural network model, and the manual labeling results of the first subset of the n training samples to adjust the parameters in the initial neural network model and obtain the target neural network model.
  • The training method may be executed by a general-purpose processor, a dedicated processor, or a general-purpose processor and a dedicated processor together.
  • When the processor 110 integrates different devices, such as a CPU and an NPU, the CPU and the NPU may cooperate to execute the neural network model training method provided in the embodiments of the present application; for example, some algorithms in the training method are executed by the CPU and other algorithms by the NPU, to obtain faster processing.
  • the display screen 194 is used to display images, videos and the like.
  • the display screen 194 includes a display panel.
  • The display panel may use a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a MiniLED, a MicroLED, a Micro-OLED, a quantum dot light-emitting diode (QLED), etc.
  • the mobile phone 100 may include 1 or N display screens 194, where N is a positive integer greater than 1.
  • the display screen 194 may display text recognized by the target neural network model, and the display screen 194 may also display samples to be trained.
  • the camera 193 (front camera or rear camera) is used to capture still images or video.
  • The camera 193 may include a lens group and a photosensitive element such as an image sensor, where the lens group includes a plurality of lenses (convex or concave) for collecting the light signal reflected by the object to be photographed and transmitting the collected light signal to the image sensor.
  • the image sensor generates an original image of the object to be captured according to the light signal.
  • the original image may be sent to the processor 110, and the processor 110 uses it as a training sample, and runs the neural network model training algorithm provided by the embodiment of the present application to obtain a recognition result.
  • the internal memory 121 may be used to store computer executable program code, where the executable program code includes instructions.
  • the processor 110 executes instructions stored in the internal memory 121 to execute various functional applications and data processing of the mobile phone 100.
  • the internal memory 121 may include a storage program area and a storage data area.
  • the storage program area may store codes of the operating system and application programs (such as camera applications, WeChat applications, etc.).
  • the storage data area may store data created during the use of the mobile phone 100 (such as n training samples and text recognition results).
  • the internal memory 121 may also store the code corresponding to the training algorithm provided by the embodiment of the present application.
  • the code of the training algorithm stored in the internal memory 121 is executed by the processor 110, the initial neural network model can be trained.
  • The internal memory 121 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or a universal flash storage (UFS) device.
  • The mobile phone 100 also includes the sensor module 180, which includes, for example, a gyro sensor, an acceleration sensor, and a proximity light sensor. Of course, the mobile phone 100 may also include other sensors, such as a pressure sensor, an ambient light sensor, a bone conduction sensor, and so on (not shown).
  • The wireless communication function of the mobile phone 100 can be realized through the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, and the baseband processor.
  • The mobile phone 100 can realize audio functions, such as music playback and recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headphone interface 170D, and the application processor.
  • the mobile phone 100 may receive key 190 input and generate key signal input related to user settings and function control of the mobile phone 100.
  • the mobile phone 100 can use the motor 191 to generate vibration prompts (such as incoming call vibration prompts).
  • the indicator 192 in the mobile phone 100 can be an indicator light, which can be used to indicate the charging state, the power change, and can also be used to indicate messages, missed calls, notifications, and the like.
  • the SIM card interface 195 in the mobile phone 100 is used to connect a SIM card. The SIM card can be inserted into or removed from the SIM card interface 195 to achieve contact and separation with the mobile phone 100.
  • the mobile phone 100 may include more or fewer components than those shown in FIG. 6, which is not limited in the embodiments of the present application.
  • An embodiment of the present application also provides a computer-readable storage medium.
  • The computer-readable storage medium includes a computer program; when the computer program runs on an electronic device, the electronic device is caused to perform any possible implementation of the above neural network model training method.
  • An embodiment of the present application further provides a computer program product that, when run on an electronic device, causes the electronic device to perform any possible implementation of the above neural network model training method.
  • The embodiments of the present application further disclose a neural network model training device. As shown in FIG. 7, the device is used to implement the methods described in the above method embodiments, and it includes a transceiver module 701 and a processing module 702.
  • The transceiver module 701 is used to support the electronic device in inputting the n training samples to the reference neural network model and receiving the output results of the reference neural network model. The processing module 702 is used to support the electronic device in inputting the n training samples to the initial neural network model for processing to obtain the second text recognition result corresponding to each training sample, and in adjusting the parameters in the initial neural network model based on the first text recognition results, the second text recognition results, and the manual labeling results of some of the n training samples, to obtain the target neural network model. For all relevant content of each step involved in the above method embodiments, refer to the function description of the corresponding functional module, which is not repeated here.
  • the embodiments of the present application disclose an electronic device.
  • As shown in FIG. 8, the electronic device may include: one or more processors 801; a memory 802; a display 803; one or more application programs (not shown); and one or more computer programs 804. The above components can be connected through one or more communication buses 805.
  • the one or more computer programs 804 are stored in the above-mentioned memory 802 and configured to be executed by the one or more processors 801.
  • The one or more computer programs 804 include instructions, and the instructions may be used to execute the steps in FIG. 3 and FIG. 4 and the corresponding embodiments.
  • the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
  • the above integrated unit may be implemented in the form of hardware or software functional unit.
  • the integrated unit may be stored in a computer-readable storage medium.
  • The technical solutions of the embodiments of the present application may essentially, or in the part that contributes to the existing technology, be embodied in whole or in part in the form of a software product. The computer software product is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor to perform all or part of the steps of the methods described in the embodiments of the present application.
  • The foregoing storage media include: flash memory, removable hard disks, read-only memory, random access memory, magnetic disks, optical disks, and other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The present application provides a neural network model training method and an electronic device. The method comprises the following steps: the electronic device inputs n training samples into a reference neural network model for processing to obtain a first text recognition result corresponding to each training sample; the electronic device then inputs the n training samples into an initial neural network model for processing to obtain a second text recognition result corresponding to each training sample; finally, the electronic device adjusts parameters in the initial neural network model according to the first text recognition results, the second text recognition results, and the manual annotation results of a first subset of the n training samples, to obtain a target neural network model.

Description

A neural network model training method and electronic device
This application claims priority to the Chinese patent application No. 201811443331.7, titled "An Optical Character Recognition Method", filed with the State Intellectual Property Office of China on November 29, 2018, and to the Chinese patent application No. 201910139681.2, titled "A Neural Network Model Training Method and Electronic Device", filed with the National Patent Office on February 26, 2019, the entire contents of which are incorporated by reference in this application.
技术领域Technical field
本申请涉及机器学习技术领域,尤其涉及一种神经网络模型训练方法及电子设备。This application relates to the field of machine learning technology, in particular to a neural network model training method and electronic equipment.
背景技术Background technique
深度神经网络是近几年来比较热的一个研究方向,它从仿生学的角度模拟人脑的分多层计算架构体系,是最接近人工智能的一个方向,它更能表征信号的最本质的不变特征。近几年在语音处理、视觉处理领域及图像处理领域,深度学习均取得了较好的结果。而光学字符识别(optical character recognition,OCR)由于其字符多样(例如中文有近7000个常用汉字)、出现场景复杂、语义信息丰富的特性成为视觉处理领域中的难点和重点。Deep neural network is a hot research direction in recent years. It simulates the human brain's multi-layered computing architecture system from the perspective of bionics. It is the direction closest to artificial intelligence. It can better characterize the most essential signal. Variable characteristics. In recent years, deep learning has achieved good results in speech processing, visual processing, and image processing. Optical character recognition (OCR) is difficult and important in the field of visual processing due to its diverse characters (for example, there are nearly 7,000 commonly used Chinese characters in Chinese), the complex scenes and the rich semantic information.
目前用于OCR的神经网络模型因运算量大,对设备的硬件处理能力要求较高,所以一般集成在服务器侧,例如百度用于OCR的神经网络模型集成在百度的云端服务器。受限于终端侧的人工智能专用芯片的处理能力,所以目前集成在服务器侧的用于OCR的神经网络模型并不适用于直接在终端侧运行。为此,终端侧需要重新构建适用于终端侧的用于OCR的神经网络模型,但是目前从零开始构建适合终端侧的神经网络模型难度较大,原因是构建神经网络模型需要依赖大量不同场景的训练样本的标注结果,而训练样本的标注结果包括人工标注文字区域和文字内容,所以人工标注的工作量很大,导致神经网络模型训练成本非常高,且效率较低。The current neural network model used for OCR has a large amount of calculation and requires high hardware processing capabilities of the device, so it is generally integrated on the server side. For example, Baidu's neural network model for OCR is integrated in Baidu's cloud server. Limited by the processing power of the artificial intelligence dedicated chip on the terminal side, the neural network model for OCR currently integrated on the server side is not suitable for running directly on the terminal side. To this end, the terminal side needs to rebuild the neural network model for OCR that is suitable for the terminal side, but it is currently more difficult to build a neural network model suitable for the terminal side from scratch, because the construction of the neural network model depends on a large number of different scenarios. The labeling result of the training sample, and the labeling result of the training sample includes manually labeling the text area and text content, so the workload of manual labeling is very large, resulting in a very high training cost and low efficiency of the neural network model.
发明内容Summary of the invention
本申请提供一种神经网络模型训练方法及电子设备,用以提供一种快速训练适用于终端侧的神经网络模型的方法。The present application provides a neural network model training method and electronic equipment to provide a method for quickly training a neural network model suitable for a terminal side.
第一方面,本申请实施例提供了一种神经网络模型训练方法,该方法包括:电子设备将n个训练样本输入到参考神经网络模型进行处理,得到与每个训练样本对应的第一文本识别结果,然后电子设备将n个训练样本输入到初始神经网络模型进行处理,得到与每个训练样本对应的第二文本识别结果,最后电子设备根据该第一文本识别结果、第二文本识别结果,以及n个训练样本中的第一部分训练样本的人工标注结果,对该初始神经网络模型中的参数进行调整,得到目标神经网络模型。In the first aspect, an embodiment of the present application provides a method for training a neural network model. The method includes: an electronic device inputs n training samples to a reference neural network model for processing to obtain a first text recognition corresponding to each training sample As a result, the electronic device inputs n training samples to the initial neural network model for processing to obtain a second text recognition result corresponding to each training sample. Finally, the electronic device recognizes the first text recognition result and the second text recognition result. And the artificial labeling result of the first part of the training samples in the n training samples, the parameters in the initial neural network model are adjusted to obtain the target neural network model.
本申请实施例中,电子设备可以通过调用参考神经网络模型得到训练样本的识别结果,所以只需要对部分训练样本进行人工标注,一定程度上可以减少人工标注的工作量,节省训练成本。当电子设备集成该目标神经网络模型时,用户无需联网仅通过该电子设备就可以对图像中的文本区域、文本内容进行快速识别,一定程度上可以保护用户的隐私,安全性较高。In the embodiment of the present application, the electronic device can obtain the recognition result of the training sample by calling the reference neural network model, so only a part of the training sample needs to be manually labeled, which can reduce the workload of manual labeling to a certain extent and save training costs. When the electronic device integrates the target neural network model, the user can quickly identify the text area and text content in the image through the electronic device without networking, which can protect the user's privacy to a certain extent and has high security.
在一种可能的设计中,电子设备针对该第一部分训练样本中的任意一个,即针对第一训 练样本,可以根据该第一训练样本的第二文本识别结果和该第一训练样本的人工标注结果,得到第一损失函数值;并根据该第一训练样本的第一文本识别结果和该第一训练样本的第二文本识别结果,得到第二损失函数值;然后对该第一损失函数值和该第二损失函数值进行平滑处理或加权处理,得到处理后的第一损失函数值和第二损失函数值;根据处理后的第一损失函数值和处理后的第二损失函数值,调整该初始神经网络模型中的参数。In a possible design, for any one of the first part of the training samples, that is, for the first training sample, the electronic device may recognize the second text recognition result of the first training sample and the manual annotation of the first training sample As a result, a first loss function value is obtained; and according to the first text recognition result of the first training sample and the second text recognition result of the first training sample, a second loss function value is obtained; then the value of the first loss function Perform a smoothing or weighting process with the second loss function value to obtain the processed first loss function value and the second loss function value; adjust according to the processed first loss function value and the processed second loss function value The parameters in this initial neural network model.
本申请实施例中,电子设备依据人工标注结果和参考神经网络模型的输出结果进行模型训练,一定程度上可以改善目前用于OCR的神经网络模型所存在的长文本留白问题和召回率低的问题。In the embodiment of the present application, the electronic device performs model training based on the results of manual labeling and reference to the output of the neural network model, which can improve the long text blanking problem and low recall rate of the neural network model currently used for OCR to a certain extent. problem.
在一种可能的设计中,针对除了第一部分训练样本之外的n个训练样本中的任意一个,即针对第二训练样本,根据该第二训练样本的第一文本识别结果和该第二训练样本的第二文本识别结果,利用随机梯度下降算法对该初始神经网络模型中的参数进行调整。In a possible design, for any one of the n training samples except the first part of the training samples, that is, for the second training sample, according to the first text recognition result of the second training sample and the second training For the second text recognition result of the sample, the parameters in the initial neural network model are adjusted using a stochastic gradient descent algorithm.
本申请实施例中,电子设备依据人工标注结果和参考神经网络模型的输出结果,可以解决长文本留白问题和召回率低的问题。In the embodiment of the present application, the electronic device can solve the problem of long text blanking and the low recall rate according to the manual labeling result and the output result of the reference neural network model.
在一种可能的设计中,电子设备将该n个训练样本输入到初始神经网络模型进行多个任务学习,多个任务可以包括分类任务和回归任务。电子设备根据该多个任务的学习结果,得到每个训练样本的第二文本识别结果,这样第二文本识别结果就包括回归任务对应的文本边框的位置,以及分类任务对应的每个像素点是否是文本。In a possible design, the electronic device inputs the n training samples to the initial neural network model to learn multiple tasks, and the multiple tasks may include classification tasks and regression tasks. The electronic device obtains the second text recognition result of each training sample according to the learning results of the multiple tasks, so that the second text recognition result includes the position of the text frame corresponding to the regression task, and whether each pixel corresponding to the classification task Is text.
本申请实施例中,电子设备通过多任务集成学习使得初始神经网络模型与参考神经网络模型的输出结果和人工标注结果达成平衡。In the embodiment of the present application, the electronic device achieves a balance between the output result of the initial neural network model and the reference neural network model and the manual annotation result through multi-task integrated learning.
在一种可能的设计中,电子设备通过调用参考神经网络模型的开放接口,将该n个训练样本输入到参考神经网络模型中,并接收该参考神经网络模型输出的与每个训练样本对应的第一文本识别结果。In a possible design, the electronic device calls the open interface of the reference neural network model, inputs the n training samples into the reference neural network model, and receives the output of the reference neural network model corresponding to each training sample The first text recognition result.
本申请实施例中,通过采用云端已有的参考神经网络模型的接口,可以快速构建适用于端侧的神经网络模型。In the embodiment of the present application, by using an existing reference neural network model interface in the cloud, a neural network model suitable for the end side can be quickly constructed.
在一种可能的设计中,电子设备根据所述电子设备当前运行环境的配置信息和人工智能芯片的硬件信息,确定所述初始神经网络模型,其中,所述初始神经网络模型的结构、参数和大小能够被所述电子设备支持。In a possible design, the electronic device determines the initial neural network model according to the configuration information of the current operating environment of the electronic device and the hardware information of the artificial intelligence chip, wherein the structure, parameters and The size can be supported by the electronic device.
本申请实施例中可以快速构建适用于端侧的神经网络模型,用户无需联网仅通过该电子设备就可以对图像中的文本区域、文本内容进行快速识别,一定程度上可以保护用户的隐私,安全性较高。In the embodiment of the present application, a neural network model suitable for the end-side can be quickly constructed. The user can quickly identify the text area and text content in the image only through the electronic device without networking, which can protect the user's privacy and safety to a certain extent. Sexuality is higher.
第二方面,本申请实施例提供一种电子设备,包括处理器和存储器。其中,存储器用于存储一个或多个计算机程序;当存储器存储的一个或多个计算机程序被处理器执行时,使得该电子设备能够实现上述任一方面的任意一种可能的设计的方法。In a second aspect, an embodiment of the present application provides an electronic device, including a processor and a memory. Wherein, the memory is used to store one or more computer programs; when the one or more computer programs stored in the memory are executed by the processor, the electronic device can implement any possible design method of any one of the above aspects.
第三方面,本申请实施例还提供一种装置,该装置包括执行上述任一方面的任意一种可能的设计的方法的模块/单元。这些模块/单元可以通过硬件实现,也可以通过硬件执行相应的软件实现。In a third aspect, an embodiment of the present application further provides an apparatus, the apparatus including a module/unit that executes any one of the above-mentioned possible design methods. These modules/units can be implemented by hardware, and can also be implemented by hardware executing corresponding software.
第四方面,本申请实施例中还提供一种计算机可读存储介质,所述计算机可读存储介质包括计算机程序,当计算机程序在电子设备上运行时,使得所述电子设备执行上述任一方面的任意一种可能的设计的方法。In a fourth aspect, a computer-readable storage medium is also provided in an embodiment of the present application. The computer-readable storage medium includes a computer program, and when the computer program runs on an electronic device, the electronic device performs any of the above aspects Any possible design method.
第五方面,本申请实施例还提供一种包含计算机程序产品,当所述计算机程序产品在终端上运行时,使得所述电子设备执行上述任一方面的任意一种可能的设计的方法。According to a fifth aspect, an embodiment of the present application further provides a method that includes a computer program product that, when the computer program product runs on a terminal, causes the electronic device to perform any one of the possible designs of any of the above aspects.
第六方面,本申请实施例还提供一种芯片,芯片与存储器耦合,用于执行所述存储器中存储的计算机程序,使得所述电子设备执行上述任一方面的任意一种可能的设计的方法。According to a sixth aspect, an embodiment of the present application further provides a chip coupled to a memory, for executing a computer program stored in the memory, so that the electronic device performs any possible design method of any of the above aspects .
本申请的这些方面或其他方面在以下实施例的描述中会更加简明易懂。These or other aspects of the present application will be more concise and understandable in the description of the following embodiments.
附图说明BRIEF DESCRIPTION
图1为本申请实施例提供的一种应用场景示意图;FIG. 1 is a schematic diagram of an application scenario provided by an embodiment of the present application;
图2为本申请实施例提供的一种神经网络模型的结构示意图;2 is a schematic structural diagram of a neural network model provided by an embodiment of the present application;
图3为本申请实施例提供的一种神经网络模型训练方法的流程示意图;3 is a schematic flowchart of a neural network model training method provided by an embodiment of the present application;
图4为本申请实施例提供的另一种神经网络模型训练方法的流程示意图;4 is a schematic flowchart of another neural network model training method provided by an embodiment of the present application;
图5为本申请实施例提供的一种训练样本和训练样本的文字识别结果示意图;5 is a schematic diagram of a training sample and a text recognition result of a training sample provided by an embodiment of the present application;
图6为本申请实施例提供的一种手机结构示意图;6 is a schematic structural diagram of a mobile phone provided by an embodiment of the present application;
图7为本申请实施例提供的一种神经网络模型训练装置结构示意图;7 is a schematic structural diagram of a neural network model training device provided by an embodiment of the present application;
图8为本申请实施例提供的一种电子设备结构示意图。FIG. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
具体实施方式detailed description
本申请实施例提供一种神经网络模型训练方法,该方法可以将目前服务器侧的商用的神经网络模型作为参考神经网络模型,也就是说电子设备可以先调用服务器侧的商用的神经网络模型对n个训练样本进行处理,得到与每个训练样本对应的第一文本识别结果,然后再将该n个训练样本输入到待训练的初始神经网络模型中进行处理,得到每个训练样本对应的第二文本识别结果,最终电子设备根据该第一文本识别结果、第二文本识别结果,以及n个训练样本中的部分训练样本的人工标注结果,对该初始神经网络模型中的参数进行调整,得到目标神经网络模型。The embodiment of the present application provides a neural network model training method, which can use the current commercial neural network model on the server side as a reference neural network model, that is to say, the electronic device can first call the commercial neural network model pair n on the server side Processing training samples to obtain the first text recognition result corresponding to each training sample, and then input the n training samples into the initial neural network model to be trained for processing, to obtain the second corresponding to each training sample The text recognition result, the final electronic device adjusts the parameters in the initial neural network model according to the first text recognition result, the second text recognition result, and the manual labeling results of some training samples in the n training samples to obtain the target Neural network model.
可见,在上述神经网络模型训练中,因可以通过调用参考神经网络模型得到训练样本的识别结果,所以只需要对部分训练样本进行人工标注,一定程度上可以减少人工标注的工作量,节省训练成本。当电子设备集成该目标神经网络模型时,用户无需联网仅通过该电子设备就可以对图像中的文本区域。文本内容进行快速识别,一定程度上可以保护用户的隐私,安全性较高。It can be seen that in the above neural network model training, because the recognition result of the training sample can be obtained by calling the reference neural network model, only a part of the training sample needs to be manually labeled, which can reduce the workload of manual labeling to a certain extent and save training costs . When the electronic device integrates the target neural network model, the user can view the text area in the image only through the electronic device without networking. The rapid recognition of the text content can protect the user's privacy to a certain extent, and the security is high.
本申请实施例中所提供的神经网络模型训练方法可以应用于如图1所示的应用场景,该应用场景中包括开发服务器101、服务器102、终端设备103。The neural network model training method provided in the embodiments of the present application can be applied to the application scenario shown in FIG. 1, where the application scenario includes a development server 101, a server 102, and a terminal device 103.
The development server 101 is configured to determine the initial neural network model according to the configuration information of the current operating environment of the terminal device 103 and the hardware information of its artificial intelligence chip.
For example, the terminal device 103 is a Mate 10 mobile phone equipped with a Kirin 970 chip that supports only operators such as convolution and activation functions such as ReLU, so the development server 101 outputs an initial neural network model suitable for that phone. For example, the backbone of the initial neural network model is a 15-layer fully convolutional network that finally produces a feature map at 1/8 of the input resolution.
In addition, the development server 101 is pre-loaded with n training samples. The development server 101 is further configured to call the interface of the reference neural network model on the server 102, transfer the n training samples to that reference neural network model, and obtain the first text recognition result produced by the reference neural network model for each training sample. There may be multiple reference neural network models, in which case the development server obtains the first text recognition results returned by multiple servers 102 and finally generates, for each training sample, a data set containing the first text recognition results of all reference neural network models.
Further, the development server 101 is configured to input the n training samples into the initial neural network model for processing to obtain a second text recognition result corresponding to each training sample, and then to adjust the parameters of the initial neural network model according to the first text recognition results, the second text recognition results, and the manual annotation results of a subset of the n training samples, thereby obtaining the target neural network model. Normally, the parameters set in the initial neural network model before training begins are initial settings, not parameters obtained through training. The training process optimizes them; in essence, training selects a set of optimal parameters for the model to improve learning performance.
Finally, the development server 101 installs the generated target neural network model on the terminal device 103, and the user can quickly recognize the text regions and text content in an image through the terminal device 103 alone, without a network connection. For example, an application integrating the target neural network model is installed on the terminal device 103. When text needs to be recognized, the user imports into the application an image containing the text to be recognized; the image may be a photo of an ID card, a document in PDF format, a screenshot of product purchase information, and so on, and the terminal device 103 can then recognize the text in the image directly. Exemplarily, the recognition result for the photo of an ID card is: Name: Wang Li; Gender: Female; Ethnicity: Han; Date of birth: 1988-08-11; Address: Room 1, Building 1, Lane 353, Lujiazui Road, Shanghai; ID number: 370405198805XXXXXX.
It should be noted that there is no strict order between the two steps in which the development server 101 inputs the n training samples into the reference neural network model and into the initial neural network model: the samples may be input into the reference neural network model first, into the initial neural network model first, or into both simultaneously.
The development server 101 and the server 102 are connected through a wireless network, and the terminal device 103 is a terminal device with network communication capability, such as a smartphone, a tablet computer, or a portable personal computer. The server 102 may be a single server, or a server cluster or cloud computing center composed of several servers.
As shown in FIG. 2, a neural network structure of the initial neural network model is exemplarily illustrated, including a convolutional layer, a pooling layer, and an upsampling layer. The convolutional layer extracts image features from the input image. The pooling layer reduces feature dimensionality, compresses the amount of data and the number of parameters, mitigates overfitting, and improves the fault tolerance of the model. The upsampling layer enlarges the feature map back toward the resolution of the original image. Finally, the initial neural network model learns a classification task and a regression task on the n training samples and, from the results of these tasks, obtains the second text recognition result for each training sample. The second text recognition result includes the position of the text bounding box, corresponding to the regression task, and whether each pixel is text, corresponding to the classification task.
It should be noted that the initial neural network model may contain one or more convolutional layers, and likewise one or more pooling layers and upsampling layers.
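For readers who prefer code, the following minimal PyTorch sketch illustrates the structure of FIG. 2 under stated assumptions: the layer counts, channel widths, and kernel sizes are illustrative choices, since this application does not prescribe them.

```python
# A minimal sketch of the FIG. 2 structure (illustrative only; layer counts,
# channel widths, and kernel sizes are assumptions, not the patented design).
import torch
import torch.nn as nn

class InitialOCRNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Convolution + pooling extract and downsample image features.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # Upsampling enlarges the feature map again.
        self.upsample = nn.Upsample(scale_factor=2, mode="bilinear",
                                    align_corners=False)
        # Classification head: per-pixel text / non-text score.
        self.cls_head = nn.Conv2d(64, 2, 1)
        # Regression head: 8 values per pixel, i.e. the (x, y) offsets of the
        # four corner points of the text bounding box.
        self.reg_head = nn.Conv2d(64, 8, 1)

    def forward(self, x):
        f = self.upsample(self.backbone(x))
        return self.cls_head(f), self.reg_head(f)
```

A forward pass on a 3-channel image tensor returns one per-pixel classification map and one regression map, matching the two tasks described above.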
Based on the application scenario shown in FIG. 1 and the neural network structure shown in FIG. 2, an embodiment of the present application provides a neural network model training method whose flow, shown in FIG. 3, may be executed by an electronic device and includes the following steps:
Step 301: The electronic device inputs n training samples into the reference neural network model for processing and obtains a first text recognition result corresponding to each training sample, where n is a positive integer.
The electronic device may be the development server 101 in FIG. 1. Specifically, a developer may log in to the development platform of the reference neural network model on the development server 101 in advance and obtain permission to call the interface of the reference neural network model. The development server 101 then calls that interface to input the n training samples into the reference neural network model and receives the first text recognition result output by the reference neural network model for each training sample. In the embodiments of the present application, the reference neural network model may be a commercially available neural network model for OCR, such as those of Hanwang or Baidu. Finally, the electronic device generates, for each training sample, a data set containing the first text recognition results of all reference neural network models.
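As a sketch of this step only, the snippet below posts each sample image to a hosted OCR service over HTTP. The endpoint URL, authentication header, and response schema are hypothetical placeholders; they are not the actual interface of Hanwang, Baidu, or any other vendor.

```python
# Hypothetical sketch of collecting the first text recognition results; the
# URL, auth header, and JSON schema below are assumed, not a real vendor API.
import requests

REFERENCE_OCR_URL = "https://example.com/ocr/v1/general"  # placeholder

def collect_reference_results(image_paths, api_key):
    results = {}
    for path in image_paths:
        with open(path, "rb") as f:
            resp = requests.post(
                REFERENCE_OCR_URL,
                headers={"X-Api-Key": api_key},  # assumed auth scheme
                files={"image": f},
                timeout=30,
            )
        resp.raise_for_status()
        # Assumed response: [{"box": [x1, y1, ..., x4, y4], "text": "..."}]
        results[path] = resp.json()
    return results
```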
Step 302: The electronic device inputs the n training samples into the initial neural network model for processing and obtains a second text recognition result corresponding to each training sample.
First, before performing step 302, the electronic device needs to generate the initial neural network model to be trained according to the configuration information of the current operating environment and the hardware information of the artificial intelligence chip. For example, the development server 101 generates the initial neural network model to be trained according to the configuration information of the current operating environment of the terminal device 103 and the hardware information of its artificial intelligence chip. The development server 101 then inputs the n training samples into the initial neural network model for processing; specifically, it may input the n training samples into the neural network model shown in FIG. 2 and perform the classification task and the regression task to obtain the second text recognition result corresponding to each training sample.
Step 303: The electronic device adjusts the parameters of the initial neural network model according to the first text recognition results, the second text recognition results, and the manual annotation results of a subset of the n training samples, thereby obtaining the target neural network model.
Before step 303 is performed, the developer manually annotates a subset of the n training samples in advance to obtain manual annotation results. For example, a manual annotation result may include the position of a text region and whether each pixel is text.
On the one hand, consider any training sample in the manually annotated first subset of training samples and call it the first training sample. The electronic device obtains a first loss function value from the second text recognition result of the first training sample and its manual annotation result, and obtains a second loss function value from the first text recognition result and the second text recognition result of the first training sample. The electronic device then smooths or weights the first and second loss function values to obtain a processed first loss function value and a processed second loss function value, and adjusts the parameters of the initial neural network model according to these processed values. The adjusted parameters make the second recognition result output by the neural network model as similar as possible to both the first text recognition result and the manual annotation result; for example, the first similarity between the second recognition result and the first text recognition result, and the second similarity between the second recognition result and the manual annotation result, are both greater than a set threshold. The purpose is to effectively improve the prediction accuracy of the target neural network model.
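A minimal sketch of this labeled-sample loss follows, assuming cross-entropy against the manual annotation, mean-squared error against the reference output, and a fixed weighting coefficient alpha; all three choices are assumptions for illustration, as the method leaves the loss functions and the weighting open.

```python
# Minimal sketch of the labeled-sample update; cross-entropy, MSE, and the
# fixed weight alpha are illustrative assumptions, not mandated by the method.
import torch.nn.functional as F

def labeled_sample_loss(second_result, manual_label, first_result, alpha=0.5):
    # First loss function value: prediction vs. manual annotation.
    loss_first = F.cross_entropy(second_result, manual_label)
    # Second loss function value: prediction vs. reference model output.
    loss_second = F.mse_loss(second_result, first_result)
    # "Weighted processing" of the two loss function values.
    return alpha * loss_first + (1.0 - alpha) * loss_second
```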
For example, as shown in Table 1, suppose there are three training samples. After each training sample is input into the initial neural network model, the model outputs a predicted value corresponding to each training sample (that is, the second text recognition result). In addition, before starting to train the initial neural network model, the electronic device has also obtained the reference value corresponding to each training sample (that is, the first text recognition result) and the true values of the three training samples (that is, the manual annotation results).
Table 1

Sample      Predicted value    True value    Reference value
Sample 1    15                 11            16
Sample 2    17                 12            14
Sample 3    19                 13            18
Specifically, when the electronic device inputs the predicted value and the true value of sample 1 into the loss function, it obtains the first loss function value; when it inputs the predicted value and the reference value of sample 1 into the loss function, it obtains the second loss function value. It then obtains the processed first loss function value and the processed second loss function value through the smoothing or weighting described above, and adjusts the parameters of the neural network model according to them. Likewise, for sample 2, the electronic device computes the corresponding processed first and second loss function values and adjusts the parameters of the neural network model accordingly, and so on through the last training sample.
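Plugging the Table 1 numbers into a squared-error loss with equal weights (both assumed here purely for illustration) makes the two loss values for sample 1 concrete:

```python
# Illustrative arithmetic for sample 1 of Table 1, assuming a squared-error
# loss and equal weights; neither choice is prescribed by the method.
pred, true, ref = 15.0, 11.0, 16.0
loss_first = (pred - true) ** 2    # (15 - 11)^2 = 16.0
loss_second = (pred - ref) ** 2    # (15 - 16)^2 = 1.0
combined = 0.5 * loss_first + 0.5 * loss_second  # 8.5
print(combined)
```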
On the other hand, consider any training sample among the n training samples outside the first subset and call it the second training sample. The electronic device obtains a loss function value from the second text recognition result and the first text recognition result of the second training sample. For the classification task, the electronic device may use a Softmax-type loss function; for the regression task, it may use a Smooth L1 regression loss, optionally combined with KL divergence (also called relative entropy). Finally, a gradient descent algorithm updates the parameters of the initial neural network model so that the difference between the two text recognition results is minimized.
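A sketch of this unlabeled-sample loss, assuming PyTorch and reading the combination as KL divergence for the classification output plus Smooth L1 for the corner-point regression, might look as follows; this pairing is one possible reading of the description above, not the only one.

```python
# Sketch of the unlabeled-sample loss, treating the reference model's output
# as the target; pairing KL divergence (classification) with Smooth L1
# (regression) is an assumed reading of the description.
import torch
import torch.nn.functional as F

def unlabeled_sample_loss(cls_logits, reg_pred, ref_cls_probs, ref_reg):
    # Classification: KL divergence between the model's per-pixel text /
    # non-text distribution and the reference model's distribution.
    loss_cls = F.kl_div(F.log_softmax(cls_logits, dim=1),
                        ref_cls_probs, reduction="batchmean")
    # Regression: Smooth L1 between predicted and reference corner points.
    loss_reg = F.smooth_l1_loss(reg_pred, ref_reg)
    return loss_cls + loss_reg

# The parameters are then updated by (stochastic) gradient descent, e.g.
# torch.optim.SGD(model.parameters(), lr=1e-3), to minimize this difference.
```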
The following embodiment of the present application further details the specific process of the above neural network model training method with reference to the flows shown in FIG. 1 and FIG. 4; the specific flow may include:
Step 401: By calling the interface of the reference neural network model, the development server 101 inputs the n training samples into the reference neural network model on each server 102 and receives the output of each server 102, namely the first text recognition result corresponding to each training sample. The electronic device then generates, for each training sample, a data set containing the first text recognition results of all reference neural network models.
For example, the n training samples are all images containing text to be recognized, say 2000 images. A first text recognition result then includes the text regions in the image and the text content. As shown in FIG. 5, the electronic device inputs the image shown in FIG. 5a into the reference neural network model, which outputs the text region information and text content shown in FIG. 5b.
Step 402: The electronic device determines an applicable initial neural network model according to the configuration information of its own operating environment and the hardware information of the artificial intelligence chip.
For example, in this embodiment the developer inputs the Mate 10 system information and the Kirin 970 chip information into a network builder, which automatically outputs the initial neural network model based on the input information.
For example, the builder outputs a 15-layer fully convolutional network ending in a feature map at 1/8 of the input resolution, in which the fc3 and fc4 layers extract feature maps from the layers at 1/4 and 1/8 of the network's scale, respectively, and splice them back into the network backbone through eltwise operators after the deconv45 and deconv5 layers. The branch structures at 1/4 and 1/8 scale enable the network to detect small and medium-sized text, while the feature layer at 1/16 scale ensures that the neural network detects larger text. On the spliced feature layer, one regression task and one classification task are performed. The classification task determines whether a region of the original input image mapped by the feature layer is text; the regression task determines the four corner points of the text boundary of that region in the original input image.
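The splicing described above can be sketched roughly as follows. The layer names fc3, fc4, deconv45, and deconv5 are kept from the description, but the channel widths, backbone depth, and exact splicing order are assumptions, since the description leaves them ambiguous.

```python
# Rough sketch of the multi-scale splicing; channel width c, backbone depth,
# and the splicing order are assumptions where the description is ambiguous.
import torch
import torch.nn as nn

class FusionFCN(nn.Module):
    def __init__(self, c=64):
        super().__init__()
        self.stage4 = nn.Sequential(nn.Conv2d(3, c, 3, 2, 1), nn.ReLU(),
                                    nn.Conv2d(c, c, 3, 2, 1), nn.ReLU())  # 1/4
        self.stage8 = nn.Sequential(nn.Conv2d(c, c, 3, 2, 1), nn.ReLU())  # 1/8
        self.stage16 = nn.Sequential(nn.Conv2d(c, c, 3, 2, 1), nn.ReLU()) # 1/16
        self.fc3 = nn.Conv2d(c, c, 1)      # taps the 1/4-scale feature map
        self.fc4 = nn.Conv2d(c, c, 1)      # taps the 1/8-scale feature map
        self.deconv5 = nn.ConvTranspose2d(c, c, 2, 2)   # 1/16 -> 1/8
        self.deconv45 = nn.ConvTranspose2d(c, c, 2, 2)  # 1/8 -> 1/4

    def forward(self, x):
        f4 = self.stage4(x)
        f8 = self.stage8(f4)
        f16 = self.stage16(f8)
        up8 = self.deconv5(f16) + self.fc4(f8)   # eltwise splice at 1/8
        up4 = self.deconv45(up8) + self.fc3(f4)  # eltwise splice at 1/4
        return up4  # fused map fed to the classification and regression heads
```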
Step 403: The electronic device inputs the n training samples into the initial neural network model and obtains the second text recognition result corresponding to each training sample.
Step 404: For training samples with manual annotation results, adjust the parameters of the initial neural network model according to the first text recognition result, the second text recognition result, and the manual annotation result; for training samples without manual annotation results, adjust the parameters according to the first text recognition result and the second text recognition result.
Step 405: The electronic device iterates the previous step until a set condition is met, that is, until the target neural network model output by step 404 performs adequately on the validation data set or a specified threshold number of iterations is reached; training then terminates and the target neural network model is output.
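Steps 403 to 405 amount to the following training loop, assuming loss helpers in the spirit of the earlier sketches (with signatures simplified here) and placeholder values for the validation metric, target score, and iteration threshold.

```python
# Sketch of the overall loop in steps 403-405; the optimizer, learning rate,
# validation metric, target score, and epoch cap are assumed placeholders.
import torch

def train(model, samples, ref_results, labels, val_metric,
          max_epochs=100, target_score=0.9):
    opt = torch.optim.SGD(model.parameters(), lr=1e-3)
    for _ in range(max_epochs):                 # iteration-count condition
        for s in samples:
            second = model(s.image)             # second text recognition result
            first = ref_results[s.id]           # first text recognition result
            # Loss helpers as in the earlier sketches (signatures simplified).
            if s.id in labels:                  # sample with manual annotation
                loss = labeled_sample_loss(second, labels[s.id], first)
            else:                               # sample without annotation
                loss = unlabeled_sample_loss(second, first)
            opt.zero_grad()
            loss.backward()
            opt.step()
        if val_metric(model) >= target_score:   # validation-set condition
            break
    return model
```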
In some embodiments of the present application, the electronic device may be a portable electronic device that also includes other functions such as a personal digital assistant function and/or a music player function, for example a mobile phone, a tablet computer, or a wearable device with wireless communication capability (such as a smart watch). Exemplary embodiments of portable electronic devices include, but are not limited to, portable electronic devices running various operating systems. The portable electronic device may also be another portable electronic device, such as a laptop computer with a touch-sensitive surface (for example, a touch panel). It should also be understood that, in some other embodiments of the present application, the electronic device may not be a portable electronic device but a desktop computer with a touch-sensitive surface (such as a touch panel).
Taking a mobile phone as the electronic device, FIG. 6 shows a schematic structural diagram of the mobile phone 100.
The mobile phone 100 may include a processor 110, an external memory interface 120, an internal memory 121, a USB interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headset jack 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a SIM card interface 195, and the like.
It can be understood that the structure illustrated in this embodiment does not constitute a specific limitation on the mobile phone 100. In other embodiments of the present application, the mobile phone 100 may include more or fewer components than shown, combine some components, split some components, or arrange the components differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units; for example, it may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU). The different processing units may be independent devices or may be integrated in one or more processors. The controller may be the nerve center and command center of the mobile phone 100; it generates operation control signals according to instruction opcodes and timing signals to control instruction fetching and execution.
The processor 110 may also be provided with a memory for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache, which may store instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs the instruction or data again, it can call it directly from this memory, avoiding repeated access and reducing the waiting time of the processor 110, thereby improving system efficiency.
The processor 110 may run the neural network model training method provided in the embodiments of the present application, using the first text recognition results output by the reference neural network model, the second text recognition results output by the initial neural network model, and the manual annotation results of the first subset of the n training samples to adjust the parameters of the initial neural network model and obtain the target neural network model. The training method may be executed by a general-purpose processor, by a dedicated processor, or by the two together. For example, when the processor 110 integrates different devices, such as a CPU and an NPU, the CPU and the NPU may cooperate to execute the neural network model training method provided in the embodiments of the present application, for example with some of its algorithms running on the CPU and others on the NPU, to achieve higher processing efficiency.
The display screen 194 is used to display images, videos, and the like. The display screen 194 includes a display panel, which may use a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, quantum dot light-emitting diodes (QLED), and so on. In some embodiments, the mobile phone 100 may include 1 or N display screens 194, where N is a positive integer greater than 1.
The display screen 194 may display text recognized by the target neural network model, and it may also display the samples to be trained.
The camera 193 (a front camera or a rear camera) is used to capture still images or video. Generally, the camera 193 may include a photosensitive assembly such as a lens group and an image sensor, where the lens group includes a plurality of lenses (convex or concave) for collecting the light signal reflected by the object to be photographed and passing it to the image sensor, which generates an original image of the object from the light signal. After the camera 193 captures the original image, it may send it to the processor 110, which uses it as a training sample and runs the neural network model training algorithm provided in the embodiments of the present application to obtain a recognition result.
The internal memory 121 may be used to store computer-executable program code, which includes instructions. By running the instructions stored in the internal memory 121, the processor 110 executes the various functional applications and data processing of the mobile phone 100. The internal memory 121 may include a program storage area and a data storage area. The program storage area may store the operating system and the code of application programs (such as a camera application or the WeChat application); the data storage area may store data created during use of the mobile phone 100 (such as the n training samples and text recognition results).
The internal memory 121 may also store the code corresponding to the training algorithm provided in the embodiments of the present application. When this code is run by the processor 110, the initial neural network model can be trained.
In addition, the internal memory 121 may include a high-speed random access memory and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or a universal flash storage (UFS).
The mobile phone 100 also includes the functionality of the sensor module 180, for example a gyroscope sensor, an acceleration sensor, and a proximity light sensor. Of course, the mobile phone 100 may also include other sensors, such as a pressure sensor, an ambient light sensor, or a bone conduction sensor (not shown).
The wireless communication function of the mobile phone 100 may be implemented through the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, the baseband processor, and so on. In addition, the mobile phone 100 may implement audio functions, for example music playback and recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headset jack 170D, and the application processor. The mobile phone 100 may receive input from the button 190 and generate key signal input related to user settings and function control. The mobile phone 100 may use the motor 191 to generate vibration prompts (such as an incoming-call vibration prompt). The indicator 192 may be an indicator light used to indicate the charging state and battery changes, or to indicate messages, missed calls, notifications, and the like. The SIM card interface 195 is used to connect a SIM card, which can be inserted into or removed from the SIM card interface 195 to come into contact with or separate from the mobile phone 100.
It should be understood that, in practical applications, the mobile phone 100 may include more or fewer components than shown in FIG. 6; this is not limited in the embodiments of the present application.
An embodiment of the present application also provides a computer-readable storage medium that includes a computer program which, when run on an electronic device, causes the electronic device to perform any possible implementation of the above neural network model training method.
An embodiment of the present application further provides a computer program product that, when run on an electronic device, causes the electronic device to perform any possible implementation of the above neural network model training method.
In some embodiments of the present application, a neural network model training apparatus is disclosed. As shown in FIG. 7, the apparatus is used to implement the methods described in the foregoing method embodiments and includes a transceiver module 701 and a processing module 702. The transceiver module 701 supports the electronic device in inputting the n training samples into the reference neural network model and receiving the output of the reference neural network model. The processing module 702 supports the electronic device in inputting the n training samples into the initial neural network model for processing to obtain the second text recognition result corresponding to each training sample, and in adjusting the parameters of the initial neural network model according to the first text recognition results, the second text recognition results, and the manual annotation results of a subset of the n training samples, thereby obtaining the target neural network model. All relevant details of the steps in the above method embodiments can be found in the functional descriptions of the corresponding functional modules and are not repeated here.
In other embodiments of the present application, an electronic device is disclosed. As shown in FIG. 8, the electronic device may include one or more processors 801, a memory 802, a display 803, one or more application programs (not shown), and one or more computer programs 804, which may be connected through one or more communication buses 805. The one or more computer programs 804 are stored in the memory 802 and configured to be executed by the one or more processors 801; they include instructions that may be used to perform the steps in FIG. 3 and FIG. 4 and the corresponding embodiments.
From the description of the above embodiments, those skilled in the art can clearly understand that, for convenience and brevity of description, only the division into the above functional modules is used as an example. In practical applications, the above functions may be allocated to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. For the specific working processes of the system, apparatus, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
The functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present application, in essence the part that contributes to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage media include various media that can store program code, such as a flash memory, a removable hard disk, a read-only memory, a random access memory, a magnetic disk, or an optical disc.
The foregoing is only a specific implementation of the embodiments of the present application, but the protection scope of the embodiments of the present application is not limited thereto; any change or replacement within the technical scope disclosed in the embodiments of the present application shall be covered by the protection scope of the embodiments of the present application. Therefore, the protection scope of the embodiments of the present application shall be subject to the protection scope of the claims.

Claims (14)

  1. A neural network model training method, applied to an electronic device, the method comprising:
    inputting n training samples into a reference neural network model for processing to obtain a first text recognition result corresponding to each training sample, wherein n is a positive integer;
    inputting the n training samples into an initial neural network model for processing to obtain a second text recognition result corresponding to each training sample; and
    adjusting parameters of the initial neural network model according to the first text recognition results, the second text recognition results, and manual annotation results of a first subset of the n training samples, to obtain a target neural network model.
  2. The method according to claim 1, wherein the adjusting, by the electronic device, of the parameters of the initial neural network model according to the first text recognition results, the second text recognition results, and the manual annotation results of the first subset of the n training samples comprises:
    for a first training sample, obtaining a first loss function value according to the second text recognition result of the first training sample and the manual annotation result of the first training sample, wherein the first training sample is any one of the first subset of training samples;
    obtaining a second loss function value according to the first text recognition result of the first training sample and the second text recognition result of the first training sample;
    smoothing or weighting the first loss function value and the second loss function value to obtain a processed first loss function value and a processed second loss function value; and
    adjusting the parameters of the initial neural network model according to the processed first loss function value and the processed second loss function value.
  3. The method according to claim 1, wherein the adjusting, by the electronic device, of the parameters of the initial neural network model according to the first text recognition results, the second text recognition results, and the manual annotation results of the subset of the n training samples comprises:
    for a second training sample, adjusting the parameters of the initial neural network model by a stochastic gradient descent algorithm according to the first text recognition result of the second training sample and the second text recognition result of the second training sample, wherein the second training sample is any one of the n training samples other than the first subset of training samples.
  4. The method according to any one of claims 1 to 3, wherein the inputting, by the electronic device, of the n training samples into the initial neural network model for processing to obtain the second text recognition result corresponding to each training sample comprises:
    inputting the n training samples into the initial neural network model for learning of multiple tasks, the multiple tasks comprising a classification task and a regression task; and
    obtaining the second text recognition result of each training sample according to the learning results of the multiple tasks, the second text recognition result comprising the position of the text bounding box corresponding to the regression task and whether each pixel corresponding to the classification task is text.
  5. The method according to any one of claims 1 to 4, wherein the inputting, by the electronic device, of the n training samples into the reference neural network model for processing to obtain the first text recognition result corresponding to each training sample comprises:
    inputting the n training samples into the reference neural network model by calling an open interface of the reference neural network model; and
    receiving the first text recognition result corresponding to each training sample output by the reference neural network model.
  6. The method according to any one of claims 1 to 4, further comprising:
    determining the initial neural network model according to configuration information of a current operating environment of the electronic device and hardware information of an artificial intelligence chip, wherein the structure, parameters, and size of the initial neural network model are able to be supported by the electronic device.
  7. An electronic device, comprising a processor and a memory, wherein:
    the memory is configured to store one or more computer programs; and
    when the one or more computer programs stored in the memory are executed by the processor, the electronic device is caused to:
    input n training samples into a reference neural network model for processing to obtain a first text recognition result corresponding to each training sample;
    input the n training samples into an initial neural network model for processing to obtain a second text recognition result corresponding to each training sample; and
    adjust parameters of the initial neural network model according to the first text recognition results, the second text recognition results, and manual annotation results of a first subset of the n training samples, to obtain a target neural network model.
  8. The electronic device according to claim 7, wherein, when the one or more computer programs stored in the memory are executed by the processor, the electronic device is further caused to:
    for a first training sample, obtain a first loss function value according to the second text recognition result of the first training sample and the manual annotation result of the first training sample, wherein the first training sample is any one of the first subset of training samples;
    obtain a second loss function value according to the first text recognition result of the first training sample and the second text recognition result of the first training sample;
    smooth or weight the first loss function value and the second loss function value to obtain a processed first loss function value and a processed second loss function value; and
    adjust the parameters of the initial neural network model according to the processed first loss function value and the processed second loss function value.
  9. The electronic device according to claim 7, wherein, when the one or more computer programs stored in the memory are executed by the processor, the electronic device is further caused to:
    for a second training sample, adjust the parameters of the initial neural network model by a stochastic gradient descent algorithm according to the first text recognition result of the second training sample and the second text recognition result of the second training sample, wherein the second training sample is any one of the n training samples other than the first subset of training samples.
  10. The electronic device according to any one of claims 7 to 9, wherein, when the one or more computer programs stored in the memory are executed by the processor, the electronic device is further caused to:
    input the n training samples into the initial neural network model for learning of multiple tasks, the multiple tasks comprising a classification task and a regression task; and
    obtain the second text recognition result of each training sample according to the learning results of the multiple tasks, the second text recognition result comprising the position of the text bounding box corresponding to the regression task and whether each pixel corresponding to the classification task is text.
  11. The electronic device according to any one of claims 7 to 10, wherein, when the one or more computer programs stored in the memory are executed by the processor, the electronic device is further caused to:
    input the n training samples into the reference neural network model by calling an open interface of the reference neural network model; and
    receive the first text recognition result corresponding to each training sample output by the reference neural network model.
  12. The electronic device according to any one of claims 7 to 10, wherein, when the one or more computer programs stored in the memory are executed by the processor, the electronic device is further caused to:
    determine the initial neural network model according to configuration information of a current operating environment of the electronic device and hardware information of an artificial intelligence chip, wherein the structure, parameters, and size of the initial neural network model are able to be supported by the electronic device.
  13. A computer storage medium, wherein the computer-readable storage medium comprises a computer program which, when run on an electronic device, causes the electronic device to perform the method according to any one of claims 1 to 6.
  14. A chip, wherein the chip is coupled to a memory and configured to execute a computer program stored in the memory to perform the method according to any one of claims 1 to 6.