WO2020108368A1 - Neural network model training method and electronic device

Neural network model training method and electronic device

Info

Publication number
WO2020108368A1
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
network model
recognition result
electronic device
text recognition
Prior art date
Application number
PCT/CN2019/119815
Other languages
French (fr)
Chinese (zh)
Inventor
谢淼
施烈航
姚恒志
勾军委
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date
Filing date
Publication date
Priority claimed from CN201910139681.2A (external priority; related publication CN111242273B)
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Publication of WO2020108368A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 99/00 Subject matter not provided for in other groups of this subclass

Definitions

  • This application relates to the field of machine learning technology, and in particular to a neural network model training method and an electronic device.
  • Deep neural networks have been a hot research direction in recent years. From a bionics perspective, they simulate the multi-layered computing architecture of the human brain, are the direction closest to artificial intelligence, and can better characterize the most essential invariant features of a signal. In recent years, deep learning has achieved good results in speech processing, visual processing, and image processing. Optical character recognition (OCR) is a difficult and important problem in the field of visual processing because its characters are diverse (for example, Chinese has nearly 7,000 commonly used characters), its scenes are complex, and its semantic information is rich.
  • The neural network models currently used for OCR involve a large amount of computation and place high demands on device hardware, so they are generally integrated on the server side; for example, Baidu's neural network model for OCR is integrated in Baidu's cloud servers. Limited by the processing power of the dedicated artificial intelligence chips on the terminal side, the OCR neural network models currently integrated on the server side are not suitable for running directly on the terminal side. The terminal side therefore needs a newly built OCR neural network model suited to it, but building such a model from scratch is currently difficult, because the construction depends on the labeled results of a large number of training samples from different scenarios, and labeling a training sample involves manually marking text areas and text content. The manual labeling workload is therefore very large, which makes neural network model training very costly and inefficient.
  • The present application provides a neural network model training method and an electronic device, so as to provide a way to quickly train a neural network model suitable for the terminal side.
  • In a first aspect, an embodiment of the present application provides a method for training a neural network model.
  • The method includes: an electronic device inputs n training samples to a reference neural network model for processing to obtain a first text recognition result corresponding to each training sample; the electronic device then inputs the n training samples to an initial neural network model for processing to obtain a second text recognition result corresponding to each training sample; finally, the electronic device adjusts the parameters in the initial neural network model according to the first text recognition results, the second text recognition results, and the manual labeling results of a first subset of the n training samples, to obtain a target neural network model.
  • In the embodiment of the present application, the electronic device can obtain recognition results for the training samples by calling the reference neural network model, so only some of the training samples need to be manually labeled, which reduces the manual labeling workload to a certain extent and saves training costs.
  • When the electronic device integrates the target neural network model, the user can quickly recognize text areas and text content in an image through the electronic device without networking, which protects the user's privacy to a certain extent and offers high security.
  • In one possible design, for any sample in the first subset of training samples, that is, for a first training sample, the electronic device may obtain a first loss function value according to the second text recognition result of the first training sample and the manual labeling result of the first training sample, and obtain a second loss function value according to the first text recognition result of the first training sample and the second text recognition result of the first training sample; it then performs smoothing or weighting on the first loss function value and the second loss function value to obtain a processed first loss function value and a processed second loss function value, and adjusts the parameters in the initial neural network model according to the processed first loss function value and the processed second loss function value.
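  • As a minimal sketch of this two-loss design (assuming PyTorch and a placeholder mean-squared-error loss; the application does not fix the loss function or the weighting scheme here), the two loss values can be combined by weighting as follows:

```python
import torch

def combined_loss(pred, manual_label, reference_output, alpha=0.5):
    """Hypothetical weighting of the two loss values described above.

    pred             -- second text recognition result (initial model output)
    manual_label     -- manual labeling result of the first training sample
    reference_output -- first text recognition result (reference model output)
    alpha            -- assumed weighting hyperparameter, not given in the source
    """
    loss_fn = torch.nn.MSELoss()  # placeholder loss; any differentiable loss works here
    first_loss = loss_fn(pred, manual_label)       # first loss function value
    second_loss = loss_fn(pred, reference_output)  # second loss function value
    # "processing" the two values: a simple weighted combination
    return alpha * first_loss + (1.0 - alpha) * second_loss
```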
  • In the embodiment of the present application, the electronic device performs model training based on both the manual labeling results and the output results of the reference neural network model, which can, to a certain extent, alleviate the long-text blanking problem and the low recall rate of the neural network models currently used for OCR.
  • In one possible design, for any of the n training samples outside the first subset, that is, for a second training sample, the parameters in the initial neural network model are adjusted using a stochastic gradient descent algorithm according to the first text recognition result of the second training sample and the second text recognition result of the second training sample.
  • In the embodiment of the present application, by relying on the manual labeling results and the output results of the reference neural network model, the electronic device can address the long-text blanking problem and the low-recall-rate problem.
  • In one possible design, the electronic device inputs the n training samples to the initial neural network model to learn multiple tasks, where the multiple tasks may include a classification task and a regression task.
  • The electronic device obtains the second text recognition result of each training sample according to the learning results of the multiple tasks, so that the second text recognition result includes the position of the text border corresponding to the regression task and, for the classification task, whether each pixel is text.
  • In the embodiment of the present application, through multi-task integrated learning, the electronic device balances the output of the initial neural network model between the output results of the reference neural network model and the manual labeling results.
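  • A sketch of such a two-task network (a hypothetical PyTorch module; the application does not prescribe layer counts or channel widths) pairs a shared backbone with a per-pixel classification head and a border-regression head:

```python
import torch.nn as nn

class MultiTaskTextNet(nn.Module):
    """Illustrative two-task text detector: per-pixel text classification
    plus text-border regression. All layer sizes are assumptions."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        # classification task: two channels per pixel (text / non-text)
        self.cls_head = nn.Conv2d(64, 2, 1)
        # regression task: eight channels per pixel, i.e. x/y offsets to the
        # four corner points of the text border
        self.reg_head = nn.Conv2d(64, 8, 1)

    def forward(self, x):
        feat = self.backbone(x)
        return self.cls_head(feat), self.reg_head(feat)
```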
  • In one possible design, the electronic device calls an open interface of the reference neural network model, inputs the n training samples into the reference neural network model, and receives the first text recognition result corresponding to each training sample output by the reference neural network model.
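  • A call to such an open interface might look like the sketch below; the endpoint URL, request fields, and response format are all hypothetical, since each OCR service defines its own API:

```python
import base64
import requests

def query_reference_model(image_path, api_url, api_key):
    """Send one training sample to a hypothetical reference-model endpoint
    and return its first text recognition result."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("ascii")
    resp = requests.post(
        api_url,  # assumed endpoint of the reference neural network model
        json={"image": image_b64},
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=30,
    )
    resp.raise_for_status()
    # assumed response shape: text areas plus text content per area
    return resp.json()
```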
  • In the embodiment of the present application, by using the interfaces of reference neural network models that already exist in the cloud, a neural network model suitable for the terminal side can be built quickly.
  • In one possible design, the electronic device determines the initial neural network model according to the configuration information of the current operating environment of the electronic device and the hardware information of the artificial intelligence chip, where the structure, parameters, and size of the initial neural network model can be supported by the electronic device.
  • In the embodiment of the present application, a neural network model suitable for the terminal side can be built quickly, and the user can quickly recognize text areas and text content in an image through the electronic device alone, without a network connection, which protects the user's privacy to a certain extent and offers high security.
  • In a second aspect, an embodiment of the present application provides an electronic device, including a processor and a memory.
  • The memory is used to store one or more computer programs; when the one or more computer programs stored in the memory are executed by the processor, the electronic device can implement any possible design of any of the above aspects.
  • In a third aspect, an embodiment of the present application further provides an apparatus, the apparatus including modules/units for executing any possible design of any of the above aspects.
  • These modules/units can be implemented by hardware, or by hardware executing corresponding software.
  • In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium.
  • The computer-readable storage medium includes a computer program; when the computer program runs on an electronic device, the electronic device performs any possible design of any of the above aspects.
  • In a fifth aspect, an embodiment of the present application further provides a computer program product that, when run on a terminal, causes the electronic device to perform any possible design of any of the above aspects.
  • In a sixth aspect, an embodiment of the present application further provides a chip, coupled to a memory, for executing a computer program stored in the memory, so that the electronic device performs any possible design of any of the above aspects.
  • FIG. 1 is a schematic diagram of an application scenario provided by an embodiment of the present application.
  • FIG. 2 is a schematic structural diagram of a neural network model provided by an embodiment of the present application.
  • FIG. 3 is a schematic flowchart of a neural network model training method provided by an embodiment of the present application.
  • FIG. 4 is a schematic flowchart of another neural network model training method provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a training sample and a text recognition result of a training sample provided by an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of a mobile phone provided by an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of a neural network model training device provided by an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
  • The embodiment of the present application provides a neural network model training method that can use a current commercial server-side neural network model as the reference neural network model. That is, the electronic device may first call the commercial neural network model on the server side to process n training samples and obtain the first text recognition result corresponding to each training sample, then input the n training samples into the initial neural network model to be trained for processing and obtain the second text recognition result corresponding to each training sample, and finally adjust the parameters in the initial neural network model according to the first text recognition results, the second text recognition results, and the manual labeling results of some of the n training samples, to obtain the target neural network model.
  • It can be seen that, in the above training process, because recognition results for the training samples can be obtained by calling the reference neural network model, only some of the training samples need to be manually labeled, which reduces the manual labeling workload to a certain extent and saves training costs.
  • When the electronic device integrates the target neural network model, the user can quickly recognize text areas and text content in an image through the electronic device alone, without a network connection, which protects the user's privacy to a certain extent and offers high security.
  • the neural network model training method provided in the embodiments of the present application can be applied to the application scenario shown in FIG. 1, where the application scenario includes a development server 101, a server 102, and a terminal device 103.
  • the development server 101 is used to determine the initial neural network model according to the configuration information of the current operating environment of the terminal device 103 and the hardware information of the artificial intelligence chip.
  • For example, the terminal device 103 is a Mate 10 mobile phone equipped with a Kirin 970 chip, which supports only operators such as convolution and ReLU activation functions, so the development server 101 outputs an initial neural network model suitable for this phone.
  • For example, the backbone structure of the initial neural network model is a 15-layer fully convolutional network, ending in a feature map at 1/8 of the input resolution.
  • In addition, the development server 101 is pre-loaded with n training samples, and is also used to call the interface of the reference neural network model in the server 102, transfer the n training samples to the reference neural network model of the server 102, and obtain the first text recognition result of the reference neural network model for each training sample. There may be multiple reference neural network models, in which case the development server obtains the first text recognition results of each training sample returned by multiple servers 102.
  • Further, the development server 101 is used to input the n training samples into the initial neural network model for processing to obtain the second text recognition result corresponding to each training sample, and then to adjust the parameters in the initial neural network model according to the first text recognition results, the second text recognition results, and the manual labeling results of some of the n training samples, to obtain the target neural network model.
  • Normally, the parameters set in the initial neural network model before the training process starts are hyperparameters, that is, parameter data not obtained through training.
  • The training process optimizes these settings; in essence, training selects a set of optimal parameters for the model to improve the performance and effect of learning.
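  • For instance, a hypothetical set of such hyperparameters (names and values are illustrative only, not taken from the application) might be fixed before training, while the network weights are the parameters that training then optimizes:

```python
# Illustrative hyperparameters, set before training and not learned:
hyperparams = {
    "learning_rate": 0.01,  # step size for stochastic gradient descent
    "batch_size": 32,       # samples per update
    "alpha": 0.5,           # assumed weight between manual-label and reference losses
    "max_iters": 100,       # iteration threshold for terminating training
}
# By contrast, the learned parameters are the network weights themselves,
# e.g. list(model.parameters()) in the PyTorch sketches in this document.
```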
  • Finally, the development server 101 installs the generated target neural network model into the terminal device 103, and the user can quickly recognize text areas and text content in an image through the terminal device 103 alone, without a network connection.
  • For example, the terminal device 103 has an application installed that integrates the target neural network model. When text needs to be recognized, the user imports an image containing the text to be recognized into the application, where the image may be a photo of an ID card, a PDF document, a screenshot of product purchase information, or the like; the terminal device 103 can then directly recognize the text in the image.
  • Exemplarily, the recognition result for the photo of the ID card is: Name: Wang Li, Gender: Female, Ethnicity: Han, Birth: 1988-08-11, Address: Room 1, Building 1, Lane 353, Lujiazui Road, Shanghai, ID number: 370405198805XXXXXX.
  • The development server 101 inputs the n training samples to the reference neural network model and also inputs the n training samples to the initial neural network model; there is no fixed order between these two steps. The n training samples may be input to the reference neural network model first, input to the initial neural network model first, or input to both simultaneously.
  • The development server 101 and the server 102 are connected through a wireless network.
  • the terminal device 103 is a terminal device capable of network communication.
  • the terminal device may be a smart phone, a tablet computer, or a portable personal computer.
  • the server 102 may be a server, or a server cluster or cloud computing center composed of several servers.
  • FIG. 2 shows the structure of an initial neural network model, which includes a convolutional layer, a pooling layer, and an upsampling layer.
  • The convolutional layer is used to extract image features from the input image.
  • the pooling layer is used for feature dimensionality reduction, compressing the number of data and parameters, reducing overfitting, and improving the model's fault tolerance.
  • the purpose of the upsampling layer is to enlarge the original image so that it can be displayed on a higher resolution display device.
  • The initial neural network model performs classification-task and regression-task learning on the n training samples and obtains a second text recognition result for each training sample according to the learning results of the multiple tasks; the second text recognition result includes the position of the text border corresponding to the regression task and, for the classification task, whether each pixel is text.
  • the convolutional layer in the initial neural network model can be one or more layers, and the pooling layer and the upsampling layer can also be one or more layers.
  • An embodiment of the present application provides a neural network model training method flow. As shown in FIG. 3, the flow may be executed by an electronic device, and the method includes the following steps:
  • Step 301: The electronic device inputs n training samples to the reference neural network model for processing, to obtain a first text recognition result corresponding to each training sample, where n is a positive integer.
  • Here the electronic device may be the development server 101 in FIG. 1. Specifically, a developer may log in to the development platform of the reference neural network model on the development server 101 in advance and obtain permission to call the interface of the reference neural network model; by calling the interface, the development server 101 inputs the n training samples into the reference neural network model and receives the first text recognition result corresponding to each training sample output by the reference neural network model.
  • The reference neural network model may be a neural network model currently in commercial use for OCR, such as those of Hanwang or Baidu.
  • There may be multiple reference neural network models, and the electronic device generates a data set of the first text recognition results of all reference neural network models corresponding to each training sample.
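  • The resulting data set could be organized per training sample and keyed by reference model, for example as below (a hypothetical layout; the application does not fix a storage format):

```python
# Illustrative structure of the collected first text recognition results.
first_results = {
    "sample_0001.jpg": {
        "reference_model_a": {"boxes": [[10, 12, 200, 48]], "text": ["example"]},
        "reference_model_b": {"boxes": [[11, 13, 199, 47]], "text": ["example"]},
    },
    # ... one entry per training sample, one sub-entry per reference model
}
```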
  • Step 302: The electronic device inputs the n training samples to the initial neural network model for processing, and obtains a second text recognition result corresponding to each training sample.
  • Before this, the electronic device needs to generate the initial neural network model to be trained according to the configuration information of the current operating environment and the hardware information of the artificial intelligence chip; for example, the development server 101 generates the initial neural network model to be trained according to the configuration information of the current operating environment of the terminal device 103 and the hardware information of its artificial intelligence chip.
  • The development server 101 inputs the n training samples into the initial neural network model for processing; for example, it can input the n training samples into the neural network model shown in FIG. 2, perform the classification task and the regression task, and obtain the second text recognition result corresponding to each training sample.
  • Step 303: The electronic device adjusts the parameters in the initial neural network model according to the first text recognition results, the second text recognition results, and the manual labeling results of some of the n training samples, to obtain the target neural network model.
  • Before performing step 303, the developer also manually labels some of the n training samples in advance to obtain the manual labeling results.
  • The manual labeling result may include the position of the text area and whether each pixel is text.
  • Specifically, for a first training sample, the electronic device obtains the first loss function value according to the second text recognition result of the first training sample and its manual labeling result; at the same time, the electronic device obtains the second loss function value according to the first text recognition result of the first training sample and the second text recognition result of the first training sample. The electronic device then smooths or weights the first loss function value and the second loss function value to obtain the processed first loss function value and the processed second loss function value, and adjusts the parameters in the initial neural network model according to the processed first loss function value and the processed second loss function value.
  • The adjusted parameters make the second text recognition result output by the neural network model as similar as possible to both the first text recognition result and the manual labeling result; for example, the first similarity between the second recognition result and the first text recognition result, and the second similarity between the second recognition result and the manual labeling result, are both greater than a set threshold. The purpose is to effectively improve the prediction accuracy of the target neural network model.
  • For example, suppose there are three training samples. Each training sample is input to the initial neural network model, and the initial neural network model outputs the predicted value corresponding to each training sample (that is, the second text recognition result); before starting to train the initial neural network model, the electronic device also obtained the reference value corresponding to each training sample (the first text recognition result) and the true values of these three training samples (the manual labeling results).
  • When the electronic device inputs the predicted value and the true value of sample 1 to the loss function, it obtains the first loss function value; when it inputs the predicted value and the reference value of sample 1 to the loss function, it obtains the second loss function value. It then obtains the processed first loss function value and the processed second loss function value through the smoothing or weighting described above, and adjusts the parameters of the neural network model based on them. Similarly, the electronic device calculates the processed first and second loss function values corresponding to sample 2 and adjusts the parameters of the neural network model based on them, and so on until the last training sample.
  • For a second training sample, that is, a sample without a manual labeling result, the electronic device obtains the loss function value based on the second text recognition result of the second training sample and the first text recognition result of the second training sample.
  • For the classification task, the electronic device can use a Softmax-style loss function; for the regression task, it can use a Smooth L1 loss, which can be combined with KL divergence (also known as relative entropy). Finally, a gradient descent algorithm updates the parameters in the initial neural network model so as to minimize the difference between the two text recognition results.
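  • Putting these choices together, one update step might look like the following sketch, reusing the two-headed model sketched earlier; the exact loss combination, the weighting, and the tensor shapes are assumptions, not the application's prescribed formulation:

```python
import torch.nn.functional as F

def training_step(model, optimizer, image, ref_cls, ref_reg,
                  gt_cls=None, gt_reg=None, alpha=0.5):
    """One hypothetical update: softmax cross-entropy for classification,
    Smooth L1 for regression, and KL divergence against the reference
    model's class distribution, followed by a (stochastic) gradient step."""
    cls_logits, reg_out = model(image)
    # loss against the reference model's output (available for every sample)
    ref_loss = F.kl_div(F.log_softmax(cls_logits, dim=1),
                        F.softmax(ref_cls, dim=1),
                        reduction="batchmean")
    ref_loss = ref_loss + F.smooth_l1_loss(reg_out, ref_reg)
    if gt_cls is not None:
        # sample has a manual labeling result: combine both loss values
        gt_loss = (F.cross_entropy(cls_logits, gt_cls)
                   + F.smooth_l1_loss(reg_out, gt_reg))
        loss = alpha * gt_loss + (1.0 - alpha) * ref_loss
    else:
        # unlabeled sample: learn from the reference model's output only
        loss = ref_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()  # e.g. torch.optim.SGD for stochastic gradient descent
    return loss.item()
```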
  • The following embodiments of the present application further elaborate the specific process of the above neural network model training method in conjunction with FIG. 1 and FIG. 4. The specific process of the method may include the following steps:
  • Step 401: The development server 101 inputs the n training samples into the reference neural network model in each server 102 by calling the interface of each reference neural network model, and receives the output result of each server 102, namely the first text recognition result corresponding to each training sample; the development server 101 then generates a data set of the first text recognition results of all reference neural network models corresponding to each training sample.
  • The n training samples are all images that include text to be recognized; suppose there are 2,000 images.
  • The first text recognition result includes the text area on the image and the text content.
  • For example, the electronic device inputs the image shown in FIG. 5a into the reference neural network model, which outputs the text area information and text content shown in FIG. 5b.
  • Step 402: The electronic device determines the applicable initial neural network model according to the configuration information of its own operating environment and the hardware information of its artificial intelligence chip.
  • For example, the developer inputs the Mate 10 system information and the Kirin 970 chip information into a network builder, and the network builder automatically outputs the initial neural network model based on the input information.
  • For example, the builder outputs a 15-layer fully convolutional network ending in a feature map at 1/8 of the input resolution, in which branches are extracted from the fc3 layer and the fc4 layer at the 1/4 and 1/8 network scales respectively, and spliced back into the network backbone by eltwise operators after the deconv45 layer and the deconv5 layer respectively. The branch structures at the 1/4 and 1/8 scales enable the network to detect small and medium-sized text, and the feature layer at the 1/16 scale ensures that the neural network detects large-sized text.
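  • The multi-scale splicing described above could be sketched as follows (a hypothetical PyTorch layout with assumed channel widths; branch4/branch8 and the deconvolution layers only loosely mirror the fc3/fc4 and deconv45/deconv5 layers named above, and the input height and width are assumed divisible by 16):

```python
import torch.nn as nn

class BackboneWithBranches(nn.Module):
    """Sketch of the multi-scale idea: branches taken at 1/4 and 1/8 scale
    are merged back into the backbone by element-wise ('eltwise') addition,
    and the 1/16-scale stage preserves sensitivity to large text."""

    def __init__(self):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(3, 32, 3, 2, 1), nn.ReLU())     # 1/2 scale
        self.stage2 = nn.Sequential(nn.Conv2d(32, 64, 3, 2, 1), nn.ReLU())    # 1/4 scale
        self.stage3 = nn.Sequential(nn.Conv2d(64, 128, 3, 2, 1), nn.ReLU())   # 1/8 scale
        self.stage4 = nn.Sequential(nn.Conv2d(128, 128, 3, 2, 1), nn.ReLU())  # 1/16 scale
        self.branch4 = nn.Conv2d(64, 128, 1)   # branch at 1/4 scale (cf. fc3)
        self.branch8 = nn.Conv2d(128, 128, 1)  # branch at 1/8 scale (cf. fc4)
        self.deconv16to8 = nn.ConvTranspose2d(128, 128, 2, stride=2)  # cf. deconv5
        self.down4to8 = nn.Conv2d(128, 128, 3, 2, 1)  # fold the 1/4 branch in at 1/8

    def forward(self, x):
        f2 = self.stage2(self.stage1(x))  # 1/4-scale features (small text)
        f3 = self.stage3(f2)              # 1/8-scale features (medium text)
        f4 = self.stage4(f3)              # 1/16-scale features (large text)
        up8 = self.deconv16to8(f4) + self.branch8(f3)  # eltwise merge at 1/8
        return up8 + self.down4to8(self.branch4(f2))   # final 1/8-scale feature map
```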
  • The classification task determines whether an area in the original input image mapped by the feature layer is text; the regression task determines the four corner points of the text boundary of that area in the original input image mapped by the feature layer.
  • Step 403: The electronic device inputs the n training samples into the initial neural network model to obtain a second text recognition result corresponding to each training sample.
  • Step 404: For training samples with manual labeling results, the parameters in the initial neural network model are adjusted according to the first text recognition result, the second text recognition result, and the manual labeling result; for training samples without manual labeling results, the parameters in the initial neural network model are adjusted according to the first text recognition result and the second text recognition result.
  • Step 405: The electronic device performs the previous step iteratively until a set condition is met, that is, until the effect of the target neural network model output in step 404 on the verification data set reaches the standard, or the specified iteration-count threshold is reached; training then terminates and the target neural network model is output.
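  • Steps 403 through 405 can be summarized in a short loop, reusing the training_step sketch above; the validation function and the sample dictionary format are placeholders for illustration:

```python
def train(model, optimizer, samples, max_iters, validate, target_metric):
    """Iterate over the training samples until the model's effect on the
    verification data set reaches the standard or the specified iteration
    threshold is reached (step 405), then return the target model."""
    for _ in range(max_iters):
        for s in samples:
            training_step(model, optimizer, s["image"], s["ref_cls"],
                          s["ref_reg"], s.get("gt_cls"), s.get("gt_reg"))
        if validate(model) >= target_metric:  # effect reaches the standard
            break
    return model  # the target neural network model
```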
  • The electronic device may be a portable electronic device that also includes other functions such as personal digital assistant and/or music player functions, for example a mobile phone, a tablet computer, or a wearable device with a wireless communication function (such as a smart watch).
  • Portable electronic devices include, but are not limited to, portable electronic devices running various operating systems.
  • The above portable electronic device may also be another portable electronic device, such as a laptop with a touch-sensitive surface (for example, a touch panel). It should also be understood that, in some other embodiments of the present application, the electronic device may not be a portable electronic device but a desktop computer with a touch-sensitive surface (such as a touch panel).
  • FIG. 6 shows a schematic structural diagram of the mobile phone 100.
  • The mobile phone 100 may include a processor 110, an external memory interface 120, an internal memory 121, a USB interface 130, a charge management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headphone jack 170D, a sensor module 180, a key 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a SIM card interface 195, and so on.
  • the structure illustrated in the embodiment of the present application does not constitute a specific limitation on the mobile phone 100.
  • The mobile phone 100 may include more or fewer components than shown, combine some components, split some components, or use a different arrangement of components.
  • the illustrated components can be implemented in hardware, software, or a combination of software and hardware.
  • The processor 110 may include one or more processing units; for example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), among others. Different processing units may be independent devices or may be integrated in one or more processors.
  • The controller may be the nerve center and command center of the mobile phone 100. The controller can generate operation control signals according to the instruction operation codes and timing signals to complete the control of fetching and executing instructions.
  • the processor 110 may also be provided with a memory for storing instructions and data.
  • the memory in the processor 110 is a cache memory.
  • The memory may store instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs to use the instructions or data again, it can call them directly from this memory, which avoids repeated accesses, reduces the waiting time of the processor 110, and thus improves system efficiency.
  • The processor 110 may run the neural network model training method provided in the embodiments of the present application, using the first text recognition results output by the reference neural network model, the second text recognition results output by the initial neural network model, and the manual labeling results of the first subset of the n training samples to adjust the parameters in the initial neural network model and obtain the target neural network model.
  • The training method may be executed by a general-purpose processor, a dedicated processor, or a general-purpose processor and a dedicated processor together.
  • When the processor 110 integrates different devices, such as a CPU and an NPU, the CPU and the NPU may cooperate to execute the neural network model training method provided in the embodiments of the present application; for example, some algorithms in the training method are executed by the CPU and other algorithms by the NPU, to obtain faster processing.
  • the display screen 194 is used to display images, videos and the like.
  • the display screen 194 includes a display panel.
  • The display panel may use a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a MiniLED, a MicroLED, a Micro-OLED, a quantum dot light-emitting diode (QLED), etc.
  • the mobile phone 100 may include 1 or N display screens 194, where N is a positive integer greater than 1.
  • the display screen 194 may display text recognized by the target neural network model, and the display screen 194 may also display samples to be trained.
  • the camera 193 (front camera or rear camera) is used to capture still images or video.
  • The camera 193 may include a lens group and a photosensitive element such as an image sensor, where the lens group includes a plurality of lenses (convex or concave) for collecting the light signal reflected by the object to be photographed and transmitting the collected light signal to the image sensor.
  • the image sensor generates an original image of the object to be captured according to the light signal.
  • the original image may be sent to the processor 110, and the processor 110 uses it as a training sample, and runs the neural network model training algorithm provided by the embodiment of the present application to obtain a recognition result.
  • the internal memory 121 may be used to store computer executable program code, where the executable program code includes instructions.
  • the processor 110 executes instructions stored in the internal memory 121 to execute various functional applications and data processing of the mobile phone 100.
  • the internal memory 121 may include a storage program area and a storage data area.
  • the storage program area may store codes of the operating system and application programs (such as camera applications, WeChat applications, etc.).
  • the storage data area may store data created during the use of the mobile phone 100 (such as n training samples and text recognition results).
  • the internal memory 121 may also store the code corresponding to the training algorithm provided by the embodiment of the present application.
  • the code of the training algorithm stored in the internal memory 121 is executed by the processor 110, the initial neural network model can be trained.
  • The internal memory 121 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or a universal flash storage (UFS) device.
  • The mobile phone 100 also includes the sensor module 180, which includes, for example, a gyro sensor, an acceleration sensor, and a proximity light sensor. Of course, the mobile phone 100 may also include other sensors, such as a pressure sensor, an ambient light sensor, a bone conduction sensor, and so on (not shown).
  • The wireless communication function of the mobile phone 100 can be realized through the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, and the baseband processor.
  • The mobile phone 100 can realize audio functions, such as music playback and recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headphone interface 170D, and the application processor.
  • the mobile phone 100 may receive key 190 input and generate key signal input related to user settings and function control of the mobile phone 100.
  • the mobile phone 100 can use the motor 191 to generate vibration prompts (such as incoming call vibration prompts).
  • the indicator 192 in the mobile phone 100 can be an indicator light, which can be used to indicate the charging state, the power change, and can also be used to indicate messages, missed calls, notifications, and the like.
  • the SIM card interface 195 in the mobile phone 100 is used to connect a SIM card. The SIM card can be inserted into or removed from the SIM card interface 195 to achieve contact and separation with the mobile phone 100.
  • the mobile phone 100 may include more or fewer components than those shown in FIG. 6, which is not limited in the embodiments of the present application.
  • An embodiment of the present application also provides a computer-readable storage medium.
  • The computer-readable storage medium includes a computer program; when the computer program runs on an electronic device, the electronic device is caused to perform any possible implementation of the above neural network model training method.
  • An embodiment of the present application further provides a computer program product that, when run on an electronic device, causes the electronic device to perform any possible implementation of the above neural network model training method.
  • The embodiments of the present application further disclose a neural network model training device. As shown in FIG. 7, the device is used to implement the methods described in the above method embodiments, and it includes a transceiver module 701 and a processing module 702.
  • The transceiver module 701 is used to support the electronic device in inputting the n training samples to the reference neural network model and receiving the output results of the reference neural network model. The processing module 702 is used to support the electronic device in inputting the n training samples to the initial neural network model for processing to obtain the second text recognition result corresponding to each training sample, and in adjusting the parameters in the initial neural network model based on the first text recognition results, the second text recognition results, and the manual labeling results of some of the n training samples, to obtain the target neural network model. For all relevant content of each step involved in the above method embodiments, refer to the function description of the corresponding functional module, which is not repeated here.
  • the embodiments of the present application disclose an electronic device.
  • As shown in FIG. 8, the electronic device may include: one or more processors 801; a memory 802; a display 803; one or more application programs (not shown); and one or more computer programs 804. The above components can be connected through one or more communication buses 805.
  • the one or more computer programs 804 are stored in the above-mentioned memory 802 and configured to be executed by the one or more processors 801.
  • The one or more computer programs 804 include instructions, and the instructions may be used to execute the steps in FIG. 3 and FIG. 4 and the corresponding embodiments.
  • the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
  • the above integrated unit may be implemented in the form of hardware or software functional unit.
  • the integrated unit may be stored in a computer-readable storage medium.
  • The technical solutions of the embodiments of the present application may essentially, or in the part that contributes to the existing technology, be embodied in whole or in part in the form of a software product. The computer software product is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor to perform all or part of the steps of the methods described in the embodiments of the present application.
  • The foregoing storage media include: flash memory, removable hard disks, read-only memory, random access memory, magnetic disks, optical disks, and other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The present application provides a neural network model training method and an electronic device. The method comprises the following steps: the electronic device inputs n training samples into a reference neural network model for processing to obtain a first text recognition result corresponding to each training sample; the electronic device then inputs the n training samples into an initial neural network model for processing to obtain a second text recognition result corresponding to each training sample; finally, the electronic device adjusts parameters in the initial neural network model according to the first text recognition results, the second text recognition results, and the manual annotation results of a first subset of the n training samples, to obtain a target neural network model.

Description

A neural network model training method and electronic device
This application claims priority to the Chinese patent application No. 201811443331.7, titled "An Optical Character Recognition Method", filed with the State Intellectual Property Office of China on November 29, 2018, and to the Chinese patent application No. 201910139681.2, titled "A Neural Network Model Training Method and Electronic Device", filed with the National Patent Office on February 26, 2019, the entire contents of which are incorporated by reference in this application.
技术领域Technical field
本申请涉及机器学习技术领域,尤其涉及一种神经网络模型训练方法及电子设备。This application relates to the field of machine learning technology, in particular to a neural network model training method and electronic equipment.
背景技术Background technique
深度神经网络是近几年来比较热的一个研究方向,它从仿生学的角度模拟人脑的分多层计算架构体系,是最接近人工智能的一个方向,它更能表征信号的最本质的不变特征。近几年在语音处理、视觉处理领域及图像处理领域,深度学习均取得了较好的结果。而光学字符识别(optical character recognition,OCR)由于其字符多样(例如中文有近7000个常用汉字)、出现场景复杂、语义信息丰富的特性成为视觉处理领域中的难点和重点。Deep neural network is a hot research direction in recent years. It simulates the human brain's multi-layered computing architecture system from the perspective of bionics. It is the direction closest to artificial intelligence. It can better characterize the most essential signal. Variable characteristics. In recent years, deep learning has achieved good results in speech processing, visual processing, and image processing. Optical character recognition (OCR) is difficult and important in the field of visual processing due to its diverse characters (for example, there are nearly 7,000 commonly used Chinese characters in Chinese), the complex scenes and the rich semantic information.
目前用于OCR的神经网络模型因运算量大,对设备的硬件处理能力要求较高,所以一般集成在服务器侧,例如百度用于OCR的神经网络模型集成在百度的云端服务器。受限于终端侧的人工智能专用芯片的处理能力,所以目前集成在服务器侧的用于OCR的神经网络模型并不适用于直接在终端侧运行。为此,终端侧需要重新构建适用于终端侧的用于OCR的神经网络模型,但是目前从零开始构建适合终端侧的神经网络模型难度较大,原因是构建神经网络模型需要依赖大量不同场景的训练样本的标注结果,而训练样本的标注结果包括人工标注文字区域和文字内容,所以人工标注的工作量很大,导致神经网络模型训练成本非常高,且效率较低。The current neural network model used for OCR has a large amount of calculation and requires high hardware processing capabilities of the device, so it is generally integrated on the server side. For example, Baidu's neural network model for OCR is integrated in Baidu's cloud server. Limited by the processing power of the artificial intelligence dedicated chip on the terminal side, the neural network model for OCR currently integrated on the server side is not suitable for running directly on the terminal side. To this end, the terminal side needs to rebuild the neural network model for OCR that is suitable for the terminal side, but it is currently more difficult to build a neural network model suitable for the terminal side from scratch, because the construction of the neural network model depends on a large number of different scenarios. The labeling result of the training sample, and the labeling result of the training sample includes manually labeling the text area and text content, so the workload of manual labeling is very large, resulting in a very high training cost and low efficiency of the neural network model.
发明内容Summary of the invention
本申请提供一种神经网络模型训练方法及电子设备,用以提供一种快速训练适用于终端侧的神经网络模型的方法。The present application provides a neural network model training method and electronic equipment to provide a method for quickly training a neural network model suitable for a terminal side.
第一方面,本申请实施例提供了一种神经网络模型训练方法,该方法包括:电子设备将n个训练样本输入到参考神经网络模型进行处理,得到与每个训练样本对应的第一文本识别结果,然后电子设备将n个训练样本输入到初始神经网络模型进行处理,得到与每个训练样本对应的第二文本识别结果,最后电子设备根据该第一文本识别结果、第二文本识别结果,以及n个训练样本中的第一部分训练样本的人工标注结果,对该初始神经网络模型中的参数进行调整,得到目标神经网络模型。In the first aspect, an embodiment of the present application provides a method for training a neural network model. The method includes: an electronic device inputs n training samples to a reference neural network model for processing to obtain a first text recognition corresponding to each training sample As a result, the electronic device inputs n training samples to the initial neural network model for processing to obtain a second text recognition result corresponding to each training sample. Finally, the electronic device recognizes the first text recognition result and the second text recognition result. And the artificial labeling result of the first part of the training samples in the n training samples, the parameters in the initial neural network model are adjusted to obtain the target neural network model.
本申请实施例中,电子设备可以通过调用参考神经网络模型得到训练样本的识别结果,所以只需要对部分训练样本进行人工标注,一定程度上可以减少人工标注的工作量,节省训练成本。当电子设备集成该目标神经网络模型时,用户无需联网仅通过该电子设备就可以对图像中的文本区域、文本内容进行快速识别,一定程度上可以保护用户的隐私,安全性较高。In the embodiment of the present application, the electronic device can obtain the recognition result of the training sample by calling the reference neural network model, so only a part of the training sample needs to be manually labeled, which can reduce the workload of manual labeling to a certain extent and save training costs. When the electronic device integrates the target neural network model, the user can quickly identify the text area and text content in the image through the electronic device without networking, which can protect the user's privacy to a certain extent and has high security.
在一种可能的设计中,电子设备针对该第一部分训练样本中的任意一个,即针对第一训 练样本,可以根据该第一训练样本的第二文本识别结果和该第一训练样本的人工标注结果,得到第一损失函数值;并根据该第一训练样本的第一文本识别结果和该第一训练样本的第二文本识别结果,得到第二损失函数值;然后对该第一损失函数值和该第二损失函数值进行平滑处理或加权处理,得到处理后的第一损失函数值和第二损失函数值;根据处理后的第一损失函数值和处理后的第二损失函数值,调整该初始神经网络模型中的参数。In a possible design, for any one of the first part of the training samples, that is, for the first training sample, the electronic device may recognize the second text recognition result of the first training sample and the manual annotation of the first training sample As a result, a first loss function value is obtained; and according to the first text recognition result of the first training sample and the second text recognition result of the first training sample, a second loss function value is obtained; then the value of the first loss function Perform a smoothing or weighting process with the second loss function value to obtain the processed first loss function value and the second loss function value; adjust according to the processed first loss function value and the processed second loss function value The parameters in this initial neural network model.
本申请实施例中,电子设备依据人工标注结果和参考神经网络模型的输出结果进行模型训练,一定程度上可以改善目前用于OCR的神经网络模型所存在的长文本留白问题和召回率低的问题。In the embodiment of the present application, the electronic device performs model training based on the results of manual labeling and reference to the output of the neural network model, which can improve the long text blanking problem and low recall rate of the neural network model currently used for OCR to a certain extent. problem.
在一种可能的设计中,针对除了第一部分训练样本之外的n个训练样本中的任意一个,即针对第二训练样本,根据该第二训练样本的第一文本识别结果和该第二训练样本的第二文本识别结果,利用随机梯度下降算法对该初始神经网络模型中的参数进行调整。In a possible design, for any one of the n training samples except the first part of the training samples, that is, for the second training sample, according to the first text recognition result of the second training sample and the second training For the second text recognition result of the sample, the parameters in the initial neural network model are adjusted using a stochastic gradient descent algorithm.
本申请实施例中,电子设备依据人工标注结果和参考神经网络模型的输出结果,可以解决长文本留白问题和召回率低的问题。In the embodiment of the present application, the electronic device can solve the problem of long text blanking and the low recall rate according to the manual labeling result and the output result of the reference neural network model.
在一种可能的设计中,电子设备将该n个训练样本输入到初始神经网络模型进行多个任务学习,多个任务可以包括分类任务和回归任务。电子设备根据该多个任务的学习结果,得到每个训练样本的第二文本识别结果,这样第二文本识别结果就包括回归任务对应的文本边框的位置,以及分类任务对应的每个像素点是否是文本。In a possible design, the electronic device inputs the n training samples to the initial neural network model to learn multiple tasks, and the multiple tasks may include classification tasks and regression tasks. The electronic device obtains the second text recognition result of each training sample according to the learning results of the multiple tasks, so that the second text recognition result includes the position of the text frame corresponding to the regression task, and whether each pixel corresponding to the classification task Is text.
本申请实施例中,电子设备通过多任务集成学习使得初始神经网络模型与参考神经网络模型的输出结果和人工标注结果达成平衡。In the embodiment of the present application, the electronic device achieves a balance between the output result of the initial neural network model and the reference neural network model and the manual annotation result through multi-task integrated learning.
在一种可能的设计中,电子设备通过调用参考神经网络模型的开放接口,将该n个训练样本输入到参考神经网络模型中,并接收该参考神经网络模型输出的与每个训练样本对应的第一文本识别结果。In a possible design, the electronic device calls the open interface of the reference neural network model, inputs the n training samples into the reference neural network model, and receives the output of the reference neural network model corresponding to each training sample The first text recognition result.
本申请实施例中,通过采用云端已有的参考神经网络模型的接口,可以快速构建适用于端侧的神经网络模型。In the embodiment of the present application, by using an existing reference neural network model interface in the cloud, a neural network model suitable for the end side can be quickly constructed.
在一种可能的设计中,电子设备根据所述电子设备当前运行环境的配置信息和人工智能芯片的硬件信息,确定所述初始神经网络模型,其中,所述初始神经网络模型的结构、参数和大小能够被所述电子设备支持。In a possible design, the electronic device determines the initial neural network model according to the configuration information of the current operating environment of the electronic device and the hardware information of the artificial intelligence chip, wherein the structure, parameters and The size can be supported by the electronic device.
本申请实施例中可以快速构建适用于端侧的神经网络模型,用户无需联网仅通过该电子设备就可以对图像中的文本区域、文本内容进行快速识别,一定程度上可以保护用户的隐私,安全性较高。In the embodiment of the present application, a neural network model suitable for the end-side can be quickly constructed. The user can quickly identify the text area and text content in the image only through the electronic device without networking, which can protect the user's privacy and safety to a certain extent. Sexuality is higher.
第二方面,本申请实施例提供一种电子设备,包括处理器和存储器。其中,存储器用于存储一个或多个计算机程序;当存储器存储的一个或多个计算机程序被处理器执行时,使得该电子设备能够实现上述任一方面的任意一种可能的设计的方法。In a second aspect, an embodiment of the present application provides an electronic device, including a processor and a memory. Wherein, the memory is used to store one or more computer programs; when the one or more computer programs stored in the memory are executed by the processor, the electronic device can implement any possible design method of any one of the above aspects.
第三方面,本申请实施例还提供一种装置,该装置包括执行上述任一方面的任意一种可能的设计的方法的模块/单元。这些模块/单元可以通过硬件实现,也可以通过硬件执行相应的软件实现。In a third aspect, an embodiment of the present application further provides an apparatus, the apparatus including a module/unit that executes any one of the above-mentioned possible design methods. These modules/units can be implemented by hardware, and can also be implemented by hardware executing corresponding software.
第四方面,本申请实施例中还提供一种计算机可读存储介质,所述计算机可读存储介质包括计算机程序,当计算机程序在电子设备上运行时,使得所述电子设备执行上述任一方面的任意一种可能的设计的方法。In a fourth aspect, a computer-readable storage medium is also provided in an embodiment of the present application. The computer-readable storage medium includes a computer program, and when the computer program runs on an electronic device, the electronic device performs any of the above aspects Any possible design method.
第五方面,本申请实施例还提供一种包含计算机程序产品,当所述计算机程序产品在终端上运行时,使得所述电子设备执行上述任一方面的任意一种可能的设计的方法。According to a fifth aspect, an embodiment of the present application further provides a method that includes a computer program product that, when the computer program product runs on a terminal, causes the electronic device to perform any one of the possible designs of any of the above aspects.
第六方面,本申请实施例还提供一种芯片,芯片与存储器耦合,用于执行所述存储器中存储的计算机程序,使得所述电子设备执行上述任一方面的任意一种可能的设计的方法。According to a sixth aspect, an embodiment of the present application further provides a chip coupled to a memory, for executing a computer program stored in the memory, so that the electronic device performs any possible design method of any of the above aspects .
本申请的这些方面或其他方面在以下实施例的描述中会更加简明易懂。These or other aspects of the present application will be more concise and understandable in the description of the following embodiments.
附图说明BRIEF DESCRIPTION
图1为本申请实施例提供的一种应用场景示意图;FIG. 1 is a schematic diagram of an application scenario provided by an embodiment of the present application;
图2为本申请实施例提供的一种神经网络模型的结构示意图;2 is a schematic structural diagram of a neural network model provided by an embodiment of the present application;
图3为本申请实施例提供的一种神经网络模型训练方法的流程示意图;3 is a schematic flowchart of a neural network model training method provided by an embodiment of the present application;
图4为本申请实施例提供的另一种神经网络模型训练方法的流程示意图;4 is a schematic flowchart of another neural network model training method provided by an embodiment of the present application;
图5为本申请实施例提供的一种训练样本和训练样本的文字识别结果示意图;5 is a schematic diagram of a training sample and a text recognition result of a training sample provided by an embodiment of the present application;
图6为本申请实施例提供的一种手机结构示意图;6 is a schematic structural diagram of a mobile phone provided by an embodiment of the present application;
图7为本申请实施例提供的一种神经网络模型训练装置结构示意图;7 is a schematic structural diagram of a neural network model training device provided by an embodiment of the present application;
图8为本申请实施例提供的一种电子设备结构示意图。FIG. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
具体实施方式detailed description
本申请实施例提供一种神经网络模型训练方法,该方法可以将目前服务器侧的商用的神经网络模型作为参考神经网络模型,也就是说电子设备可以先调用服务器侧的商用的神经网络模型对n个训练样本进行处理,得到与每个训练样本对应的第一文本识别结果,然后再将该n个训练样本输入到待训练的初始神经网络模型中进行处理,得到每个训练样本对应的第二文本识别结果,最终电子设备根据该第一文本识别结果、第二文本识别结果,以及n个训练样本中的部分训练样本的人工标注结果,对该初始神经网络模型中的参数进行调整,得到目标神经网络模型。The embodiment of the present application provides a neural network model training method, which can use the current commercial neural network model on the server side as a reference neural network model, that is to say, the electronic device can first call the commercial neural network model pair n on the server side Processing training samples to obtain the first text recognition result corresponding to each training sample, and then input the n training samples into the initial neural network model to be trained for processing, to obtain the second corresponding to each training sample The text recognition result, the final electronic device adjusts the parameters in the initial neural network model according to the first text recognition result, the second text recognition result, and the manual labeling results of some training samples in the n training samples to obtain the target Neural network model.
可见,在上述神经网络模型训练中,因可以通过调用参考神经网络模型得到训练样本的识别结果,所以只需要对部分训练样本进行人工标注,一定程度上可以减少人工标注的工作量,节省训练成本。当电子设备集成该目标神经网络模型时,用户无需联网仅通过该电子设备就可以对图像中的文本区域。文本内容进行快速识别,一定程度上可以保护用户的隐私,安全性较高。It can be seen that in the above neural network model training, because the recognition result of the training sample can be obtained by calling the reference neural network model, only a part of the training sample needs to be manually labeled, which can reduce the workload of manual labeling to a certain extent and save training costs . When the electronic device integrates the target neural network model, the user can view the text area in the image only through the electronic device without networking. The rapid recognition of the text content can protect the user's privacy to a certain extent, and the security is high.
本申请实施例中所提供的神经网络模型训练方法可以应用于如图1所示的应用场景,该应用场景中包括开发服务器101、服务器102、终端设备103。The neural network model training method provided in the embodiments of the present application can be applied to the application scenario shown in FIG. 1, where the application scenario includes a development server 101, a server 102, and a terminal device 103.
The development server 101 is configured to determine the initial neural network model according to the configuration information of the current operating environment of the terminal device 103 and the hardware information of its artificial intelligence chip.
For example, the terminal device 103 is a Mate 10 mobile phone equipped with a Kirin 970 chip that supports only operators such as convolution and activation functions such as ReLU, so the development server 101 outputs an initial neural network model suitable for that phone. For example, the backbone of the initial neural network model is a 15-layer fully convolutional network that finally produces a feature map at 1/8 of the input resolution.
In addition, the development server 101 is pre-loaded with n training samples. The development server 101 is further configured to call the interface of the reference neural network model on the server 102, transfer the n training samples to that reference neural network model, and obtain the first text recognition result produced by the reference neural network model for each training sample. There may be multiple reference neural network models, in which case the development server obtains the first text recognition results returned by multiple servers 102 and finally generates, for each training sample, a data set containing the first text recognition results of all reference neural network models.
Further, the development server 101 is configured to input the n training samples into the initial neural network model for processing to obtain a second text recognition result corresponding to each training sample, and then to adjust the parameters of the initial neural network model according to the first text recognition results, the second text recognition results, and the manual annotation results of a subset of the n training samples, thereby obtaining the target neural network model. Normally, the parameters set in the initial neural network model before training begins are initial settings, not parameters obtained through training. The training process optimizes them; in essence, training selects a set of optimal parameters for the model to improve learning performance.
Finally, the development server 101 installs the generated target neural network model on the terminal device 103, and the user can quickly recognize the text regions and text content in an image through the terminal device 103 alone, without a network connection. For example, an application integrating the target neural network model is installed on the terminal device 103. When text needs to be recognized, the user imports into the application an image containing the text to be recognized; the image may be a photo of an ID card, a document in PDF format, a screenshot of product purchase information, and so on, and the terminal device 103 can then recognize the text in the image directly. Exemplarily, the recognition result for the photo of an ID card is: Name: Wang Li; Gender: Female; Ethnicity: Han; Date of birth: 1988-08-11; Address: Room 1, Building 1, Lane 353, Lujiazui Road, Shanghai; ID number: 370405198805XXXXXX.
It should be noted that there is no strict order between the two steps in which the development server 101 inputs the n training samples into the reference neural network model and into the initial neural network model: the samples may be input into the reference neural network model first, into the initial neural network model first, or into both simultaneously.
The development server 101 and the server 102 are connected through a wireless network, and the terminal device 103 is a terminal device with network communication capability, such as a smartphone, a tablet computer, or a portable personal computer. The server 102 may be a single server, or a server cluster or cloud computing center composed of several servers.
As shown in FIG. 2, a neural network structure of the initial neural network model is exemplarily illustrated, including a convolutional layer, a pooling layer, and an upsampling layer. The convolutional layer extracts image features from the input image. The pooling layer reduces feature dimensionality, compresses the amount of data and the number of parameters, mitigates overfitting, and improves the fault tolerance of the model. The upsampling layer enlarges the feature map back toward the resolution of the original image. Finally, the initial neural network model learns a classification task and a regression task on the n training samples and, from the results of these tasks, obtains the second text recognition result for each training sample. The second text recognition result includes the position of the text bounding box, corresponding to the regression task, and whether each pixel is text, corresponding to the classification task.
It should be noted that the initial neural network model may contain one or more convolutional layers, and likewise one or more pooling layers and upsampling layers.
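For readers who prefer code, the following minimal PyTorch sketch illustrates the structure of FIG. 2 under stated assumptions: the layer counts, channel widths, and kernel sizes are illustrative choices, since this application does not prescribe them.

```python
# A minimal sketch of the FIG. 2 structure (illustrative only; layer counts,
# channel widths, and kernel sizes are assumptions, not the patented design).
import torch
import torch.nn as nn

class InitialOCRNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Convolution + pooling extract and downsample image features.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # Upsampling enlarges the feature map again.
        self.upsample = nn.Upsample(scale_factor=2, mode="bilinear",
                                    align_corners=False)
        # Classification head: per-pixel text / non-text score.
        self.cls_head = nn.Conv2d(64, 2, 1)
        # Regression head: 8 values per pixel, i.e. the (x, y) offsets of the
        # four corner points of the text bounding box.
        self.reg_head = nn.Conv2d(64, 8, 1)

    def forward(self, x):
        f = self.upsample(self.backbone(x))
        return self.cls_head(f), self.reg_head(f)
```

A forward pass on a 3-channel image tensor returns one per-pixel classification map and one regression map, matching the two tasks described above.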
Based on the application scenario shown in FIG. 1 and the neural network structure shown in FIG. 2, an embodiment of the present application provides a neural network model training method whose flow, shown in FIG. 3, may be executed by an electronic device and includes the following steps:
Step 301: The electronic device inputs n training samples into the reference neural network model for processing and obtains a first text recognition result corresponding to each training sample, where n is a positive integer.
The electronic device may be the development server 101 in FIG. 1. Specifically, a developer may log in to the development platform of the reference neural network model on the development server 101 in advance and obtain permission to call the interface of the reference neural network model. The development server 101 then calls that interface to input the n training samples into the reference neural network model and receives the first text recognition result output by the reference neural network model for each training sample. In the embodiments of the present application, the reference neural network model may be a commercially available neural network model for OCR, such as those of Hanwang or Baidu. Finally, the electronic device generates, for each training sample, a data set containing the first text recognition results of all reference neural network models.
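As a sketch of this step only, the snippet below posts each sample image to a hosted OCR service over HTTP. The endpoint URL, authentication header, and response schema are hypothetical placeholders; they are not the actual interface of Hanwang, Baidu, or any other vendor.

```python
# Hypothetical sketch of collecting the first text recognition results; the
# URL, auth header, and JSON schema below are assumed, not a real vendor API.
import requests

REFERENCE_OCR_URL = "https://example.com/ocr/v1/general"  # placeholder

def collect_reference_results(image_paths, api_key):
    results = {}
    for path in image_paths:
        with open(path, "rb") as f:
            resp = requests.post(
                REFERENCE_OCR_URL,
                headers={"X-Api-Key": api_key},  # assumed auth scheme
                files={"image": f},
                timeout=30,
            )
        resp.raise_for_status()
        # Assumed response: [{"box": [x1, y1, ..., x4, y4], "text": "..."}]
        results[path] = resp.json()
    return results
```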
Step 302: The electronic device inputs the n training samples into the initial neural network model for processing and obtains a second text recognition result corresponding to each training sample.
First, before performing step 302, the electronic device needs to generate the initial neural network model to be trained according to the configuration information of the current operating environment and the hardware information of the artificial intelligence chip. For example, the development server 101 generates the initial neural network model to be trained according to the configuration information of the current operating environment of the terminal device 103 and the hardware information of its artificial intelligence chip. The development server 101 then inputs the n training samples into the initial neural network model for processing; specifically, it may input the n training samples into the neural network model shown in FIG. 2 and perform the classification task and the regression task to obtain the second text recognition result corresponding to each training sample.
Step 303: The electronic device adjusts the parameters of the initial neural network model according to the first text recognition results, the second text recognition results, and the manual annotation results of a subset of the n training samples, thereby obtaining the target neural network model.
Before step 303 is performed, the developer manually annotates a subset of the n training samples in advance to obtain manual annotation results. For example, a manual annotation result may include the position of a text region and whether each pixel is text.
On the one hand, consider any training sample in the manually annotated first subset of training samples and call it the first training sample. The electronic device obtains a first loss function value from the second text recognition result of the first training sample and its manual annotation result, and obtains a second loss function value from the first text recognition result and the second text recognition result of the first training sample. The electronic device then smooths or weights the first and second loss function values to obtain a processed first loss function value and a processed second loss function value, and adjusts the parameters of the initial neural network model according to these processed values. The adjusted parameters make the second recognition result output by the neural network model as similar as possible to both the first text recognition result and the manual annotation result; for example, the first similarity between the second recognition result and the first text recognition result, and the second similarity between the second recognition result and the manual annotation result, are both greater than a set threshold. The purpose is to effectively improve the prediction accuracy of the target neural network model.
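A minimal sketch of this labeled-sample loss follows, assuming cross-entropy against the manual annotation, mean-squared error against the reference output, and a fixed weighting coefficient alpha; all three choices are assumptions for illustration, as the method leaves the loss functions and the weighting open.

```python
# Minimal sketch of the labeled-sample update; cross-entropy, MSE, and the
# fixed weight alpha are illustrative assumptions, not mandated by the method.
import torch.nn.functional as F

def labeled_sample_loss(second_result, manual_label, first_result, alpha=0.5):
    # First loss function value: prediction vs. manual annotation.
    loss_first = F.cross_entropy(second_result, manual_label)
    # Second loss function value: prediction vs. reference model output.
    loss_second = F.mse_loss(second_result, first_result)
    # "Weighted processing" of the two loss function values.
    return alpha * loss_first + (1.0 - alpha) * loss_second
```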
For example, as shown in Table 1, suppose there are three training samples. After each training sample is input into the initial neural network model, the model outputs a predicted value corresponding to each training sample (that is, the second text recognition result). In addition, before starting to train the initial neural network model, the electronic device has also obtained the reference value corresponding to each training sample (that is, the first text recognition result) and the true values of the three training samples (that is, the manual annotation results).
Table 1

Sample      Predicted value    True value    Reference value
Sample 1    15                 11            16
Sample 2    17                 12            14
Sample 3    19                 13            18
Specifically, when the electronic device inputs the predicted value and the true value of sample 1 into the loss function, it obtains the first loss function value; when it inputs the predicted value and the reference value of sample 1 into the loss function, it obtains the second loss function value. It then obtains the processed first loss function value and the processed second loss function value through the smoothing or weighting described above, and adjusts the parameters of the neural network model according to them. Likewise, for sample 2, the electronic device computes the corresponding processed first and second loss function values and adjusts the parameters of the neural network model accordingly, and so on through the last training sample.
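Plugging the Table 1 numbers into a squared-error loss with equal weights (both assumed here purely for illustration) makes the two loss values for sample 1 concrete:

```python
# Illustrative arithmetic for sample 1 of Table 1, assuming a squared-error
# loss and equal weights; neither choice is prescribed by the method.
pred, true, ref = 15.0, 11.0, 16.0
loss_first = (pred - true) ** 2    # (15 - 11)^2 = 16.0
loss_second = (pred - ref) ** 2    # (15 - 16)^2 = 1.0
combined = 0.5 * loss_first + 0.5 * loss_second  # 8.5
print(combined)
```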
On the other hand, consider any training sample among the n training samples outside the first subset and call it the second training sample. The electronic device obtains a loss function value from the second text recognition result and the first text recognition result of the second training sample. For the classification task, the electronic device may use a Softmax-type loss function; for the regression task, it may use a Smooth L1 regression loss, optionally combined with KL divergence (also called relative entropy). Finally, a gradient descent algorithm updates the parameters of the initial neural network model so that the difference between the two text recognition results is minimized.
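A sketch of this unlabeled-sample loss, assuming PyTorch and reading the combination as KL divergence for the classification output plus Smooth L1 for the corner-point regression, might look as follows; this pairing is one possible reading of the description above, not the only one.

```python
# Sketch of the unlabeled-sample loss, treating the reference model's output
# as the target; pairing KL divergence (classification) with Smooth L1
# (regression) is an assumed reading of the description.
import torch
import torch.nn.functional as F

def unlabeled_sample_loss(cls_logits, reg_pred, ref_cls_probs, ref_reg):
    # Classification: KL divergence between the model's per-pixel text /
    # non-text distribution and the reference model's distribution.
    loss_cls = F.kl_div(F.log_softmax(cls_logits, dim=1),
                        ref_cls_probs, reduction="batchmean")
    # Regression: Smooth L1 between predicted and reference corner points.
    loss_reg = F.smooth_l1_loss(reg_pred, ref_reg)
    return loss_cls + loss_reg

# The parameters are then updated by (stochastic) gradient descent, e.g.
# torch.optim.SGD(model.parameters(), lr=1e-3), to minimize this difference.
```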
The following embodiment of the present application further details the specific process of the above neural network model training method with reference to the flows shown in FIG. 1 and FIG. 4; the specific flow may include:
Step 401: By calling the interface of the reference neural network model, the development server 101 inputs the n training samples into the reference neural network model on each server 102 and receives the output of each server 102, namely the first text recognition result corresponding to each training sample. The electronic device then generates, for each training sample, a data set containing the first text recognition results of all reference neural network models.
For example, the n training samples are all images containing text to be recognized, say 2000 images. A first text recognition result then includes the text regions in the image and the text content. As shown in FIG. 5, the electronic device inputs the image shown in FIG. 5a into the reference neural network model, which outputs the text region information and text content shown in FIG. 5b.
Step 402: The electronic device determines an applicable initial neural network model according to the configuration information of its own operating environment and the hardware information of the artificial intelligence chip.
For example, in this embodiment the developer inputs the Mate 10 system information and the Kirin 970 chip information into a network builder, which automatically outputs the initial neural network model based on the input information.
For example, the builder outputs a 15-layer fully convolutional network ending in a feature map at 1/8 of the input resolution, in which the fc3 and fc4 layers extract feature maps from the layers at 1/4 and 1/8 of the network's scale, respectively, and splice them back into the network backbone through eltwise operators after the deconv45 and deconv5 layers. The branch structures at 1/4 and 1/8 scale enable the network to detect small and medium-sized text, while the feature layer at 1/16 scale ensures that the neural network detects larger text. On the spliced feature layer, one regression task and one classification task are performed. The classification task determines whether a region of the original input image mapped by the feature layer is text; the regression task determines the four corner points of the text boundary of that region in the original input image.
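The splicing described above can be sketched roughly as follows. The layer names fc3, fc4, deconv45, and deconv5 are kept from the description, but the channel widths, backbone depth, and exact splicing order are assumptions, since the description leaves them ambiguous.

```python
# Rough sketch of the multi-scale splicing; channel width c, backbone depth,
# and the splicing order are assumptions where the description is ambiguous.
import torch
import torch.nn as nn

class FusionFCN(nn.Module):
    def __init__(self, c=64):
        super().__init__()
        self.stage4 = nn.Sequential(nn.Conv2d(3, c, 3, 2, 1), nn.ReLU(),
                                    nn.Conv2d(c, c, 3, 2, 1), nn.ReLU())  # 1/4
        self.stage8 = nn.Sequential(nn.Conv2d(c, c, 3, 2, 1), nn.ReLU())  # 1/8
        self.stage16 = nn.Sequential(nn.Conv2d(c, c, 3, 2, 1), nn.ReLU()) # 1/16
        self.fc3 = nn.Conv2d(c, c, 1)      # taps the 1/4-scale feature map
        self.fc4 = nn.Conv2d(c, c, 1)      # taps the 1/8-scale feature map
        self.deconv5 = nn.ConvTranspose2d(c, c, 2, 2)   # 1/16 -> 1/8
        self.deconv45 = nn.ConvTranspose2d(c, c, 2, 2)  # 1/8 -> 1/4

    def forward(self, x):
        f4 = self.stage4(x)
        f8 = self.stage8(f4)
        f16 = self.stage16(f8)
        up8 = self.deconv5(f16) + self.fc4(f8)   # eltwise splice at 1/8
        up4 = self.deconv45(up8) + self.fc3(f4)  # eltwise splice at 1/4
        return up4  # fused map fed to the classification and regression heads
```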
Step 403: The electronic device inputs the n training samples into the initial neural network model and obtains the second text recognition result corresponding to each training sample.
Step 404: For training samples with manual annotation results, adjust the parameters of the initial neural network model according to the first text recognition result, the second text recognition result, and the manual annotation result; for training samples without manual annotation results, adjust the parameters according to the first text recognition result and the second text recognition result.
Step 405: The electronic device iterates the previous step until a set condition is met, that is, until the target neural network model output by step 404 performs adequately on the validation data set or a specified threshold number of iterations is reached; training then terminates and the target neural network model is output.
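Steps 403 to 405 amount to the following training loop, assuming loss helpers in the spirit of the earlier sketches (with signatures simplified here) and placeholder values for the validation metric, target score, and iteration threshold.

```python
# Sketch of the overall loop in steps 403-405; the optimizer, learning rate,
# validation metric, target score, and epoch cap are assumed placeholders.
import torch

def train(model, samples, ref_results, labels, val_metric,
          max_epochs=100, target_score=0.9):
    opt = torch.optim.SGD(model.parameters(), lr=1e-3)
    for _ in range(max_epochs):                 # iteration-count condition
        for s in samples:
            second = model(s.image)             # second text recognition result
            first = ref_results[s.id]           # first text recognition result
            # Loss helpers as in the earlier sketches (signatures simplified).
            if s.id in labels:                  # sample with manual annotation
                loss = labeled_sample_loss(second, labels[s.id], first)
            else:                               # sample without annotation
                loss = unlabeled_sample_loss(second, first)
            opt.zero_grad()
            loss.backward()
            opt.step()
        if val_metric(model) >= target_score:   # validation-set condition
            break
    return model
```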
In some embodiments of the present application, the electronic device may be a portable electronic device that also includes other functions such as a personal digital assistant function and/or a music player function, for example a mobile phone, a tablet computer, or a wearable device with wireless communication capability (such as a smart watch). Exemplary embodiments of portable electronic devices include, but are not limited to, portable electronic devices running various operating systems. The portable electronic device may also be another portable electronic device, such as a laptop computer with a touch-sensitive surface (for example, a touch panel). It should also be understood that, in some other embodiments of the present application, the electronic device may not be a portable electronic device but a desktop computer with a touch-sensitive surface (such as a touch panel).
Taking a mobile phone as the electronic device, FIG. 6 shows a schematic structural diagram of the mobile phone 100.
The mobile phone 100 may include a processor 110, an external memory interface 120, an internal memory 121, a USB interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headset jack 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a SIM card interface 195, and the like.
It can be understood that the structure illustrated in this embodiment does not constitute a specific limitation on the mobile phone 100. In other embodiments of the present application, the mobile phone 100 may include more or fewer components than shown, combine some components, split some components, or arrange the components differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units; for example, it may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU). The different processing units may be independent devices or may be integrated in one or more processors. The controller may be the nerve center and command center of the mobile phone 100; it generates operation control signals according to instruction opcodes and timing signals to control instruction fetching and execution.
The processor 110 may also be provided with a memory for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache, which may store instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs the instruction or data again, it can call it directly from this memory, avoiding repeated access and reducing the waiting time of the processor 110, thereby improving system efficiency.
The processor 110 may run the neural network model training method provided in the embodiments of the present application, using the first text recognition results output by the reference neural network model, the second text recognition results output by the initial neural network model, and the manual annotation results of the first subset of the n training samples to adjust the parameters of the initial neural network model and obtain the target neural network model. The training method may be executed by a general-purpose processor, by a dedicated processor, or by the two together. For example, when the processor 110 integrates different devices, such as a CPU and an NPU, the CPU and the NPU may cooperate to execute the neural network model training method provided in the embodiments of the present application, for example with some of its algorithms running on the CPU and others on the NPU, to achieve higher processing efficiency.
The display screen 194 is used to display images, videos, and the like. The display screen 194 includes a display panel, which may use a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, quantum dot light-emitting diodes (QLED), and so on. In some embodiments, the mobile phone 100 may include 1 or N display screens 194, where N is a positive integer greater than 1.
The display screen 194 may display text recognized by the target neural network model, and it may also display the samples to be trained.
The camera 193 (a front camera or a rear camera) is used to capture still images or video. Generally, the camera 193 may include a photosensitive assembly such as a lens group and an image sensor, where the lens group includes a plurality of lenses (convex or concave) for collecting the light signal reflected by the object to be photographed and passing it to the image sensor, which generates an original image of the object from the light signal. After the camera 193 captures the original image, it may send it to the processor 110, which uses it as a training sample and runs the neural network model training algorithm provided in the embodiments of the present application to obtain a recognition result.
The internal memory 121 may be used to store computer-executable program code, which includes instructions. By running the instructions stored in the internal memory 121, the processor 110 executes the various functional applications and data processing of the mobile phone 100. The internal memory 121 may include a program storage area and a data storage area. The program storage area may store the operating system and the code of application programs (such as a camera application or the WeChat application); the data storage area may store data created during use of the mobile phone 100 (such as the n training samples and text recognition results).
The internal memory 121 may also store the code corresponding to the training algorithm provided in the embodiments of the present application. When this code is run by the processor 110, the initial neural network model can be trained.
In addition, the internal memory 121 may include a high-speed random access memory and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or a universal flash storage (UFS).
The mobile phone 100 also includes the functionality of the sensor module 180, for example a gyroscope sensor, an acceleration sensor, and a proximity light sensor. Of course, the mobile phone 100 may also include other sensors, such as a pressure sensor, an ambient light sensor, or a bone conduction sensor (not shown).
The wireless communication function of the mobile phone 100 may be implemented through the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, the baseband processor, and so on. In addition, the mobile phone 100 may implement audio functions, for example music playback and recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headset jack 170D, and the application processor. The mobile phone 100 may receive input from the button 190 and generate key signal input related to user settings and function control. The mobile phone 100 may use the motor 191 to generate vibration prompts (such as an incoming-call vibration prompt). The indicator 192 may be an indicator light used to indicate the charging state and battery changes, or to indicate messages, missed calls, notifications, and the like. The SIM card interface 195 is used to connect a SIM card, which can be inserted into or removed from the SIM card interface 195 to come into contact with or separate from the mobile phone 100.
It should be understood that, in practical applications, the mobile phone 100 may include more or fewer components than shown in FIG. 6; this is not limited in the embodiments of the present application.
An embodiment of the present application also provides a computer-readable storage medium that includes a computer program which, when run on an electronic device, causes the electronic device to perform any possible implementation of the above neural network model training method.
An embodiment of the present application further provides a computer program product that, when run on an electronic device, causes the electronic device to perform any possible implementation of the above neural network model training method.
In some embodiments of the present application, a neural network model training apparatus is disclosed. As shown in FIG. 7, the apparatus is used to implement the methods described in the foregoing method embodiments and includes a transceiver module 701 and a processing module 702. The transceiver module 701 supports the electronic device in inputting the n training samples into the reference neural network model and receiving the output of the reference neural network model. The processing module 702 supports the electronic device in inputting the n training samples into the initial neural network model for processing to obtain the second text recognition result corresponding to each training sample, and in adjusting the parameters of the initial neural network model according to the first text recognition results, the second text recognition results, and the manual annotation results of a subset of the n training samples, thereby obtaining the target neural network model. All relevant details of the steps in the above method embodiments can be found in the functional descriptions of the corresponding functional modules and are not repeated here.
In other embodiments of the present application, an electronic device is disclosed. As shown in FIG. 8, the electronic device may include one or more processors 801, a memory 802, a display 803, one or more application programs (not shown), and one or more computer programs 804, which may be connected through one or more communication buses 805. The one or more computer programs 804 are stored in the memory 802 and configured to be executed by the one or more processors 801; they include instructions that may be used to perform the steps in FIG. 3 and FIG. 4 and the corresponding embodiments.
From the description of the above embodiments, those skilled in the art can clearly understand that, for convenience and brevity of description, only the division into the above functional modules is used as an example. In practical applications, the above functions may be allocated to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. For the specific working processes of the system, apparatus, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
The functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present application, in essence the part that contributes to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage media include various media that can store program code, such as a flash memory, a removable hard disk, a read-only memory, a random access memory, a magnetic disk, or an optical disc.
The foregoing is only a specific implementation of the embodiments of the present application, but the protection scope of the embodiments of the present application is not limited thereto; any change or replacement within the technical scope disclosed in the embodiments of the present application shall be covered by the protection scope of the embodiments of the present application. Therefore, the protection scope of the embodiments of the present application shall be subject to the protection scope of the claims.

Claims (14)

  1. A neural network model training method, applied to an electronic device, the method comprising:
    inputting n training samples into a reference neural network model for processing to obtain a first text recognition result corresponding to each training sample, wherein n is a positive integer;
    inputting the n training samples into an initial neural network model for processing to obtain a second text recognition result corresponding to each training sample; and
    adjusting parameters of the initial neural network model according to the first text recognition results, the second text recognition results, and manual annotation results of a first subset of the n training samples, to obtain a target neural network model.
  2. The method according to claim 1, wherein the adjusting, by the electronic device, of the parameters of the initial neural network model according to the first text recognition results, the second text recognition results, and the manual annotation results of the first subset of the n training samples comprises:
    for a first training sample, obtaining a first loss function value according to the second text recognition result of the first training sample and the manual annotation result of the first training sample, wherein the first training sample is any one of the first subset of training samples;
    obtaining a second loss function value according to the first text recognition result of the first training sample and the second text recognition result of the first training sample;
    smoothing or weighting the first loss function value and the second loss function value to obtain a processed first loss function value and a processed second loss function value; and
    adjusting the parameters of the initial neural network model according to the processed first loss function value and the processed second loss function value.
  3. The method according to claim 1, wherein the adjusting, by the electronic device, of the parameters of the initial neural network model according to the first text recognition results, the second text recognition results, and the manual annotation results of the subset of the n training samples comprises:
    for a second training sample, adjusting the parameters of the initial neural network model by a stochastic gradient descent algorithm according to the first text recognition result of the second training sample and the second text recognition result of the second training sample, wherein the second training sample is any one of the n training samples other than the first subset of training samples.
  4. The method according to any one of claims 1 to 3, wherein the inputting, by the electronic device, of the n training samples into the initial neural network model for processing to obtain the second text recognition result corresponding to each training sample comprises:
    inputting the n training samples into the initial neural network model for learning of multiple tasks, the multiple tasks comprising a classification task and a regression task; and
    obtaining the second text recognition result of each training sample according to the learning results of the multiple tasks, the second text recognition result comprising the position of the text bounding box corresponding to the regression task and whether each pixel corresponding to the classification task is text.
  5. The method according to any one of claims 1 to 4, wherein the inputting, by the electronic device, of the n training samples into the reference neural network model for processing to obtain the first text recognition result corresponding to each training sample comprises:
    inputting the n training samples into the reference neural network model by calling an open interface of the reference neural network model; and
    receiving the first text recognition result corresponding to each training sample output by the reference neural network model.
  6. The method according to any one of claims 1 to 4, further comprising:
    determining the initial neural network model according to configuration information of a current operating environment of the electronic device and hardware information of an artificial intelligence chip, wherein the structure, parameters, and size of the initial neural network model are able to be supported by the electronic device.
  7. An electronic device, comprising a processor and a memory, wherein:
    the memory is configured to store one or more computer programs; and
    when the one or more computer programs stored in the memory are executed by the processor, the electronic device is caused to:
    input n training samples into a reference neural network model for processing to obtain a first text recognition result corresponding to each training sample;
    input the n training samples into an initial neural network model for processing to obtain a second text recognition result corresponding to each training sample; and
    adjust parameters of the initial neural network model according to the first text recognition results, the second text recognition results, and manual annotation results of a first subset of the n training samples, to obtain a target neural network model.
  8. The electronic device according to claim 7, wherein, when the one or more computer programs stored in the memory are executed by the processor, the electronic device is further caused to:
    for a first training sample, obtain a first loss function value according to the second text recognition result of the first training sample and the manual annotation result of the first training sample, wherein the first training sample is any one of the first subset of training samples;
    obtain a second loss function value according to the first text recognition result of the first training sample and the second text recognition result of the first training sample;
    smooth or weight the first loss function value and the second loss function value to obtain a processed first loss function value and a processed second loss function value; and
    adjust the parameters of the initial neural network model according to the processed first loss function value and the processed second loss function value.
  9. The electronic device according to claim 7, wherein, when the one or more computer programs stored in the memory are executed by the processor, the electronic device is further caused to:
    for a second training sample, adjust the parameters of the initial neural network model by a stochastic gradient descent algorithm according to the first text recognition result of the second training sample and the second text recognition result of the second training sample, wherein the second training sample is any one of the n training samples other than the first subset of training samples.
  10. The electronic device according to any one of claims 7 to 9, wherein, when the one or more computer programs stored in the memory are executed by the processor, the electronic device is further caused to:
    input the n training samples into the initial neural network model for learning of multiple tasks, the multiple tasks comprising a classification task and a regression task; and
    obtain the second text recognition result of each training sample according to the learning results of the multiple tasks, the second text recognition result comprising the position of the text bounding box corresponding to the regression task and whether each pixel corresponding to the classification task is text.
  11. The electronic device according to any one of claims 7 to 10, wherein, when the one or more computer programs stored in the memory are executed by the processor, the electronic device is further caused to:
    input the n training samples into the reference neural network model by calling an open interface of the reference neural network model; and
    receive the first text recognition result corresponding to each training sample output by the reference neural network model.
  12. The electronic device according to any one of claims 7 to 10, wherein, when the one or more computer programs stored in the memory are executed by the processor, the electronic device is further caused to:
    determine the initial neural network model according to configuration information of a current operating environment of the electronic device and hardware information of an artificial intelligence chip, wherein the structure, parameters, and size of the initial neural network model are able to be supported by the electronic device.
  13. A computer storage medium, wherein the computer-readable storage medium comprises a computer program which, when run on an electronic device, causes the electronic device to perform the method according to any one of claims 1 to 6.
  14. A chip, wherein the chip is coupled to a memory and configured to execute a computer program stored in the memory to perform the method according to any one of claims 1 to 6.