CN113033659A - Method and device for training image recognition model and image recognition

Info

Publication number: CN113033659A
Application number: CN202110313831.4A
Authority: CN (China)
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 崔程, 杨敏, 薛学通, 魏凯
Assignee (current and original): Beijing Baidu Netcom Science and Technology Co Ltd
Filing/priority date: 2021-03-24
Publication date: 2021-06-25

Classifications

    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting (G Physics > G06 Computing > G06F Electric digital data processing > G06F18/00 Pattern recognition > G06F18/20 Analysing > G06F18/21 Design or setup of recognition systems or techniques)
    • G06N3/045 Combinations of networks (G Physics > G06 Computing > G06N Computing arrangements based on specific computational models > G06N3/00 Biological models > G06N3/02 Neural networks > G06N3/04 Architecture, e.g. interconnection topology)
    • G06N3/08 Learning methods (G Physics > G06 Computing > G06N Computing arrangements based on specific computational models > G06N3/00 Biological models > G06N3/02 Neural networks)

Abstract

The invention discloses a method for training an image recognition model and a method of image recognition, and relates to the field of artificial intelligence, in particular to the technical field of computer vision and deep learning. The training method comprises the following steps: acquiring training data, wherein the training data comprises a plurality of first images and the annotation label corresponding to each first image; determining position data of the subject in each first image; adjusting the subject in each first image according to the position data to obtain a second image corresponding to each first image; and training a neural network model according to the second images and the annotation labels corresponding to the second images until the neural network model converges, to obtain the image recognition model. The image recognition method comprises the following steps: acquiring an image to be recognized; and taking the image to be recognized as the input of the image recognition model, and taking the output result of the image recognition model as the recognition result of the image to be recognized. The method and the device can improve the accuracy of the trained image recognition model in image recognition.

Description

Method and device for training image recognition model and image recognition
Technical Field
The present disclosure relates to the field of artificial intelligence technology, and in particular to computer vision and deep learning. Provided are a method, an apparatus, an electronic device and a readable storage medium for training an image recognition model and for image recognition.
Background
Image recognition technology extracts features from an image by means of machine learning and then distinguishes different images by the extracted features. It is widely applied in various visual tasks, such as plant classification, dish recognition and landmark recognition. In the field of image recognition, how to improve the accuracy of an image recognition model is one of the most actively pursued problems in both academia and industry.
Disclosure of Invention
The present disclosure provides a method and apparatus for training an image recognition model and for image recognition, together with an electronic device and a readable storage medium, so as to improve the accuracy of the image recognition model in image recognition.
According to a first aspect of the present disclosure, there is provided a method for training an image recognition model, including: acquiring training data, wherein the training data comprises a plurality of first images and the annotation label corresponding to each first image; determining position data of the subject in each first image; adjusting the subject in each first image according to the position data to obtain a second image corresponding to each first image; and training a neural network model according to the second images and the annotation labels corresponding to the second images until the neural network model converges, to obtain an image recognition model.
According to a second aspect of the present disclosure, there is provided a method of image recognition, comprising: acquiring an image to be recognized; and taking the image to be recognized as the input of an image recognition model, and taking the output result of the image recognition model as the recognition result of the image to be recognized.
According to a third aspect of the present disclosure, there is provided a training apparatus for an image recognition model, comprising: a first acquiring unit configured to acquire training data, wherein the training data comprises a plurality of first images and the annotation label corresponding to each first image; a determining unit configured to determine position data of the subject in each first image; a processing unit configured to adjust the subject in each first image according to the position data to obtain a second image corresponding to each first image; and a training unit configured to train a neural network model according to the second images and the annotation labels corresponding to the second images until the neural network model converges, to obtain the image recognition model.
According to a fourth aspect of the present disclosure, there is provided an apparatus for image recognition, comprising: a second acquiring unit configured to acquire an image to be recognized; and a recognition unit configured to take the image to be recognized as the input of an image recognition model and take the output result of the image recognition model as the recognition result of the image to be recognized.
According to a fifth aspect of the present disclosure, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described above.
According to a sixth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method as described above.
According to a seventh aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method as described above.
According to the technical solution of the present disclosure, position data of the subject in each first image is determined, and the subject is adjusted according to the determined position data to obtain a second image, so that the position, or both the position and the size, of the subject in the image is changed. The neural network model thus learns to recognize the subject across these differences in position and size, which greatly improves the convergence accuracy of the neural network model and, in turn, the recognition accuracy of the trained image recognition model.
It should be understood that what is described in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram according to a first embodiment of the present disclosure;
FIG. 2 is a schematic diagram according to a second embodiment of the present disclosure;
FIG. 3 is a schematic diagram according to a third embodiment of the present disclosure;
FIG. 4 is a schematic diagram according to a fourth embodiment of the present disclosure;
FIG. 5 is a schematic diagram according to a fifth embodiment of the present disclosure;
FIG. 6 is a block diagram of an electronic device for implementing a method of training an image recognition model and image recognition according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 is a schematic diagram according to a first embodiment of the present disclosure. As shown in fig. 1, the training method of the image recognition model of this embodiment may specifically include the following steps:
S101, acquiring training data, wherein the training data comprises a plurality of first images and the annotation label corresponding to each first image;
S102, determining position data of the subject in each first image;
S103, adjusting the subject in each first image according to the position data to obtain a second image corresponding to each first image;
and S104, training the neural network model according to the second images and the annotation labels corresponding to the second images until the neural network model converges to obtain an image recognition model.
In the method for training the image recognition model of this embodiment, after the position data of the subject contained in each first image in the training data is determined, the subject in each first image is adjusted according to the determined position data to obtain the second images, and the neural network model is then trained according to each obtained second image and its corresponding annotation label.
The training data obtained by performing S101 in this embodiment may be composed of a plurality of data sets, each data set containing a plurality of first images; the annotation label in the acquired training data is identification information of the subject in each first image, such as the name of the subject or a numerical code of the subject. Preferably, each first image in the present embodiment contains only one subject.
After executing S101 to acquire the plurality of first images and the annotation label corresponding to each first image, the present embodiment executes S102 to determine the position data of the subject in each first image. The position data determined in this embodiment are the coordinates of the subject's position in the first image, specifically the top-left corner coordinates and/or the bottom-right corner coordinates of a rectangular box surrounding the subject.
Specifically, when S102 is executed to determine the position data of the subject in each first image, the present embodiment may adopt the following optional implementation: each first image is used as the input of a subject detection model, and the output result of the subject detection model is used as the position data of the subject in that first image. The subject detection model in this embodiment is trained in advance and is capable of outputting the position data of the subject in an input image.
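The patent does not name a concrete subject detection model. Purely as an illustration, the sketch below uses torchvision's off-the-shelf Faster R-CNN (torchvision >= 0.13) as a stand-in and keeps the highest-scoring box as the subject's position data; the helper name detect_subject is an assumption.

```python
# Illustrative stand-in for the pre-trained subject detection model of S102.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
detector.eval()

def detect_subject(image: Image.Image):
    """Return (x1, y1, x2, y2): top-left and bottom-right corners of the subject box."""
    with torch.no_grad():
        output = detector([to_tensor(image)])[0]  # dict with "boxes", "labels", "scores"
    if output["boxes"].numel() == 0:
        return None                               # no subject detected in this first image
    best = output["scores"].argmax()
    return tuple(output["boxes"][best].tolist())
```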
After the position data of the subject in each first image has been determined in S102, this embodiment performs S103 to adjust the subject in each first image according to the position data, obtaining a second image corresponding to each first image.
Specifically, when S103 is executed to adjust the subject in each first image according to the position data, the present embodiment may adopt the following optional implementation: determining a floating value corresponding to each first image, wherein the floating value represents the range within which the coordinate values of the subject in the image are allowed to float; adjusting the position data of the subject in each first image according to the determined floating value; and taking the adjustment result as the second image corresponding to each first image.
That is, the present embodiment adjusts the subject in the first image according to the determined position data, so that the position and/or size of the subject in the second image is changed while the actual content of the subject remains unchanged, thereby achieving data enhancement of the first image based on the position data.
In this embodiment, when S103 is executed to determine the floating value corresponding to each first image, the optional implementation manner that may be adopted is: determining a data set to which each first image belongs; the floating value corresponding to the determined data set is used as the floating value corresponding to each first image.
That is, the present embodiment also sets the corresponding relationship between the data sets and the floating values in advance, and different data sets correspond to different floating values, so as to achieve the purpose of adjusting the first image belonging to different data sets by using different floating values.
It will be appreciated that the present embodiment may also set the same floating value for different first images, i.e. use the same floating value to adjust the position and/or size of the subject in different first images.
In addition, when S103 is executed to adjust the position data of the subject in each first image according to the determined floating value, taking the adjustment result as the second image corresponding to each first image, the present embodiment may adopt the following optional implementation: for each first image, adjusting each coordinate value in the position data in a preset direction according to the determined floating value, and taking the adjustment result as the second image corresponding to that first image.
The preset directions in this embodiment include up, down, left, right, up-left, up-right, down-left and down-right; the preset direction used when a coordinate value is adjusted according to the floating value is one of these directions.
Since the determined position data of the subject include the top-left corner coordinates and/or the bottom-right corner coordinates, the preset directions used when adjusting the two coordinate values may be the same or different: adjusting both with the same preset direction changes the position of the subject in the first image, while adjusting with different preset directions changes both the position and the size of the subject.
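As one plausible (purely illustrative) reading of this adjustment, the sketch below shifts each corner of the subject box by an offset sampled within the floating value along a preset direction, then re-crops the image around the shifted box; the name adjust_subject and the re-cropping scheme are assumptions, not prescribed by the text.

```python
# Illustrative sketch of S103: shift the box corners within the floating value.
import random
from PIL import Image

DIRECTIONS = {
    "up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0),
    "up-left": (-1, -1), "up-right": (1, -1),
    "down-left": (-1, 1), "down-right": (1, 1),
}

def adjust_subject(image, box, float_value, dir_tl, dir_br):
    x1, y1, x2, y2 = box                      # top-left and bottom-right corners
    sx1, sy1 = DIRECTIONS[dir_tl]
    sx2, sy2 = DIRECTIONS[dir_br]
    d1 = random.uniform(0, float_value)       # offsets sampled within the range
    d2 = random.uniform(0, float_value)
    x1 = min(max(int(x1 + sx1 * d1), 0), image.width - 1)
    y1 = min(max(int(y1 + sy1 * d1), 0), image.height - 1)
    x2 = min(max(int(x2 + sx2 * d2), x1 + 1), image.width)
    y2 = min(max(int(y2 + sy2 * d2), y1 + 1), image.height)
    return image.crop((x1, y1, x2, y2))       # second image: same content, new framing
```

Passing the same direction for both corners (e.g. dir_tl = dir_br = "down-right") only moves the subject; passing different directions (e.g. "up-left" and "down-right") also changes its size.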
In this embodiment, after the second images corresponding to the first images are obtained in S103, S104 is executed to train the neural network model according to the second images and the annotation labels corresponding to the second images until the neural network model converges, so as to obtain the image recognition model. The image recognition model obtained by this training can output the recognition result of the subject in an input image.
It can be understood that, because executing S103 only changes the position and/or size of the subject in the first image and does not change its actual content, this embodiment uses the annotation label of the first image from which a second image was generated as the annotation label of that second image.
In this embodiment, when S104 is executed to train the neural network model according to each second image and its corresponding annotation label until the neural network model converges, an optional implementation is as follows: taking each second image as the input of the neural network model to obtain the model's output result for each second image; calculating a loss according to the output result for each second image and the annotation label corresponding to that second image, where the loss may be computed as a cross-entropy loss; and finishing the training of the neural network model once the calculated loss is determined to have converged.
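A condensed sketch of this loop might look as follows, assuming PyTorch; the SGD optimizer, the convergence tolerance and the name train_until_converged are assumptions, since the text only prescribes a cross-entropy loss trained until convergence.

```python
# Illustrative sketch of S104: train on (second image, annotation label) pairs
# until the epoch-level cross-entropy loss stops changing.
import torch
import torch.nn.functional as F

def train_until_converged(model, loader, lr=1e-3, tol=1e-4, max_epochs=100):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    prev_loss = float("inf")
    for _ in range(max_epochs):
        total, count = 0.0, 0
        for images, labels in loader:             # second images and their labels
            loss = F.cross_entropy(model(images), labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            total += loss.item() * images.size(0)
            count += images.size(0)
        epoch_loss = total / count
        if abs(prev_loss - epoch_loss) < tol:     # loss has converged
            break
        prev_loss = epoch_loss
    return model                                  # the trained image recognition model
```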
According to the method provided by this embodiment, the position data of the subject in each first image is determined, and the subject is adjusted according to the determined position data to obtain a second image, so that the position, or both the position and the size, of the subject in the image is changed. The neural network model can therefore recognize the subject across these differences, which greatly improves the convergence accuracy of the neural network model and the recognition accuracy of the trained image recognition model.
Fig. 2 is a schematic diagram according to a second embodiment of the present disclosure. As shown in fig. 2, in this embodiment, executing S104 "training the neural network model according to each second image and the annotation label corresponding to each second image until the neural network model converges to obtain the image recognition model" may specifically include the following steps:
S201, matching the second images pairwise to obtain a plurality of second image pairs;
S202, obtaining a third image corresponding to each second image pair according to the two second images in the pair;
and S203, training the neural network model according to the third images and the annotation labels corresponding to the third images until the neural network model converges to obtain an image recognition model.
That is to say, this embodiment further generates third images from the obtained second images, thereby further enhancing the training data and accordingly improving the accuracy of the trained image recognition model in image recognition.
In this embodiment, when performing S201 to match the second images pairwise, a random matching manner may be adopted to combine two second images into one second image pair. In addition, if the training data is composed of a plurality of data sets, only second images within the same data set are paired with each other.
After performing S201 to obtain a plurality of second image pairs, the present embodiment performs S202 to obtain a third image corresponding to each second image pair from two second images in each second image pair.
Specifically, when S202 is executed to obtain the third image corresponding to each second image pair from the two second images in the pair, the present embodiment may adopt the following optional implementation: cropping the two second images in each second image pair, for example by random cropping; and splicing the cropping results of the two second images in each second image pair together, taking the spliced result as the third image corresponding to that pair.
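One plausible instantiation of this crop-and-splice step is sketched below, assuming PIL; the horizontal split, the fixed output size and the name make_third_image are assumptions, since the text does not fix the cropping or stitching scheme. The returned area ratios then weight the two annotation labels, as described next.

```python
# Illustrative sketch of S202: crop two second images and splice the crops.
import random
from PIL import Image

def make_third_image(img_a, img_b, size=(224, 224)):
    w, h = size
    split = random.randint(int(0.3 * w), int(0.7 * w))   # where the two crops meet
    crop_a = img_a.resize((w, h)).crop((0, 0, split, h))
    crop_b = img_b.resize((w, h)).crop((split, 0, w, h))
    third = Image.new("RGB", size)
    third.paste(crop_a, (0, 0))
    third.paste(crop_b, (split, 0))
    area_a = split / w                                   # area ratio of img_a's crop
    return third, area_a, 1.0 - area_a
```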
In this embodiment, after the third images corresponding to the second image pairs are obtained in S202, S203 is executed to train the neural network model according to the third images and the annotation labels corresponding to the third images until the neural network model converges, so as to obtain the image recognition model.
It can be understood that, since each third image is obtained by cropping and splicing two second images, the annotation label corresponding to the third image can be derived from the annotation labels corresponding to those two second images.
In this embodiment, the annotation label corresponding to a third image is composed of the annotation labels of the two second images before splicing, and the weight of each annotation label is the area ratio that the corresponding spliced second image occupies in the third image.
For example, suppose the third image 1 is obtained from the second image 1 and the second image 2, where the annotation label of the second image 1 is "cat" and that of the second image 2 is "dog". If the spliced second image 1 occupies 0.4 of the area of the third image 1 and the spliced second image 2 occupies 0.6, then the annotation label corresponding to the third image is "cat 0.4, dog 0.6".
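Such a weighted annotation label can be represented as a soft target vector and trained with a soft-label cross entropy; the sketch below is one way to do this, consistent with the cross-entropy loss described above but not spelled out by the patent.

```python
# Illustrative soft-label construction and loss for the third images.
import torch

def soft_label(num_classes, idx_a, idx_b, weight_a):
    target = torch.zeros(num_classes)
    target[idx_a] = weight_a           # e.g. "cat" with weight 0.4
    target[idx_b] = 1.0 - weight_a     # e.g. "dog" with weight 0.6
    return target

def soft_cross_entropy(logits, soft_targets):
    log_probs = torch.log_softmax(logits, dim=-1)
    return -(soft_targets * log_probs).sum(dim=-1).mean()
```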
Fig. 3 is a schematic diagram of a third embodiment of the present disclosure. As shown in fig. 3, the method of image recognition of this embodiment specifically includes the following steps:
S301, acquiring an image to be recognized;
and S302, taking the image to be recognized as the input of an image recognition model, and taking the output result of the image recognition model as the recognition result of the image to be recognized.
The method of image recognition of this embodiment obtains the recognition result using the image recognition model pre-trained as in the above embodiments; because that model is trained on the third images obtained through data enhancement, the accuracy of the obtained recognition result can be improved.
The image to be recognized acquired in S301 may be an existing image or an image captured in real time. Executing S302 yields the recognition result, which is identification information of the subject contained in the image to be recognized, such as the name or the category of the subject.
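For illustration only, an inference call could look like the following sketch, assuming PyTorch with torchvision-style preprocessing; the 224x224 input size and the name recognize are assumptions.

```python
# Illustrative sketch of S301-S302: run the trained model on one image.
import torch
from torchvision import transforms
from PIL import Image

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

def recognize(model, image_path, class_names):
    model.eval()
    x = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        logits = model(x)
    return class_names[logits.argmax(dim=-1).item()]  # recognition result
```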
Fig. 4 is a schematic diagram according to a fourth embodiment of the present disclosure. As shown in fig. 4, the training apparatus 400 for an image recognition model of this embodiment includes:
a first acquiring unit 401, configured to acquire training data, wherein the training data includes a plurality of first images and the annotation label corresponding to each first image;
a determining unit 402, configured to determine position data of the subject in each first image;
a processing unit 403, configured to adjust the subject in each first image according to the position data to obtain a second image corresponding to each first image;
and a training unit 404, configured to train the neural network model according to each second image and the annotation label corresponding to each second image until the neural network model converges to obtain an image recognition model.
The training data acquired by the first acquiring unit 401 may be composed of a plurality of data sets, each data set containing a plurality of first images; the annotation label in the acquired training data is identification information of the subject in each first image, such as the name of the subject or a numerical code of the subject. Preferably, each first image acquired by the first acquiring unit 401 contains only one subject.
In the present embodiment, after the plurality of first images and the annotation label corresponding to each first image are acquired by the first acquiring unit 401, the determining unit 402 determines the position data of the subject in each first image. The position data determined by the determining unit 402 are the coordinates of the subject's position in the first image, specifically the top-left corner coordinates and/or the bottom-right corner coordinates of a rectangular box surrounding the subject.
Specifically, when determining the position data of the subject in each first image, the determining unit 402 may adopt the following optional implementation: each first image is used as the input of a subject detection model, and the output result of the subject detection model is used as the position data of the subject in that first image. The subject detection model used by the determining unit 402 is trained in advance and is capable of outputting the position data of the subject in an input image.
After the position data of the subject in each first image is determined by the determining unit 402, the subject in each first image is adjusted by the processing unit 403 according to the position data, so that a second image corresponding to each first image is obtained.
Specifically, when the processing unit 403 adjusts the subject in each first image according to the position data, an optional implementation is as follows: determining a floating value corresponding to each first image; adjusting the position data of the subject in each first image according to the determined floating value; and taking the adjustment result as the second image corresponding to each first image.
That is, the processing unit 403 adjusts the subject in the first image through the determined position data, so that the position and/or size of the subject in the second image is changed while the actual content of the subject remains unchanged, achieving data enhancement of the first image based on the position data.
When the processing unit 403 determines the floating value corresponding to each first image, the optional implementation manners that may be adopted are: determining a data set to which each first image belongs; the floating value corresponding to the determined data set is used as the floating value corresponding to each first image.
That is, the processing unit 403 may also preset the correspondence between the data sets and the floating values, and different data sets correspond to different floating values, so as to achieve the purpose of adjusting the first image belonging to different data sets by using different floating values.
It will be appreciated that the processing unit 403 may also set the same floating value for different first images, i.e. adjust the position and/or size of the subject in different first images using the same floating value.
In addition, when the processing unit 403 adjusts the position data of the subject in each first image according to the determined floating value and takes the adjustment result as the second image corresponding to each first image, an optional implementation is as follows: for each first image, adjusting each coordinate value in the position data in a preset direction according to the determined floating value, and taking the adjustment result as the second image corresponding to that first image.
The preset directions used by the processing unit 403 include up, down, left, right, up-left, up-right, down-left and down-right; the preset direction used when a coordinate value is adjusted according to the floating value is one of these directions.
Since the determined position data of the subject include the top-left corner coordinates and/or the bottom-right corner coordinates, the preset directions used by the processing unit 403 when adjusting the two coordinate values may be the same or different.
In this embodiment, after the processing unit 403 obtains the second images corresponding to the first images, the training unit 404 trains the neural network model according to the second images and the annotation labels corresponding to the second images until the neural network model converges, to obtain the image recognition model. The image recognition model trained by the training unit 404 can output the recognition result of the subject in an input image.
It can be understood that, since the processing unit 403 only changes the position and/or size of the subject in the first image and does not change its actual content when obtaining the second image corresponding to each first image, the training unit 404 uses the annotation label of the first image from which a second image was generated as the annotation label of that second image.
When the training unit 404 trains the neural network model according to each second image and its corresponding annotation label until the neural network model converges, an optional implementation is: taking each second image as the input of the neural network model to obtain the model's output result for each second image; calculating a loss according to the output result for each second image and the annotation label corresponding to that second image, where the training unit 404 may compute the loss as a cross-entropy loss; and finishing the training of the neural network model once the calculated loss is determined to have converged.
When training the neural network model according to the second images and the annotation labels corresponding to the second images until the neural network model converges, the training unit 404 may further adopt the following method to obtain the image recognition model: matching the second images pairwise to obtain a plurality of second image pairs; obtaining a third image corresponding to each second image pair according to the two second images in the pair; and training the neural network model according to each third image and its corresponding annotation label until the neural network model converges, to obtain the image recognition model.
That is, the training unit 404 generates a third image according to the obtained second image, so as to further enhance the training data, and accordingly improve the accuracy of the trained image recognition model in performing image recognition.
When matching the second images pairwise, the training unit 404 may combine two second images into one second image pair by random matching. In addition, if the training data is composed of a plurality of data sets, the training unit 404 only pairs second images located in the same data set.
Specifically, when the training unit 404 obtains the third image corresponding to each second image pair from the two second images in the pair, an optional implementation is: cropping the two second images in each second image pair; and splicing the cropping results of the two second images in each second image pair together, taking the spliced result as the third image corresponding to that pair.
It can be understood that, since the third image obtained by the training unit 404 results from cropping and splicing two second images, the annotation label corresponding to the third image can be derived from the annotation labels corresponding to those two second images.
In the training unit 404, the annotation label corresponding to a third image is composed of the annotation labels of the two second images before splicing, and the weight of each annotation label is the area ratio that the corresponding spliced second image occupies in the third image.
Fig. 5 is a schematic diagram of a fifth embodiment according to the present disclosure. As shown in fig. 5, the image recognition apparatus 500 of this embodiment includes:
a second acquiring unit 501, configured to acquire an image to be recognized;
and a recognition unit 502, configured to take the image to be recognized as the input of an image recognition model and take the output result of the image recognition model as the recognition result of the image to be recognized.
The image to be recognized acquired by the second acquiring unit 501 may be an existing image or an image captured in real time. The recognition result obtained by the recognition unit 502 is identification information of the subject contained in the image to be recognized, such as the name or the category of the subject.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
Fig. 6 is a block diagram of an electronic device for the method of training an image recognition model and the method of image recognition according to embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant as examples only and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 6, the device 600 includes a computing unit 601, which can perform various appropriate actions and processes according to a computer program stored in a Read-Only Memory (ROM) 602 or loaded from a storage unit 608 into a Random Access Memory (RAM) 603. The RAM 603 can also store various programs and data required for the operation of the device 600. The computing unit 601, the ROM 602 and the RAM 603 are connected to one another via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
A number of components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, a mouse, or the like; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608 such as a magnetic disk, an optical disk, or the like; and a communication unit 609 such as a network card, a modem, a wireless communication transceiver, or the like. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network such as the Internet and/or various telecommunication networks.
The computing unit 601 may be any of various general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 601 performs the respective methods and processes described above, such as the method of training an image recognition model and the method of image recognition. For example, in some embodiments, the method of training an image recognition model and the method of image recognition may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 608.
In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the method of training an image recognition model and the method of image recognition described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured by any other suitable means (e.g., by means of firmware) to perform the method of training an image recognition model and the method of image recognition.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host; it is a host product in a cloud computing service system that remedies the defects of high management difficulty and weak service expansibility in traditional physical hosts and VPS ("Virtual Private Server") services. The server may also be a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel or sequentially or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (19)

1. A method for training an image recognition model, comprising the following steps:
acquiring training data, wherein the training data comprises a plurality of first images and the annotation label corresponding to each first image;
determining position data of the subject in each first image;
adjusting the subject in each first image according to the position data to obtain a second image corresponding to each first image;
and training a neural network model according to the second images and the annotation labels corresponding to the second images until the neural network model converges to obtain an image recognition model.
2. The method of claim 1, wherein the determining the position data of the subject in each first image comprises:
using each first image as the input of a subject detection model, and using the output result of the subject detection model as the position data of the subject in each first image;
and the determined position data are the coordinate values of the subject in the first image.
3. The method of claim 1, wherein the adjusting the subject in each first image according to the position data comprises:
determining a floating value corresponding to each first image;
and adjusting the position data of the subject in each first image according to the floating value, and taking the adjustment result as a second image corresponding to each first image.
4. The method of claim 3, wherein the determining a floating value corresponding to each first image comprises:
determining a data set to which each first image belongs;
the floating value corresponding to the determined data set is taken as the floating value corresponding to each first image.
5. The method of claim 3, wherein the adjusting the position data of the subject in each first image according to the floating value and taking the adjustment result as the second image corresponding to each first image comprises:
and aiming at each first image, adjusting each coordinate value in the position data in a preset direction according to the floating value, and taking an adjustment result as a second image corresponding to each first image.
6. The method of claim 1, wherein the training the neural network model according to each second image and the annotation label corresponding to each second image until the neural network model converges to obtain the image recognition model comprises:
matching every two second images to obtain a plurality of second image pairs;
obtaining a third image corresponding to each second image pair according to the two second images in each second image pair;
and training the neural network model according to each third image and the annotation label corresponding to each third image until the neural network model converges to obtain an image recognition model.
7. The method of claim 6, wherein obtaining a third image corresponding to each second image pair from two second images in each second image pair comprises:
cropping the two second images in each second image pair;
and splicing the cropping results of the two second images in each second image pair, and taking the spliced result as a third image corresponding to each second image pair.
8. A method of image recognition, comprising:
acquiring an image to be recognized;
the image to be recognized is used as the input of an image recognition model, and the output result of the image recognition model is used as the recognition result of the image to be recognized;
wherein the image recognition model is pre-trained according to the method of any one of claims 1 to 5.
9. An apparatus for training an image recognition model, comprising:
a first acquiring unit configured to acquire training data, wherein the training data comprises a plurality of first images and the annotation label corresponding to each first image;
a determining unit configured to determine position data of the subject in each first image;
a processing unit configured to adjust the subject in each first image according to the position data to obtain a second image corresponding to each first image;
and a training unit configured to train the neural network model according to the second images and the annotation labels corresponding to the second images until the neural network model converges to obtain the image recognition model.
10. The apparatus according to claim 9, wherein the determining unit, when determining the position data of the subject in each first image, specifically performs:
using each first image as the input of a subject detection model, and using the output result of the subject detection model as the position data of the subject in each first image;
and the determined position data are the coordinate values of the subject in the first image.
11. The apparatus according to claim 9, wherein the processing unit, when adjusting the subject in each first image according to the position data, specifically performs:
determining a floating value corresponding to each first image;
and adjusting the position data of the subject in each first image according to the floating value, and taking the adjustment result as a second image corresponding to each first image.
12. The apparatus according to claim 11, wherein the processing unit, when determining the floating value corresponding to each first image, specifically performs:
determining a data set to which each first image belongs;
the floating value corresponding to the determined data set is taken as the floating value corresponding to each first image.
13. The apparatus according to claim 11, wherein the processing unit, when adjusting the position data of the subject in each first image according to the floating value and using the adjustment result as the second image corresponding to each first image, specifically performs:
and aiming at each first image, adjusting each coordinate value in the position data in a preset direction according to the floating value, and taking an adjustment result as a second image corresponding to each first image.
14. The apparatus according to claim 9, wherein the training unit, when training the neural network model according to each second image and the annotation label corresponding to each second image until the neural network model converges to obtain the image recognition model, specifically performs:
matching every two second images to obtain a plurality of second image pairs;
obtaining a third image corresponding to each second image pair according to the two second images in each second image pair;
and training the neural network model according to each third image and the annotation label corresponding to each third image until the neural network model converges to obtain an image recognition model.
15. The apparatus according to claim 14, wherein the training unit, when obtaining the third image corresponding to each second image pair from two second images in each second image pair, specifically performs:
cropping the two second images in each second image pair;
and splicing the cropping results of the two second images in each second image pair, and taking the spliced result as a third image corresponding to each second image pair.
16. An image recognition apparatus comprising:
a second acquiring unit configured to acquire an image to be recognized;
the recognition unit is used for taking the image to be recognized as the input of an image recognition model and taking the output result of the image recognition model as the recognition result of the image to be recognized;
wherein the image recognition model is pre-trained according to the apparatus of any one of claims 9 to 15.
17. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein the content of the first and second substances,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 8.
18. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1 to 8.
19. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 8.

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108665457A (en) * 2018-05-16 2018-10-16 腾讯科技(深圳)有限公司 Image-recognizing method, device, storage medium and computer equipment
CN111079844A (en) * 2019-12-19 2020-04-28 新立讯科技股份有限公司 Image cutting and data enhancement method, system and equipment
CN111401396A (en) * 2019-01-03 2020-07-10 阿里巴巴集团控股有限公司 Image recognition method and device
CN111476284A (en) * 2020-04-01 2020-07-31 网易(杭州)网络有限公司 Image recognition model training method, image recognition model training device, image recognition method, image recognition device and electronic equipment
CN111754494A (en) * 2020-06-28 2020-10-09 深圳壹账通智能科技有限公司 Small sample image expansion method, electronic device and storage medium
CN112541389A (en) * 2020-09-29 2021-03-23 西安交通大学 Power transmission line fault detection method based on EfficientDet network


Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination