WO2022247005A1

WO2022247005A1 - Method and apparatus for identifying target object in image, electronic device and storage medium

Info

Publication number: WO2022247005A1
Application number: PCT/CN2021/109479
Authority: WO
Inventors: 王瑞; 李君�; 陈凌智; 薛淑月; 吕传峰
Original assignee: 平安科技（深圳）有限公司
Priority date: 2021-05-27
Filing date: 2021-07-30
Publication date: 2022-12-01
Also published as: CN113283446B; CN113283446A

Abstract

A method and apparatus for identifying a target object in an image, a device and a medium, related to the image processing technology, comprising: obtaining a sample image set by means of image amplification; constructing a first training set and a test set by using a noise image and the sample image set, and training an original identification model by using the first training set to obtain an initial identification model; testing the initial identification model by using the test set, selecting the error test result to construct a second training set to generate a first feature vector, and constructing a second feature vector of a sample image; and calculating a loss value between the first feature vector and the second feature vector to perform parameter update on the initial identification model to obtain a standard identification model to identify an image to be identified, and obtaining the target object identification result in the image. In addition, the method also relates to the blockchain technology, and the sample image may be stored in a node of a blockchain. The method can solve the problem of lower accuracy of target object identification.

Description

Object recognition method, device, electronic equipment and storage medium in image

This application claims the priority of the Chinese patent application with the application number CN202110581184.5 and the title "Method, device, electronic equipment and storage medium for object recognition in images" submitted to the China Patent Office on May 27, 2021, all of which The contents are incorporated by reference in this application.

technical field

The present application relates to the technical field of artificial intelligence, and in particular to a method, device, electronic equipment, and computer-readable storage medium for object recognition in an image.

Background technique

With the rapid development of artificial intelligence technology, it is more and more common in people's daily life to use image recognition models to process images to identify the information of the target contained in the images, for example, to use image recognition model machines to analyze disease images , to determine the type of lesion in the image, etc. However, due to the privacy of medical images, the number of images that can be used to train the model is very small, which leads to the low accuracy of the trained model.

The inventor realized that in the existing model training for a small number of samples, the original small number of samples is often expanded into multiple images by way of image expansion, so as to realize a large amount of training for the model. However, in this method, since the new image generated by image expansion is obtained based on the features of the original image, the basic features of the image in the new image have not changed, which may easily lead to overfitting of the trained model, which in turn leads to the failure of the model. The accuracy of target recognition is not high.

Contents of the invention

A method for recognizing objects in an image provided by the present application includes:

Acquiring a sample image containing the target object, performing image amplification on the sample image to obtain a sample image set;

constructing a first training set and a test set by using a noise image of the same type as the sample image and the sample image set, and using the first training set to train a pre-built original recognition model to obtain an initial recognition model;

Using the test set to test the initial recognition model, and selecting preset types of error results in the test results to construct a second training set, and using the sample images to construct a third training set;

extracting the first feature vector of the image in the second training set, and extracting the second feature vector of the image in the third training set;

calculating a loss value between the first feature vector and the second feature vector, and updating parameters of the initial recognition model according to the loss value to obtain a standard recognition model;

The image to be recognized is acquired, and the standard recognition model is used to perform target recognition on the image to be recognized to obtain a target recognition result in the image.

The present application also provides a device for identifying objects in an image, the device comprising:

An image amplification module, configured to obtain a sample image containing a target object, perform image amplification on the sample image, and obtain a sample image set;

The first training module is configured to use noise images of the same type as the sample image and the sample image set to construct a first training set and a test set, and use the first training set to perform pre-built original recognition models. Training to get the initial recognition model;

A model testing module, configured to use the test set to test the initial recognition model, select a preset type of error result in the test result to construct a second training set, and use the sample image to construct a third training set;

A feature extraction module, configured to extract a first feature vector of images in the second training set, and extract a second feature vector of images in the third training set;

A second training module, configured to calculate a loss value between the first feature vector and the second feature vector, and update parameters of the initial recognition model according to the loss value to obtain a standard recognition model;

The image recognition module is used to obtain the image to be recognized, and use the standard recognition model to perform target recognition on the image to be recognized to obtain the target recognition result in the image.

The present application also provides an electronic device, the electronic device comprising:

a memory storing at least one instruction; and

A processor, executing instructions stored in the memory to implement the following steps:

Please also provide a computer-readable storage medium, where at least one instruction is stored in the computer-readable storage medium, and the at least one instruction is executed by a processor in the electronic device to implement the following steps:

Description of drawings

FIG. 1 is a schematic flow diagram of a method for recognizing objects in an image provided by an embodiment of the present application;

FIG. 2 is a schematic flow diagram of generating a first feature vector provided by an embodiment of the present application;

Fig. 3 is a functional module diagram of an object recognition device in an image provided by an embodiment of the present application;

FIG. 4 is a schematic structural diagram of an electronic device for implementing the method for recognizing an object in an image provided by an embodiment of the present application.

The realization, functional features and advantages of the present application will be further described in conjunction with the embodiments and with reference to the accompanying drawings.

Detailed ways

It should be understood that the specific embodiments described here are only used to explain the present application, and are not intended to limit the present application.

The embodiments of the present application may acquire and process relevant data based on artificial intelligence technology. Among them, artificial intelligence (AI) is a theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results. .

Artificial intelligence basic technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly includes computer vision technology, robotics technology, biometrics technology, speech processing technology, natural language processing technology, and machine learning/deep learning.

An embodiment of the present application provides a method for recognizing an object in an image. The executor of the method for object recognition in an image includes, but is not limited to, at least one of electronic devices such as a server and a terminal that can be configured to execute the method provided by the embodiment of the present application. In other words, the method for identifying an object in an image can be executed by software or hardware installed on a terminal device or a server device, and the software can be a blockchain platform. The server includes, but is not limited to: a single server, a server cluster, a cloud server or a cloud server cluster, and the like. The server can be an independent server, or it can provide cloud services, cloud database, cloud computing, cloud function, cloud storage, network service, cloud communication, middleware service, domain name service, security service, content distribution network (ContentDelivery Network, CDN ), and cloud servers for basic cloud computing services such as big data and artificial intelligence platforms.

Referring to FIG. 1 , it is a schematic flow chart of a method for recognizing an object in an image provided by an embodiment of the present application. In this embodiment, the target recognition method in the image includes:

S1. Acquire a sample image including a target object, and perform image amplification on the sample image to obtain a sample image set.

In the embodiment of the present application, the sample image contains a specific target object and the real label corresponding to the target object. For example, when the target object is an apple, the sample image is an image containing an apple, and the real label is " Apple"; or, when the target object is a lesion of a certain disease, the sample image is an image containing the lesion, and the real label is the name of the disease corresponding to the lesion.

In the embodiment of the present application, the pre-stored sample image can be captured from the pre-built block chain node through the java statement with the data capture function, and the high throughput of the block chain for data can be used to improve the speed of obtaining the sample image efficiency.

In the embodiment of the present application, the image amplification of the sample image can be realized by performing geometric transformation, color change, contrast adjustment, and partial occlusion on the sample image.

In one embodiment of the present application, the sample image set is generated by stretching the sample image in different sizes in the horizontal and vertical directions to obtain the length, width or a combination of both.

Alternatively, by dyeing the sample image to change the color of the sample image into multiple different colors, a sample image set including sample images of multiple different colors is obtained.

Alternatively, a plurality of sample image sets in which different parts are covered are obtained by partially covering the sample image. For example, cover the upper part of the target object in the sample image set to obtain a sample image in which only the lower part of the target object can be seen, and cover the right half part of the target object in the sample image set to obtain only the left half of the target object. half of the sample images, and the sample images that cover different areas are collected as a sample image set.

In one embodiment of the present application, performing image amplification on the sample image to obtain a sample image set includes:

performing texture drawing on the sample image to obtain the image texture of the sample image;

performing random local deepening on the image texture to obtain a texture deepened image;

performing random partial lightening on the image file to obtain a texture lightened image;

Collect the texture-enhanced image and the texture-lighten image into the sample image set.

In detail, image texture extraction algorithms such as GLCM (Gray-level co-occurrence matrix, gray-level co-occurrence matrix) method, LBP (Local Binary Pattern, local binary pattern) can be used to implement texture depiction of the sample image to highlight Output the image texture in the sample image.

Specifically, the random local deepening of the image texture to obtain a texture deepened image includes:

Count the number of textures of the image texture;

Selecting an image texture with a preset ratio as the texture to be processed according to the number of textures;

Adjusting pixel values of the pixels on the texture to be processed to obtain a texture-enhanced image.

In one of the application scenarios of the present application, the image texture of the preset ratio is selected according to the number of textures, and the pixel value on the selected texture to be processed is adjusted (for example, the pixel value on the texture to be processed is adjusted to a black range ), to realize the deepening of the texture to be processed, and obtain the texture deepened image.

Similarly, the step of performing random local lightening on the image file to obtain a texture lightening image is consistent with the step of texture deepening, for example, adjusting the pixel value on the texture to be processed to a white range to realize the lightening of the texture to be processed, Get a textured faded image.

S2. Using noise images of the same type as the sample image and the sample image set to construct a first training set and a test set, and using the first training set to train a pre-built original recognition model to obtain an initial recognition Model.

In this embodiment of the present application, the noise image is an image of the same type as the sample image, but the target object in the noise image is different from that in the sample image (for example, the noise image and the sample image are both is a fruit image, but the noise image contains watermelons, and the sample image contains apples), similarly, the noise image also contains the real label corresponding to the noise image. For example, the target object contained in the sample image is a focus of a lung disease, but the focus of a liver disease may be contained in the noise image.

In the embodiment of the present application, the noise image and the sample image set are collected, and the collected sample image and noise image are divided into a first training set and a test set according to a preset division ratio. Wherein, both the first training set and the test set include sample images and noise images.

In the embodiment of the present application, the original recognition model can use networks with image recognition functions such as VGG network, GoogleNet and Residual Network.

In one embodiment of the present application, EfficientNet is used as the backbone network of the original recognition model, and the EfficientNet is a composite multi-dimensional convolutional neural network, which is beneficial to improve the accuracy of the image processing process, thereby improving the accuracy of image recognition. Accuracy.

Further, the embodiment of the present application uses the training set to train the original recognition model, so as to adjust the model parameters in the original recognition model, improve the accuracy of the original recognition model for image recognition, and obtain the initial recognition model.

Specifically, using the first training set to train the pre-built original recognition model to obtain an initial recognition model, including:

performing image recognition on the first training set by using the original recognition model to obtain a recognition result;

Calculate the loss value of the recognition result and the real label corresponding to each image in the first training set;

Adjusting parameters of the original recognition model according to the loss value to obtain an initial recognition model.

Specifically, the recognition result is the recognition result of the original recognition model for the type of target contained in each image in the first training set, for example, the first training set contains sample image A, sample image B and noise image C, where , the real label of sample image A and sample image B is apple (that is, the target object is apple), and the real label of image C is watermelon, but the recognition result obtained by the original recognition model is: sample image A is apple, and sample image B is watermelon , the noise image C is an apple.

The preset first loss function can be used to calculate the loss value of the recognition result and the real label corresponding to each image in the first training set, and then adjust the parameters of the original recognition model according to the loss value to improve the accuracy of the original recognition model.

For example, vector conversion is performed on the real labels in the first training set to obtain the real vector; vector conversion is performed on the recognition result to obtain the recognition vector, and the loss value between the real vector and the recognition vector corresponding to each image in the first training set is calculated respectively , and then use a preset optimization algorithm to adjust the parameters of the original recognition model according to the loss value. The optimization algorithm includes but is not limited to: batch gradient descent algorithm, stochastic gradient descent algorithm, and mini-batch gradient descent algorithm.

S3. Use the test set to test the initial recognition model, select preset types of error results from the test results to construct a second training set, and use the sample images to construct a third training set.

In the embodiment of the present application, the test set can be used to test the initial recognition model obtained in step S2, for example, the test set is input into the initial recognition model to obtain the test of each image in the test set by the initial recognition model result.

In one embodiment of the present application, when the second training set is constructed by using preset types of error results in the test results, the preset types of error results include error results of noise images in the test set.

For example, the test set contains sample image D, sample image E, noise image F and noise image G, wherein the real label of sample image D and sample image E is apple, and the real label of noise image E and noise image F is watermelon ; After the initial identification model identifies the test set, the test results obtained are: the sample image D and the noise image F are apples, the sample image E and the noise image G are watermelons, therefore, the noise image F and the sample image E is the wrong result in the test result, and the noise image F is selected as the second training set.

Further, in the embodiment of the present application, at least one sample image in the sample image set generated in step S1 is used as the third training set.

In the embodiment of the present application, the second training set is constructed by using the wrong results in the test results, and the third training set is constructed by using the sample images, so that the images that are easily misrecognized by the initial recognition model in the noise image and the images containing the target object can be realized. The sample images construct the training set separately, which is conducive to further training the initial recognition model and improving the accuracy of the initial recognition model.

S4. Extract a first feature vector of the images in the second training set, and extract a second feature vector of the images in the third training set.

In the embodiment of the present application, the first feature vectors of the images in the second training set may be extracted by using an image feature extraction algorithm such as the HOG algorithm, the LBP algorithm, and the Harr algorithm.

Further, the step of extracting the second feature vectors of the images in the third training set is the same as the step of extracting the first feature vectors of the images in the second training set, which will not be repeated here.

In one of the embodiments of the present application, as shown in FIG. 2, the extraction of the first feature vector of the images in the second training set includes:

S21. Using a preset first gradient operator to perform a horizontal convolution operation on the images in the second training set to obtain a horizontal gradient component;

S22. Using a preset second gradient operator to perform a vertical convolution operation on the images in the second training set to obtain a vertical gradient component;

S23. Calculate the first feature vector according to the horizontal gradient component and the vertical gradient component.

In detail, the first gradient operator and the second gradient operator are preset matrices, for example, the first gradient operator may be [-1, 0, 1], and the second gradient operator can be [1, 0, -1], and each image in the second training set can be obtained by convolving the first gradient operator and the second gradient operator with each image in the second training set The horizontal gradient component and vertical gradient component corresponding to the image.

Specifically, the use of the preset first gradient operator to perform a horizontal convolution operation on the image in the second training set to obtain a horizontal gradient component includes:

Get the convolution step size and convolution length;

calculating the number of horizontal convolutions according to the convolution step size and the convolution length;

Using the first gradient operator to perform a convolution operation on each image in the second training set according to the convolution step size for the number of convolutions to obtain a horizontal gradient component.

Wherein, the convolution step size refers to the pixel length that the first gradient operator needs to move after performing a convolution operation, and the convolution length refers to the length of each image in the second training set in the horizontal direction. pixel length, divide the convolution step by the convolution length, and calculate the number of times of horizontal convolution that needs to be performed on each image in the second training set in the horizontal direction.

In one of the embodiments of the present application, the calculating the first feature vector according to the horizontal gradient component and the vertical gradient component includes:

performing normalized calculation on the horizontal gradient component to obtain a horizontal normalized component;

performing normalized calculation on the vertical gradient component to obtain a vertical normalized component;

The horizontal normalized component and the vertical normalized component are squared and summed to obtain the first feature vector.

In detail, the horizontal gradient component and the vertical gradient component can be calculated by using a preset linear function, logarithmic function, or inverse cotangent function, etc. Normalization of gradient components.

Specifically, the horizontal normalized component and the vertical normalized component can be squared and summed using the following square summation formula to obtain the first eigenvector:

Wherein, L is the first feature vector, α is the horizontal normalized component, and β is the vertical normalized component.

S5. Calculate a loss value between the first feature vector and the second feature vector, and update parameters of the initial recognition model according to the loss value to obtain a standard recognition model.

In the embodiment of the present application, the preset second loss function can be used to calculate the loss value of the recognition result and the real label corresponding to each image in the first training set, and then adjust the parameters of the initial recognition model to improve the performance of the initial recognition model. Accuracy.

Wherein, the second loss function may be the same as or different from the first loss function in step S2.

In detail, the step of updating the parameters of the initial recognition model by using the loss value to obtain a standard recognition model is the same as when using the first training set to train the pre-built original recognition model in step S2, The steps of adjusting the parameters of the original recognition model are the same, and will not be repeated here.

S6. Acquire the image to be recognized, and use the standard recognition model to perform target recognition on the image to be recognized to obtain a target recognition result in the image.

In the embodiment of the present application, the image to be recognized may be an image that contains or does not contain the target object. When the image to be recognized is obtained, the standard recognition model can be used to perform target recognition on the image to be recognized to obtain the target object in the image. The recognition result of the object type.

The embodiment of the present application can realize the expansion of the original small number of samples through image expansion, and use the expanded sample image set and noise image to train the model at the same time, which is beneficial to improve the accuracy and robustness of the model; Test, and use the wrong results in the test results to construct a training set, and retrain the model with sample images, which can further improve the accuracy of the model's recognition of the target. Therefore, the method for identifying objects in images proposed by the present application can solve the problem of low accuracy in identifying objects.

As shown in FIG. 3 , it is a functional block diagram of an apparatus for recognizing objects in images provided by an embodiment of the present application.

The apparatus 100 for recognizing objects in images described in this application can be installed in electronic equipment. According to the realized functions, the device 100 for identifying objects in images may include an image augmentation module 101 , a first training module 102 , a model testing module 103 , a feature extraction module 104 , a second training module 105 and an image recognition module 106 . The module described in this application can also be called a unit, which refers to a series of computer program segments that can be executed by the processor of the electronic device and can complete fixed functions, and are stored in the memory of the electronic device.

In this embodiment, the functions of each module/unit are as follows:

The image augmentation module 101 is configured to acquire a sample image containing a target object, perform image augmentation on the sample image, and obtain a sample image set;

In one of the embodiments of the present application, the image augmentation module 101 is specifically used for:

Obtain a sample image containing the target object;

Count the number of textures of the image texture;

The first training module 102 is configured to use noise images of the same type as the sample image and the sample image set to construct a first training set and a test set, and use the first training set to compare the pre-built original The recognition model is trained to obtain the initial recognition model;

In detail, the first training module 102 is specifically used for:

The model testing module 103 is configured to use the test set to test the initial recognition model, select preset types of error results in the test results to construct a second training set, and use the sample images to construct a third training set ;

In the embodiment of the present application, the test set can be used to test the initial recognition model obtained in the first training module 102, for example, the test set is input into the initial recognition model to obtain the initial recognition model for each image in the test set. Image test results.

Further, in the embodiment of the present application, at least one sample image in the sample image set generated in the image augmentation module 101 is used as the third training set.

The feature extraction module 104 is configured to extract a first feature vector of images in the second training set, and extract a second feature vector of images in the third training set;

In one of the embodiments of the present application, the feature extraction module 104 is specifically used for:

performing a horizontal convolution operation on the images in the second training set by using a preset first gradient operator to obtain a horizontal gradient component;

performing a vertical convolution operation on the images in the second training set by using a preset second gradient operator to obtain a vertical gradient component;

calculating the first feature vector according to the horizontal gradient component and the vertical gradient component;

Extracting second feature vectors of images in the third training set.

Get the convolution step size and convolution length;

The second training module 105 is configured to calculate a loss value between the first feature vector and the second feature vector, and update the parameters of the initial recognition model according to the loss value to obtain a standard recognition model ;

Wherein, the second loss function may be the same as or different from the first loss function in the first training module 102 .

In detail, the step of updating the parameters of the initial recognition model using the loss value to obtain a standard recognition model is the same as performing the pre-built original recognition model using the first training set in the first training module 102. During training, the steps of adjusting the parameters of the original recognition model are the same, and will not be repeated here.

The image recognition module 106 is configured to acquire an image to be recognized, and use the standard recognition model to perform object recognition on the image to be recognized to obtain a target object recognition result in the image.

The embodiment of the present application can realize the expansion of the original small number of samples through image expansion, and use the expanded sample image set and noise image to train the model at the same time, which is beneficial to improve the accuracy and robustness of the model; Test, and use the wrong results in the test results to construct a training set, and retrain the model with sample images, which can further improve the accuracy of the model's recognition of the target. Therefore, the device for identifying objects in images proposed by the present application can solve the problem of low accuracy in identifying objects.

As shown in FIG. 4 , it is a schematic structural diagram of an electronic device for implementing a method for recognizing an object in an image provided by an embodiment of the present application.

The electronic device 1 may include a processor 10 , a memory 11 and a bus, and may also include a computer program stored in the memory 11 and operable on the processor 10 , such as a program 12 for object recognition in an image.

Wherein, the memory 11 includes at least one type of readable storage medium, and the readable storage medium includes flash memory, mobile hard disk, multimedia card, card type memory (for example: SD or DX memory, etc.), magnetic memory, magnetic disk, CD etc. The storage 11 may be an internal storage unit of the electronic device 1 in some embodiments, such as a mobile hard disk of the electronic device 1 . The memory 11 can also be an external storage device of the electronic device 1 in other embodiments, such as a plug-in mobile hard disk equipped on the electronic device 1, a smart memory card (Smart Media Card, SMC), a secure digital (Secure Digital , SD) card, flash memory card (Flash Card), etc. Further, the memory 11 may also include both an internal storage unit of the electronic device 1 and an external storage device. The memory 11 can not only be used to store application software and various data installed in the electronic device 1 , such as codes of the object recognition program 12 in images, but also can be used to temporarily store data that has been output or will be output.

In some embodiments, the processor 10 may be composed of integrated circuits, for example, may be composed of a single packaged integrated circuit, or may be composed of multiple integrated circuits with the same function or different functions, including one or more Central processing unit (Central Processing unit, CPU), microprocessor, digital processing chip, graphics processor and a combination of various control chips, etc. The processor 10 is the control core (Control Unit) of the electronic device, and uses various interfaces and lines to connect the various components of the entire electronic device, and runs or executes programs or modules stored in the memory 11 (such as image target recognition program, etc.), and call the data stored in the memory 11 to execute various functions of the electronic device 1 and process data.

The bus may be a peripheral component interconnect standard (PCI for short) bus or an extended industry standard architecture (EISA for short) bus or the like. The bus can be divided into address bus, data bus, control bus and so on. The bus is configured to realize connection and communication between the memory 11 and at least one processor 10 and the like.

FIG. 4 only shows an electronic device with components. Those skilled in the art can understand that the structure shown in FIG. 4 does not constitute a limitation to the electronic device 1, and may include fewer or more components, or combinations of certain components, or different arrangements of components.

For example, although not shown, the electronic device 1 can also include a power supply (such as a battery) for supplying power to various components. Preferably, the power supply can be logically connected to the at least one processor 10 through a power management device, so that the power supply can be controlled by power management. The device implements functions such as charge management, discharge management, and power consumption management. The power supply may also include one or more DC or AC power supplies, recharging devices, power failure detection circuits, power converters or inverters, power status indicators and other arbitrary components. The electronic device 1 may also include various sensors, bluetooth modules, Wi-Fi modules, etc., which will not be repeated here.

Further, the electronic device 1 may also include a network interface, optionally, the network interface may include a wired interface and/or a wireless interface (such as a WI-FI interface, a Bluetooth interface, etc.), which are usually used in the electronic device 1 Establish a communication connection with other electronic devices.

Optionally, the electronic device 1 may further include a user interface, which may be a display (Display) or an input unit (such as a keyboard (Keyboard)). Optionally, the user interface may also be a standard wired interface or a wireless interface. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode, organic light-emitting diode) touch device, and the like. Wherein, the display may also be appropriately called a display screen or a display unit, and is used for displaying information processed in the electronic device 1 and for displaying a visualized user interface.

It should be understood that the embodiments are only for illustration, and are not limited by the structure in terms of the scope of the patent application.

The object recognition program 12 in the image stored in the memory 11 in the electronic device 1 is a combination of multiple instructions. When running in the processor 10, it can realize:

Extract the first feature vector of the image in the second training set, extract the second feature vector of the image in the third training set;

Specifically, for the specific implementation method of the above instructions by the processor 10, reference may be made to the description of relevant steps in the embodiments corresponding to FIG. 1 to FIG. 4 , and details are not repeated here.

Further, if the integrated modules/units of the electronic device 1 are realized in the form of software function units and sold or used as independent products, they can be stored in a computer-readable storage medium. The computer-readable storage medium may be volatile or non-volatile. For example, the computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a U disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM, Read-Only Memory).

The present application also provides a computer-readable storage medium, the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor of an electronic device, it can realize:

In the several embodiments provided in this application, it should be understood that the disclosed devices, devices and methods can be implemented in other ways. For example, the device embodiments described above are only illustrative. For example, the division of the modules is only a logical function division, and there may be other division methods in actual implementation.

The modules described as separate components may or may not be physically separated, and the components shown as modules may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional module in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit. The above-mentioned integrated units can be implemented in the form of hardware, or in the form of hardware plus software function modules.

It will be apparent to those skilled in the art that the present application is not limited to the details of the exemplary embodiments described above, but that the present application can be implemented in other specific forms without departing from the spirit or essential characteristics of the present application.

Therefore, the embodiments should be regarded as exemplary and not restrictive in all points of view, and the scope of the application is defined by the appended claims rather than the foregoing description, and it is intended that the scope of the present application be defined by the appended claims rather than by the foregoing description. All changes within the meaning and range of equivalents of the elements are embraced in this application. Any reference sign in a claim should not be construed as limiting the claim concerned.

The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm. Blockchain (Blockchain), essentially a decentralized database, is a series of data blocks associated with each other using cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify its Validity of information (anti-counterfeiting) and generation of the next block. The blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.

In addition, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or devices stated in the system claims may also be realized by one unit or device through software or hardware. Secondary terms are used to denote names without implying any particular order.

Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present application without limitation. Although the present application has been described in detail with reference to the preferred embodiments, those skilled in the art should understand that the technical solutions of the present application can be Make modifications or equivalent replacements without departing from the spirit and scope of the technical solutions of the present application.

Claims

A method for object recognition in an image, wherein the method includes:

Acquiring a sample image containing the target object, performing image amplification on the sample image to obtain a sample image set;

constructing a first training set and a test set by using a noise image of the same type as the sample image and the sample image set, and using the first training set to train a pre-built original recognition model to obtain an initial recognition model;

Using the test set to test the initial recognition model, and selecting preset types of error results in the test results to construct a second training set, and using the sample images to construct a third training set;

extracting the first feature vector of the image in the second training set, and extracting the second feature vector of the image in the third training set;

calculating a loss value between the first feature vector and the second feature vector, and updating parameters of the initial recognition model according to the loss value to obtain a standard recognition model;

The image to be recognized is acquired, and the standard recognition model is used to perform target recognition on the image to be recognized to obtain a target recognition result in the image.
The method for identifying objects in an image according to claim 1, wherein said performing image amplification on said sample image to obtain a sample image set comprises:

performing texture drawing on the sample image to obtain the image texture of the sample image;

performing random local deepening on the image texture to obtain a texture deepened image;

performing random partial lightening on the image file to obtain a texture lightened image;

Collect the texture-enhanced image and the texture-lighten image into the sample image set.
The method for identifying objects in an image according to claim 2, wherein said performing random local deepening on said image texture to obtain a texture-enhanced image comprises:

Count the number of textures of the image texture;

Selecting an image texture with a preset ratio as the texture to be processed according to the number of textures;

Adjusting pixel values of the pixels on the texture to be processed to obtain a texture-enhanced image.
The method for recognizing an object in an image according to claim 1, wherein said using said first training set to train a pre-built original recognition model to obtain an initial recognition model, comprising:

performing image recognition on the first training set by using the original recognition model to obtain a recognition result;

Calculate the loss value of the recognition result and the real label corresponding to each image in the first training set;

Adjusting parameters of the original recognition model according to the loss value to obtain an initial recognition model.
The object recognition method in an image according to any one of claims 1 to 4, wherein said extracting the first feature vector of the image in the second training set comprises:

performing a horizontal convolution operation on the images in the second training set by using a preset first gradient operator to obtain a horizontal gradient component;

performing a vertical convolution operation on the images in the second training set by using a preset second gradient operator to obtain a vertical gradient component;

The first feature vector is calculated according to the horizontal gradient component and the vertical gradient component.
The method for identifying objects in an image according to claim 5, wherein the horizontal convolution operation is performed on the image in the second training set using the preset first gradient operator to obtain a horizontal gradient component, comprising:

Get the convolution step size and convolution length;

calculating the number of horizontal convolutions according to the convolution step size and the convolution length;

Using the first gradient operator to perform a convolution operation on each image in the second training set according to the convolution step size for the number of convolutions to obtain a horizontal gradient component.
The method for recognizing objects in an image according to claim 5, wherein said calculating said first feature vector according to said horizontal gradient component and said vertical gradient component comprises:

performing normalized calculation on the horizontal gradient component to obtain a horizontal normalized component;

performing normalized calculation on the vertical gradient component to obtain a vertical normalized component;

The horizontal normalized component and the vertical normalized component are squared and summed to obtain the first feature vector.
A device for recognizing an object in an image, wherein the device includes:

An image amplification module, configured to obtain a sample image containing a target object, perform image amplification on the sample image, and obtain a sample image set;

The first training module is configured to use noise images of the same type as the sample image and the sample image set to construct a first training set and a test set, and use the first training set to perform pre-built original recognition models. Training to get the initial recognition model;

A model testing module, configured to use the test set to test the initial recognition model, select a preset type of error result in the test result to construct a second training set, and use the sample image to construct a third training set;

A feature extraction module, configured to extract a first feature vector of images in the second training set, and extract a second feature vector of images in the third training set;

A second training module, configured to calculate a loss value between the first feature vector and the second feature vector, and update parameters of the initial recognition model according to the loss value to obtain a standard recognition model;

The image recognition module is used to obtain the image to be recognized, and use the standard recognition model to perform object recognition on the image to be recognized to obtain a target object recognition result in the image.
An electronic device, wherein the electronic device includes:

at least one processor; and,

a memory communicatively coupled to the at least one processor; wherein,

The memory stores instructions executable by the at least one processor, the instructions are executed by the at least one processor, so that the at least one processor can perform the following steps:

Acquiring a sample image containing the target object, performing image amplification on the sample image to obtain a sample image set;

constructing a first training set and a test set using noise images of the same type as the sample image and the sample image set, and using the first training set to train a pre-built original recognition model to obtain an initial recognition model;

Using the test set to test the initial recognition model, and selecting preset types of error results in the test results to construct a second training set, and using the sample images to construct a third training set;

extracting the first feature vector of the image in the second training set, and extracting the second feature vector of the image in the third training set;

calculating a loss value between the first feature vector and the second feature vector, and updating parameters of the initial recognition model according to the loss value to obtain a standard recognition model;

The image to be recognized is acquired, and the standard recognition model is used to perform target recognition on the image to be recognized to obtain a target recognition result in the image.
The electronic device according to claim 9, wherein said performing image amplification on said sample image to obtain a sample image set comprises:

performing texture drawing on the sample image to obtain the image texture of the sample image;

performing random local deepening on the image texture to obtain a texture deepened image;

performing random partial lightening on the image file to obtain a texture lightened image;

Collect the texture-enhanced image and the texture-lighten image into the sample image set.
The electronic device according to claim 10, wherein said random local deepening of said image texture to obtain a texture deepened image comprises:

Count the number of textures of the image texture;

Selecting an image texture with a preset ratio as the texture to be processed according to the number of textures;

Adjusting pixel values of the pixels on the texture to be processed to obtain a texture-enhanced image.
The electronic device according to claim 9, wherein said utilizing said first training set to train a pre-built original recognition model to obtain an initial recognition model, comprising:

performing image recognition on the first training set by using the original recognition model to obtain a recognition result;

Calculate the loss value of the recognition result and the real label corresponding to each image in the first training set;

Adjusting parameters of the original recognition model according to the loss value to obtain an initial recognition model.
The electronic device according to any one of claims 9 to 12, wherein said extracting the first feature vectors of the images in the second training set comprises:

performing a horizontal convolution operation on the images in the second training set by using a preset first gradient operator to obtain a horizontal gradient component;

performing a vertical convolution operation on the images in the second training set by using a preset second gradient operator to obtain a vertical gradient component;

The first feature vector is calculated according to the horizontal gradient component and the vertical gradient component.
The electronic device according to claim 13, wherein the horizontal convolution operation is performed on the image in the second training set by using the preset first gradient operator to obtain a horizontal gradient component, comprising:

Get the convolution step size and convolution length;

calculating the number of horizontal convolutions according to the convolution step size and the convolution length;

Using the first gradient operator to perform a convolution operation on each image in the second training set according to the convolution step size for the number of convolutions to obtain a horizontal gradient component.
The electronic device according to claim 13, wherein said calculating said first feature vector according to said horizontal gradient component and said vertical gradient component comprises:

performing normalized calculation on the horizontal gradient component to obtain a horizontal normalized component;

performing normalized calculation on the vertical gradient component to obtain a vertical normalized component;

The horizontal normalized component and the vertical normalized component are squared and summed to obtain the first feature vector.
A computer-readable storage medium storing a computer program, wherein the computer program implements the following steps when executed by a processor:

Acquiring a sample image containing the target object, performing image amplification on the sample image to obtain a sample image set;

constructing a first training set and a test set by using a noise image of the same type as the sample image and the sample image set, and using the first training set to train a pre-built original recognition model to obtain an initial recognition model;

Using the test set to test the initial recognition model, and selecting preset types of error results in the test results to construct a second training set, and using the sample images to construct a third training set;

extracting the first feature vector of the image in the second training set, and extracting the second feature vector of the image in the third training set;

calculating a loss value between the first feature vector and the second feature vector, and updating parameters of the initial recognition model according to the loss value to obtain a standard recognition model;

The image to be recognized is acquired, and the standard recognition model is used to perform target recognition on the image to be recognized to obtain a target recognition result in the image.
The computer-readable storage medium according to claim 16, wherein said performing image augmentation on said sample image to obtain a sample image set comprises:

performing texture drawing on the sample image to obtain the image texture of the sample image;

performing random local deepening on the image texture to obtain a texture deepened image;

performing random partial lightening on the image file to obtain a texture lightened image;

Collect the texture-enhanced image and the texture-lighten image into the sample image set.
The computer-readable storage medium according to claim 17, wherein said performing random local deepening on said image texture to obtain a texture-enhanced image comprises:

Count the number of textures of the image texture;

Selecting an image texture with a preset ratio as the texture to be processed according to the number of textures;

Adjusting pixel values of the pixels on the texture to be processed to obtain a texture-enhanced image.
The computer-readable storage medium according to claim 16, wherein said using said first training set to train a pre-built original recognition model to obtain an initial recognition model comprises:

performing image recognition on the first training set by using the original recognition model to obtain a recognition result;

Calculate the loss value of the recognition result and the real label corresponding to each image in the first training set;

Adjusting parameters of the original recognition model according to the loss value to obtain an initial recognition model.
The computer-readable storage medium according to any one of claims 16 to 19, wherein said extracting the first feature vector of the images in the second training set comprises:

performing a horizontal convolution operation on the images in the second training set by using a preset first gradient operator to obtain a horizontal gradient component;

performing a vertical convolution operation on the images in the second training set by using a preset second gradient operator to obtain a vertical gradient component;

The first feature vector is calculated according to the horizontal gradient component and the vertical gradient component.