CN111160460A - Object recognition method and device, electronic device and medium - Google Patents

Info

Publication number: CN111160460A
Application number: CN201911390471.7A
Authority: CN (China)
Other languages: Chinese (zh)
Prior art keywords: visible light, light image, image, images, dermatoscope
Inventors: 马骁, 姜譞
Current and original assignee: Lenovo Beijing Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis)
Priority date: 2019-12-27 (priority to CN201911390471.7A)
Application filed by Lenovo Beijing Ltd on 2019-12-27; publication of CN111160460A on 2020-05-15
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Biology (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)

Abstract

The present disclosure provides an object recognition method, comprising: determining a first mapping relationship between a visible light image and a dermatoscope image for a first object, wherein the visible light image is an image acquired using a first lens, and the dermatoscope image does not need to be acquired, but if it were acquired, it would have to be acquired using a second lens whose magnification is greater than that of the first lens; determining corresponding visible light image characteristic information based on the visible light image for the first object; and identifying the first object based on the determined first mapping relationship and the visible light image characteristic information. The present disclosure also provides an object recognition apparatus, an electronic device, and a computer-readable storage medium.

Description

Object recognition method and device, electronic device and medium
Technical Field
The present disclosure relates to an object recognition method, an object recognition apparatus, an electronic device, and a computer-readable storage medium.
Background
When computer vision techniques are used to identify skin disorders, image data of three scale modalities is typically used: visible light image data, dermatoscope image data, and pathology image data.
Currently, in a doctor's workflow, a visible light image is used first for lesion identification; if the lesion cannot be identified from the visible light image, a dermatoscope image is used next; and if it cannot be identified from the dermatoscope image, a pathology image is used.
This is because, among the three scale modalities, visible light image data is the most universally available: it can be acquired non-invasively and at low cost. However, because a visible light image carries a limited amount of information, it yields low accuracy when identifying lesions of some disease types. Although a dermatoscope image can also be acquired non-invasively, it is expensive to acquire and requires professional equipment, which many hospitals do not have. Acquiring a pathology image damages the skin and suffers from high cost, a long waiting period, and other disadvantages.
Therefore, for certain diseases, lesion identification using visible light images is currently the most economical and time-saving method. However, because of the uncertainty of visible light images in identifying lesions of some disease types, they need to be combined with images of a higher-scale modality to improve accuracy.
Disclosure of Invention
One aspect of the present disclosure provides an object recognition method, comprising: determining a first mapping relationship between a visible light image and a dermatoscope image for a first object, wherein the visible light image is an image acquired using a first lens, and the dermatoscope image does not need to be acquired, but if it were acquired, it would have to be acquired using a second lens whose magnification is greater than that of the first lens; determining corresponding visible light image characteristic information based on the visible light image for the first object; and identifying the first object based on the determined first mapping relationship and the visible light image characteristic information.
Optionally, the determining a first mapping relationship between the visible light image and the dermatoscope image for the first object comprises: determining visible light image data for the visible light image; inputting the visible light image data into a first encoder to obtain first characteristic information output by a middle layer of the first encoder, wherein the first encoder can encode the visible light image data of a target object to obtain dermatoscope image data of the target object, and the target object comprises the first object; and characterizing the first mapping relation by using the first characteristic information.
Optionally, the method further comprises: training the first encoder; the training the first encoder includes: acquiring visible light images and dermatoscope images of a plurality of second objects, wherein the visible light images of the plurality of second objects are images acquired by using the first lens, and the dermatoscope images of the plurality of second objects are images acquired by using the second lens; and training the first encoder based on the visible light image and the dermatoscope image for the plurality of second objects.
Optionally, the identifying the first object based on the determined first mapping relationship and the visible light image feature information includes: determining a first feature vector based on the first mapping relation and the visible light image feature information; inputting the first feature vector into a first classification model to output a corresponding classification result; and identifying the first object based on the classification result.
Optionally, the method further comprises: training the first classification model; the training the first classification model comprises: acquiring visible light images and dermatoscope images of a plurality of third objects, wherein the visible light images of the plurality of third objects are images acquired by using the first lens, and the dermatoscope images of the plurality of third objects are images acquired by using the second lens; determining a plurality of third mapping relationships between the visible light images and the dermatoscope images for the plurality of third objects; and training the first classification model based on the visible light images for the third objects and the third mapping relationships.
Another aspect of the present disclosure provides an object recognition apparatus, comprising: a first determining module for determining a first mapping relationship between a visible light image and a dermatoscope image for a first object, wherein the visible light image is an image acquired using a first lens, and the dermatoscope image does not need to be acquired, but if it were acquired, it would have to be acquired using a second lens whose magnification is greater than that of the first lens; a second determining module, configured to determine corresponding visible light image characteristic information based on the visible light image for the first object; and an identification module, configured to identify the first object based on the determined first mapping relationship and the visible light image characteristic information.
Optionally, the first determining module includes: a determination unit configured to determine visible light image data of the visible light image; a first obtaining unit, configured to input the visible light image data into a first encoder to obtain first feature information output by an intermediate layer of the first encoder, where the first encoder is capable of encoding visible light image data of a target object to obtain dermatoscope image data of the target object, where the target object includes the first object; and the processing unit is used for representing the first mapping relation by utilizing the first characteristic information.
Optionally, the apparatus further comprises: a training module to train the first encoder; the training module comprises: a second acquisition unit configured to acquire visible light images and dermatoscope images for a plurality of second objects, wherein the visible light images for the plurality of second objects are images acquired using the first lens, and the dermatoscope images for the plurality of second objects are images acquired using the second lens; and a training unit for training the first encoder based on the visible light image and the dermatoscope image for the plurality of second objects.
Another aspect of the present disclosure provides an electronic device including: one or more processors; memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the above-described methods of embodiments of the present disclosure.
Another aspect of the present disclosure provides a computer-readable storage medium storing computer-executable instructions for implementing the above-described method of the embodiments of the present disclosure when executed.
Another aspect of the present disclosure provides a computer program comprising computer executable instructions for implementing the above method of an embodiment of the present disclosure when executed.
Drawings
For a more complete understanding of the present disclosure and the advantages thereof, reference is now made to the following descriptions taken in conjunction with the accompanying drawings, in which:
fig. 1 schematically illustrates an application scenario of an object recognition method and apparatus according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow chart of an object recognition method according to an embodiment of the present disclosure;
FIG. 3A schematically shows a schematic diagram of a depth encoder according to an embodiment of the present disclosure;
FIG. 3B schematically illustrates a schematic diagram of extracting feature information from a visible light image, according to an embodiment of the disclosure;
fig. 3C schematically illustrates a schematic diagram of object recognition based on visible light images, according to an embodiment of the disclosure;
FIG. 4 schematically illustrates a block diagram of an object recognition apparatus according to an embodiment of the present disclosure; and
fig. 5 schematically shows a block diagram of an electronic device according to an embodiment of the disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Where a convention analogous to "at least one of A, B and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B and C" would include but not be limited to systems that have a alone, B alone, C alone, a and B together, a and C together, B and C together, and/or A, B, C together, etc.). Where a convention analogous to "A, B or at least one of C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B or C" would include but not be limited to systems that have a alone, B alone, C alone, a and B together, a and C together, B and C together, and/or A, B, C together, etc.).
Some block diagrams and/or flow diagrams are shown in the figures. It will be understood that some blocks of the block diagrams and/or flowchart illustrations, or combinations thereof, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the instructions, which execute via the processor, create means for implementing the functions/acts specified in the block diagrams and/or flowchart block or blocks. The techniques of this disclosure may be implemented in hardware and/or software (including firmware, microcode, etc.). In addition, the techniques of this disclosure may take the form of a computer program product on a computer-readable storage medium having instructions stored thereon for use by or in connection with an instruction execution system.
The embodiment of the present disclosure provides an object identification method that does not require acquiring a dermatoscope image yet can reach the identification accuracy achievable when identifying a lesion from a dermatoscope image, together with an object identification apparatus to which the method can be applied. The method includes the following operations. A first mapping relationship between a visible light image and a dermatoscope image for a first object is determined, wherein the visible light image is an image acquired using a first lens, and the dermatoscope image does not need to be acquired, but if it were acquired, it would have to be acquired using a second lens whose magnification is greater than that of the first lens. Based on the visible light image for the first object, corresponding visible light image characteristic information is determined. The first object is then identified based on the determined first mapping relationship and the visible light image characteristic information.
Fig. 1 schematically illustrates an application scenario of an object recognition method and apparatus according to an embodiment of the present disclosure. It should be noted that fig. 1 is only an example of a scenario in which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, but does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios.
In a doctor's workflow, skin disease recognition is first performed using a visible light image 101 as shown in fig. 1; if the skin disease is not recognized using the visible light image 101, recognition is further performed using a dermatoscope image 102; and if it is still not recognized using the dermatoscope image 102, recognition is further performed using a pathology image 103.
As can be seen, although it is possible in the prior art to discriminate skin diseases using image data of only one scale modality, such as visible light image data, the amount of information contained in a visible light image is insufficient, so the resulting discrimination accuracy is poor.
To overcome the poor accuracy of skin disease discrimination based on image data of a single scale modality, such as visible light image data, the related art also proposes identifying lesions with a model that fuses multiple scale modalities. However, this approach requires training a model in which the multi-scale modalities are fused.
Since dermatoscope images and pathology images have the disadvantages described above, they are not universally available, and complete image data across all three scale modalities typically exists only for severely ill patients. This approach therefore places too high a requirement on data completeness and does not meet clinical needs.
In view of this, the disclosed embodiments provide an improved object recognition method. With this method, only the visible light image of a patient needs to be acquired; the dermatoscope image does not. Nevertheless, the identification accuracy achievable with a dermatoscope image can be reached from the visible light image alone.
The present disclosure will be described in detail below with reference to specific examples.
Fig. 2 schematically shows a flow chart of an object recognition method according to an embodiment of the present disclosure.
As shown in fig. 2, the method may include operations S210 to S230, for example.
In operation S210, a first mapping relationship between a visible light image and a dermatoscope image for a first object is determined, wherein the visible light image is an image captured using a first lens, and the dermatoscope image does not need to be captured, but if it were captured, it would have to be captured using a second lens whose magnification is greater than that of the first lens.
It should be noted that, in the embodiment of the present disclosure, the visible light image may be captured using a first device, where the optical lens used by the first device for capture is the first lens. The dermatoscope image need not be acquired in the disclosed embodiments, but if such an image were acquired, it would have to be acquired using special second equipment, where the optical lens used by the second equipment for capture is the second lens. The magnification of the second lens is greater than that of the first lens.
Specifically, in order to understand the skin condition at any location on any person's skin, in the embodiment of the present disclosure only the visible light image at the corresponding location needs to be acquired; the dermatoscope image at that location does not have to be acquired. This is because the information output by the encoding part (i.e., the encoder part) of a depth encoder that maps visible light images to dermatoscope images contains high-order semantic information shared by the visible light image and its corresponding dermatoscope image. Therefore, after the visible light image of a person's skin is collected, it can be input into the depth encoder, the corresponding feature information can be extracted from the information output by the encoding part, and the extracted feature information can be used as the mapping relationship between the visible light image of that person's skin and the dermatoscope image.
Fig. 3A schematically illustrates a depth encoder according to an embodiment of the disclosure.
As shown in fig. 3A, for the depth encoder 31, the left end is the visible light image input layer 311, the right end is the dermatoscope image output layer 312, and the information output by the middle layer 313 is the information output by the encoding part of the depth encoder.
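The structure described above can be made concrete with a short sketch. The following PyTorch code is a minimal illustration, not the patent's actual network: the layer widths, the 512-channel middle layer, and the stride choices are all assumptions introduced here for the example.

```python
# Minimal sketch of the depth encoder 31: visible light image in (layer 311),
# dermatoscope image out (layer 312), shared semantics at the middle layer 313.
# All layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

class DepthEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoding part: visible light image -> shared high-order semantic features.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 256, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(256, 512, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        # Decoding part: shared semantic features -> reconstructed dermatoscope image.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(512, 256, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(256, 64, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, kernel_size=4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, visible):
        middle = self.encoder(visible)        # information output by middle layer 313
        dermatoscope = self.decoder(middle)   # dermatoscope image output layer 312
        return dermatoscope, middle
```

At inference time only the `middle` output is used to characterize the mapping relationship; the decoder exists so that training can supervise the encoder with real dermatoscope images.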
The embodiment of the disclosure only needs to acquire the image of the small-scale modality, that is, only the visible light image, and thus retains the advantages of non-invasive acquisition and low cost; since no special professional equipment is needed to acquire a visible light image, many hospitals can implement it.
It should be noted that, in the embodiment of the present disclosure, the first object may be any pigmented nevus, melanoma, red birthmark, hemangioma, or the like growing anywhere on a person's skin. In the embodiments of the present disclosure, pigmented nevi, melanomas, red birthmarks, hemangiomas, and the like may be collectively referred to as lesions.
In operation S220, based on the visible light image for the first object, corresponding visible light image characteristic information is determined.
It should be noted that, in the embodiment of the present disclosure, any image feature extraction method may be used to extract corresponding image features from the visible light image, and is not limited herein.
For example, fig. 3B schematically illustrates extracting feature information from a visible light image according to an embodiment of the present disclosure. As shown in fig. 3B, a visible light image A may be input into the neural network 32 for convolution operations and sampling processing, and the information output by the neural network 32 may be used as the feature information of the visible light image.
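As a hedged illustration of the neural network 32, the following sketch uses a single convolution-plus-pooling stage; the 1024 output channels and the downsampling factor are assumptions chosen only so that its output aligns spatially with the depth encoder sketch above.

```python
# Sketch of the neural network 32: convolutional layer 321 followed by
# pooling layer 322 (the sampling step). Channel count and pooling factor
# are illustrative assumptions, not specified by the patent.
import torch.nn as nn

class VisibleFeatureNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 1024, kernel_size=3, stride=2, padding=1)  # convolutional layer 321
        self.act = nn.ReLU()
        self.pool = nn.MaxPool2d(4)  # pooling layer 322; factor matched to the encoder sketch

    def forward(self, x):
        return self.pool(self.act(self.conv(x)))  # feature information of the visible light image
```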
In operation S230, a first object is identified based on the determined first mapping relationship and the visible-light image characteristic information.
Specifically, the feature information of the visible light image and the mapping information characterizing the mapping relationship determined from that visible light image may be fused (e.g., concatenated) to obtain fused feature information; the fused feature information is input into a pre-trained classifier for classification, and lesion identification for the first object is performed based on the classification result. For example, whether the first object is a pigmented nevus, a melanoma, or the like is determined based on the classification result.
According to the embodiment of the disclosure, since only the visible light image of the small-scale modality needs to be acquired, the advantages of non-invasive acquisition and low cost are retained; and since no special professional equipment is needed to acquire a visible light image, many hospitals can implement the method.
In addition, according to the embodiments of the present disclosure, the object recognition process considers not only the image features of the visible light image but also the mapping relationship to the corresponding dermatoscope image (that is, a dermatoscope image of the same object as the visible light image). In effect, feature information of the corresponding dermatoscope image is also taken into account, which amounts to using image data of two scale modalities while imaging only one. Compared with the related art, which uses image data of a single scale modality, the embodiments of the present disclosure therefore draw on a larger amount of information and achieve higher recognition accuracy.
In addition, although the embodiments of the present disclosure involve fusing image data of multiple scale modalities, it is not necessary to acquire images in every modality: only visible light images need to be acquired, no dermatoscope image is required, and complete image data across the multiple scale modalities is not needed.
As an alternative embodiment, determining the first mapping relationship between the visible light image and the dermatoscope image for the first object may for example comprise the following operations.
Visible light image data of the visible light image is determined.
Inputting the visible light image data into a first encoder to acquire first characteristic information output by a middle layer of the first encoder, wherein the first encoder can encode the visible light image data of a target object to obtain dermatoscope image data of the target object, and the target object comprises a first object.
And characterizing the first mapping relation by using the first characteristic information.
Specifically, referring back to fig. 3A, in order to obtain the mapping relationship between a visible light image and its corresponding dermatoscope image, the visible light image may be input at the visible light image input layer 311 of the depth encoder 31 (e.g., the first encoder), and output information may be obtained from the middle layer 313 of the depth encoder 31; this output information is then used to characterize the mapping relationship between the visible light image and its corresponding dermatoscope image.
Because the information output by the middle layer of the depth encoder, which maps a visible light image to its corresponding dermatoscope image, contains the high-order semantic information shared by the visible light image and that dermatoscope image, after a visible light image of a person's skin is collected it can be input into the depth encoder, the corresponding feature information can be rapidly extracted from the information output by the encoding part, and that feature information can be used as the mapping relationship between the visible light image and its corresponding dermatoscope image.
Further, as an alternative embodiment, the method may further comprise training the first encoder, for example.
Wherein training the first encoder may comprise, for example, the following operations.
And acquiring visible light images and dermatoscope images of the plurality of second objects, wherein the visible light images of the plurality of second objects are images acquired by using the first lens, and the dermatoscope images of the plurality of second objects are images acquired by using the second lens.
The first encoder is trained based on the visible light images and the dermatoscope images for the plurality of second objects.
In the disclosed embodiment, visible light images and dermatoscope images of past patients may be collected as training samples. It should be noted that, in the training samples, the visible light images and dermatoscope images need to be paired, and a paired visible light image and dermatoscope image must be acquired from the same lesion of the same patient. For the training samples, both the visible light images and the dermatoscope images do need to be actually acquired.
Specifically, after the training samples are obtained, the depth encoder from visible light images to dermatoscope images can be trained using a neural network so as to learn the mapping relationship from visible light image to dermatoscope image. Once the trained depth encoder is obtained, at inference time a visible light image is input directly at its input end and the information output by the encoder part is taken out; this information fuses the high-order semantic information shared by the visible light image and its corresponding dermatoscope image. Referring back to fig. 3A, the network to the left of the middle layer 313 is the encoder part, and the network to the right of the middle layer 313 is the decoder part.
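A possible training loop for the depth encoder is sketched below. The pixel-wise L2 reconstruction loss and `paired_loader`, assumed to yield aligned (visible light, dermatoscope) image pairs from the same lesion, are illustrative choices; the patent does not prescribe a specific loss or data pipeline.

```python
# Illustrative training of the depth encoder on paired images.
# `paired_loader` is a hypothetical DataLoader of (visible, dermatoscope) pairs.
import torch

model = DepthEncoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = torch.nn.MSELoss()

for visible, dermatoscope in paired_loader:
    predicted, _ = model(visible)              # decoder output approximates the dermatoscope image
    loss = criterion(predicted, dermatoscope)  # reconstruction loss (an assumed choice)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```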
Here, when predicting the mapping relationship between a visible light image and its dermatoscope image, only the output information of the encoder part (i.e., the information output by the middle layer 313) is used; the corresponding dermatoscope image therefore does not need to be acquired, and the high cost of acquiring it is saved.
As an alternative embodiment, identifying the first object based on the determined first mapping relationship and the visible-light image characteristic information may include, for example, the following operations.
And determining a first feature vector based on the first mapping relation and the visible light image feature information.
And inputting the first feature vector into the first classification model to output a corresponding classification result.
Based on the classification result, the first object is identified.
Specifically, the feature information representing the first mapping relationship and the feature information of the corresponding visible light image may be fused (e.g., concatenated) so as to determine the corresponding first feature vector.
For example, assume that a visible light image A of a black spot on the skin of a patient's back is acquired. According to the foregoing embodiment of the present disclosure, the mapping relationship between the visible light image A and its corresponding dermatoscope image can be predicted from the visible light image A, where the mapping relationship is characterized by feature information X1, denoted by a feature vector F1(H, W, T1); T1 may include, for example, 512-dimensional features. Further, extracting image features from the visible light image A yields feature information X2, denoted by a feature vector F2(H, W, T2); T2 may include, for example, 1024-dimensional features. When identifying the lesion, F1(H, W, T1) and F2(H, W, T2) may be concatenated into F3(H, W, T3), where T3 = (T1 + T2), i.e., T3 may include the 512 dimensions of T1 plus the 1024 dimensions of T2. F3(H, W, T3) is then input as the first feature vector into the pre-trained first classification model, so as to determine whether the black spot on the patient's back is a pigmented nevus or a melanoma.
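The concatenation step can be written down directly; the 16 x 16 spatial size below is an illustrative assumption, and only the channel arithmetic T3 = T1 + T2 comes from the example above.

```python
# Channel-wise concatenation of the two feature maps (tensors are NCHW).
import torch

f1 = torch.randn(1, 512, 16, 16)   # F1(H, W, T1): mapping features, T1 = 512
f2 = torch.randn(1, 1024, 16, 16)  # F2(H, W, T2): visible light image features, T2 = 1024
f3 = torch.cat([f1, f2], dim=1)    # F3(H, W, T3) with T3 = T1 + T2 = 1536
print(f3.shape)                    # torch.Size([1, 1536, 16, 16])
```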
Fig. 3C schematically illustrates a schematic diagram of object recognition based on visible light images according to an embodiment of the disclosure.
As shown in FIG. 3C, the visible light image A can be input into the depth encoder 31, and the feature information X1 can be extracted from the information output by the middle layer 313 (not shown in FIG. 3C) of the depth encoder 31. Meanwhile, the visible light image A can be input into the neural network 32 in the figure to output the feature information X2 of the visible light image A. Finally, the feature vector F1(H, W, T1) corresponding to X1 and the feature vector F2(H, W, T2) corresponding to X2 are fused into F3(H, W, T3), which is input into the first classification model 33 in the figure to obtain the output information (i.e., the classification result) of the first classification model 33; whether the black spot on the patient's back is a pigmented nevus or a melanoma is then determined from the information output by the first classification model 33. Furthermore, as shown in FIG. 3C, the neural network 32 includes a convolutional layer 321 and a pooling layer 322, where the convolutional layer 321 implements the convolution operations and the pooling layer 322 implements the sampling operations.
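Putting the pieces together, the following sketch traces FIG. 3C end to end, reusing the classes defined in the earlier sketches; the 128 x 128 input size and the two-class head standing in for the first classification model 33 are assumptions made for the example.

```python
# End-to-end inference along the lines of FIG. 3C (a sketch, not the patent's
# exact pipeline). DepthEncoder and VisibleFeatureNet are defined above.
import torch
import torch.nn as nn

depth_encoder = DepthEncoder()
feature_net = VisibleFeatureNet()
classifier = nn.Sequential(                 # stands in for the first classification model 33
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(1536, 2),                     # e.g. pigmented nevus vs. melanoma
)

visible = torch.randn(1, 3, 128, 128)       # visible light image A (size assumed)
with torch.no_grad():
    _, f1 = depth_encoder(visible)          # X1: output of the middle layer 313
    f2 = feature_net(visible)               # X2: output of the neural network 32
    f3 = torch.cat([f1, f2], dim=1)         # fused F3, 512 + 1024 = 1536 channels
    logits = classifier(f3)                 # classification result
```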
Further, as an alternative embodiment, the method may further include training the first classification model, for example.
Wherein training the first classification model may for example comprise the following operations.
And acquiring visible light images and dermatoscope images of a plurality of third objects, wherein the visible light images of the plurality of third objects are images acquired by using the first lens, and the dermatoscope images of the plurality of third objects are images acquired by using the second lens.
A plurality of third mapping relationships between the visible light images and the dermatoscope images for a plurality of third objects is determined.
The first classification model is trained based on the visible light images for the plurality of third objects and the plurality of third mapping relationships.
In the embodiment of the present disclosure, similarly to the foregoing embodiment of training the first encoder, the visible light images and dermatoscope images of the plurality of third objects may be taken as samples and divided into training samples and prediction samples. A corresponding depth encoder is first trained on the training samples using the method of the foregoing embodiment; the prediction samples are then input into the trained depth encoder to predict the mapping relationship between each visible light image in the prediction samples and its corresponding dermatoscope image. At the same time, feature information is extracted from each visible light image in the prediction samples. The feature information of each visible light image is then fused with the feature information characterizing its mapping relationship to obtain fused feature information. Finally, the first classification model of the embodiment of the present disclosure is trained using the fused feature information.
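Training of the first classification model on the fused features might then look as follows, reusing the `classifier` from the previous sketch; `fused_loader`, assumed to yield (fused feature map, lesion label) pairs built from the prediction samples as described above, is a hypothetical placeholder.

```python
# Illustrative training loop for the first classification model.
# `fused_loader` is a hypothetical DataLoader of (fused features, label) pairs.
import torch
import torch.nn as nn

optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

for fused, label in fused_loader:           # fused: (N, 1536, H, W); label: (N,)
    logits = classifier(fused)
    loss = criterion(logits, label)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```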
According to the embodiments of the present disclosure, when the visible light classification model (i.e., the first classification model) is trained, the feature information output by the encoder part of the depth encoder for each visible light image in the prediction samples is fused with the feature information extracted directly from that image, and training is performed on the fused feature information. In this way, the feature information of the related dermatoscope images learned by the depth encoder is injected into the visible light classification network, which improves the accuracy of the visible light classification model.
In other words, according to the embodiment of the present disclosure, when a visible light image is used for classification and identification, information contained in the dermatoscope image is added through the mapping relationship, so the identification accuracy can be improved. Moreover, because the feature information of the dermatoscope image is encoded into the visible light classification network through the mapping relationship during training, only the visible light image needs to be collected at test time, which matches the doctor's workflow.
Fig. 4 schematically shows a block diagram of an object recognition apparatus according to an embodiment of the present disclosure.
As shown in fig. 4, the object recognition apparatus 400 includes a first determining module 410, a second determining module 420, and an identifying module 430. The object recognition apparatus 400 may perform the method described above with reference to fig. 2, so as to achieve, without acquiring a dermatoscope image, the recognition accuracy achievable with dermatoscope-based image recognition.
Specifically, the first determining module 410 is configured to determine a first mapping relationship between a visible light image and a dermatoscope image for a first object, where the visible light image is an image captured using a first lens, and the dermatoscope image does not need to be captured, but if it were captured, it would have to be captured using a second lens whose magnification is greater than that of the first lens.
A second determining module 420, configured to determine corresponding visible light image characteristic information based on the visible light image for the first object.
The identifying module 430 is configured to identify the first object based on the determined first mapping relationship and the visible light image characteristic information.
According to the embodiment of the disclosure, since only the visible light image of the small-scale modality needs to be acquired, the advantages of non-invasive acquisition and low cost are retained; and since no special professional equipment is needed to acquire a visible light image, many hospitals can implement the apparatus.
In addition, according to the embodiments of the present disclosure, the object recognition process considers not only the image features of the visible light image but also the mapping relationship to the corresponding dermatoscope image (that is, a dermatoscope image of the same object as the visible light image). In effect, feature information of the corresponding dermatoscope image is also taken into account, which amounts to using image data of two scale modalities while imaging only one. Compared with the related art, which uses image data of a single scale modality, the embodiments of the present disclosure therefore draw on a larger amount of information and achieve higher recognition accuracy.
In addition, although the embodiments of the present disclosure involve fusing image data of multiple scale modalities, it is not necessary to acquire images in every modality: only visible light images need to be acquired, no dermatoscope image is required, and complete image data across the multiple scale modalities is not needed.
As an alternative embodiment, the first determining module includes: a determining unit for determining visible light image data of the visible light image. The first acquisition unit is used for inputting visible light image data into a first encoder so as to acquire first characteristic information output by an intermediate layer of the first encoder, wherein the first encoder can encode the visible light image data of a target object so as to obtain dermatoscope image data of the target object, and the target object comprises a first object. And the processing unit is used for representing the first mapping relation by utilizing the first characteristic information.
Further, as an optional embodiment, the apparatus further comprises: a training module to train the first encoder. The training module comprises: a second acquisition unit for acquiring visible light images and dermatoscope images of a plurality of second objects, wherein the visible light images of the plurality of second objects are images acquired using the first lens, and the dermatoscope images of the plurality of second objects are images acquired using the second lens; and a training unit for training the first encoder based on the visible light images and the dermatoscope images of the plurality of second objects.
It should be noted that, in the embodiment of the present disclosure, the embodiment of the apparatus portion is the same as or similar to the embodiment of the method portion, and is not described herein again.
Any of the modules, units, or at least part of the functionality of any of them according to embodiments of the present disclosure may be implemented in one module. Any one or more of the modules and units according to the embodiments of the present disclosure may be implemented by being split into a plurality of modules. Any one or more of the modules, units according to the embodiments of the present disclosure may be implemented at least partially as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented by any other reasonable means of hardware or firmware by integrating or packaging the circuits, or in any one of three implementations of software, hardware and firmware, or in any suitable combination of any of them. Alternatively, one or more of the modules, units according to embodiments of the present disclosure may be implemented at least partly as computer program modules, which, when executed, may perform the respective functions.
For example, any number of the first determining module 410, the second determining module 420, and the identifying module 430 may be combined in one module to be implemented, or any one of them may be split into a plurality of modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of the other modules and implemented in one module. According to an embodiment of the present disclosure, at least one of the first determining module 410, the second determining module 420, and the identifying module 430 may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented in hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or in any one of or a suitable combination of software, hardware, and firmware. Alternatively, at least one of the first determining module 410, the second determining module 420 and the identifying module 430 may be at least partially implemented as a computer program module, which when executed, may perform a corresponding function.
Fig. 5 schematically shows a block diagram of an electronic device according to an embodiment of the disclosure. The electronic device shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 5, the electronic device 500 includes a processor 510, a computer-readable storage medium 520. The electronic device 500 may perform a method according to an embodiment of the present disclosure.
In particular, processor 510 may include, for example, a general purpose microprocessor, an instruction set processor and/or related chip set and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), and/or the like. The processor 510 may also include on-board memory for caching purposes. Processor 510 may be a single processing unit or a plurality of processing units for performing different actions of a method flow according to embodiments of the disclosure.
Computer-readable storage media 520, for example, may be non-volatile computer-readable storage media, specific examples including, but not limited to: magnetic storage devices, such as magnetic tape or Hard Disk Drives (HDDs); optical storage devices, such as compact disks (CD-ROMs); a memory, such as a Random Access Memory (RAM) or a flash memory; and so on.
The computer-readable storage medium 520 may include a computer program 521, which computer program 521 may include code/computer-executable instructions that, when executed by the processor 510, cause the processor 510 to perform a method according to an embodiment of the disclosure, or any variation thereof.
The computer program 521 may be configured with, for example, computer program code comprising computer program modules. For example, in an example embodiment, the code in computer program 521 may include one or more program modules, for example module 521A, module 521B, … . It should be noted that the division and number of modules are not fixed; those skilled in the art may use suitable program modules or combinations of program modules according to the actual situation, and when these program modules are executed by the processor 510, the processor 510 may execute the method according to the embodiment of the present disclosure or any variation thereof.
According to an embodiment of the present disclosure, at least one of the first determining module 410, the second determining module 420 and the identifying module 430 may be implemented as a computer program module described with reference to fig. 5, which, when executed by the processor 510, may implement the respective operations described above.
The present disclosure also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to an embodiment of the disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that various combinations and/or combinations of features recited in the various embodiments and/or claims of the present disclosure can be made, even if such combinations or combinations are not expressly recited in the present disclosure. In particular, various combinations and/or combinations of the features recited in the various embodiments and/or claims of the present disclosure may be made without departing from the spirit or teaching of the present disclosure. All such combinations and/or associations are within the scope of the present disclosure.
While the disclosure has been shown and described with reference to certain exemplary embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the disclosure as defined by the appended claims and their equivalents. Accordingly, the scope of the present disclosure should not be limited to the above-described embodiments, but should be defined not only by the appended claims, but also by equivalents thereof.

Claims (10)

1. An object recognition method, comprising:
determining a first mapping relationship between a visible light image and a dermatoscope image for a first object, wherein the visible light image is an image acquired using a first lens, and the dermatoscope image does not need to be acquired, but if it were acquired, it would have to be acquired using a second lens whose magnification is greater than that of the first lens;
determining corresponding visible light image characteristic information based on the visible light image for the first object; and
identifying the first object based on the determined first mapping relationship and the visible light image characteristic information.
2. The method of claim 1, wherein the determining a first mapping relationship between a visible light image and a dermatoscope image for a first object comprises:
determining visible light image data for the visible light image;
inputting the visible light image data into a first encoder to obtain first characteristic information output by a middle layer of the first encoder, wherein the first encoder can encode the visible light image data of a target object to obtain dermatoscope image data of the target object, and the target object comprises the first object; and
characterizing the first mapping relationship by using the first characteristic information.
3. The method of claim 2, wherein the method further comprises: training the first encoder;
the training the first encoder includes:
acquiring visible light images and dermatoscope images of a plurality of second objects, wherein the visible light images of the plurality of second objects are images acquired by using the first lens, and the dermatoscope images of the plurality of second objects are images acquired by using the second lens; and
training the first encoder based on the visible light image and the dermatoscope image for the plurality of second objects.
4. The method of claim 1, wherein the identifying the first object based on the determined first mapping relationship and the visible-light image feature information comprises:
determining a first feature vector based on the first mapping relation and the visible light image feature information;
inputting the first feature vector into a first classification model to output a corresponding classification result; and
identifying the first object based on the classification result.
5. The method of claim 4, wherein the method further comprises: training the first classification model;
the training the first classification model comprises:
acquiring visible light images and dermatoscope images of a plurality of third objects, wherein the visible light images of the plurality of third objects are images acquired by using the first lens, and the dermatoscope images of the plurality of third objects are images acquired by using the second lens;
determining a plurality of third mapping relationships between the visible light images and the dermatoscope images for the plurality of third objects; and
training the first classification model based on the visible light images for the plurality of third objects and the plurality of third mapping relationships.
6. An object recognition apparatus comprising:
a first determining module for determining a first mapping relationship between a visible light image and a dermatoscope image for a first object, wherein the visible light image is an image acquired using a first lens, and the dermatoscope image does not need to be acquired, but if it were acquired, it would have to be acquired using a second lens whose magnification is greater than that of the first lens;
a second determining module, configured to determine corresponding visible light image characteristic information based on the visible light image for the first object; and
an identification module for identifying the first object based on the determined first mapping relationship and the visible light image characteristic information.
7. The apparatus of claim 6, wherein the first determining means comprises:
a determination unit configured to determine visible light image data of the visible light image;
a first obtaining unit, configured to input the visible light image data into a first encoder to obtain first feature information output by an intermediate layer of the first encoder, where the first encoder is capable of encoding visible light image data of a target object to obtain dermatoscope image data of the target object, where the target object includes the first object; and
a processing unit for characterizing the first mapping relationship by using the first characteristic information.
8. The apparatus of claim 7, wherein the apparatus further comprises: a training module to train the first encoder;
the training module comprises:
a second acquisition unit configured to acquire visible light images and dermatoscope images for a plurality of second objects, wherein the visible light images for the plurality of second objects are images acquired using the first lens, and the dermatoscope images for the plurality of second objects are images acquired using the second lens; and
a training unit to train the first encoder based on the visible light image and the dermatoscope image for the plurality of second objects.
9. An electronic device, comprising:
one or more processors;
a memory for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-5.
10. A computer-readable storage medium storing computer-executable instructions for implementing the method of any one of claims 1 to 5 when executed.
CN201911390471.7A, filed 2019-12-27 (priority 2019-12-27): Object recognition method and device, electronic device and medium. Status: Pending. Publication: CN111160460A.

Priority Applications (1)

Application number: CN201911390471.7A; priority date: 2019-12-27; filing date: 2019-12-27; title: Object recognition method and device, electronic device and medium (CN111160460A).

Applications Claiming Priority (1)

Application number: CN201911390471.7A; priority date: 2019-12-27; filing date: 2019-12-27; title: Object recognition method and device, electronic device and medium (CN111160460A).

Publications (1)

Publication number: CN111160460A; publication date: 2020-05-15.

Family

ID: 70559197

Family Applications (1)

Application number: CN201911390471.7A (priority date 2019-12-27, filing date 2019-12-27); title: Object recognition method and device, electronic device and medium; status: Pending (CN111160460A).

Country Status (1)

Country Link
CN (1) CN111160460A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019102042A1 (en) * 2017-11-27 2019-05-31 Deciphex Automated screening of histopathology tissue samples via classifier performance metrics
CN109492666A (en) * 2018-09-30 2019-03-19 北京百卓网络技术有限公司 Image recognition model training method, device and storage medium
CN110363296A (en) * 2019-06-28 2019-10-22 腾讯科技(深圳)有限公司 Task model acquisition methods and device, storage medium and electronic device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
NILS GESSERT et al.: "Skin Lesion Classification Using Ensembles of Multi-Resolution EfficientNets with Meta Data" *
NOEL CODELLA et al.: "Skin Lesion Analysis Toward Melanoma Detection 2018: A Challenge Hosted by the International Skin Imaging Collaboration (ISIC)" *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination