CN115170536A

CN115170536A - Image detection method, model training method and device

Info

Publication number: CN115170536A
Application number: CN202210869385.XA
Authority: CN
Inventors: 蒋旻悦; 何悦
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2022-07-22
Filing date: 2022-07-22
Publication date: 2022-10-11
Anticipated expiration: 2042-07-22
Also published as: CN115170536B

Abstract

The present disclosure provides an image detection method, a model training method and an image detection device, which relate to the technical field of artificial intelligence, specifically to the technical fields of image processing, computer vision, deep learning, etc., and in particular to scenes such as smart cities and intelligent transportation. The implementation scheme is as follows: acquiring a first image, wherein the first image comprises a first area corresponding to a target object; performing feature extraction on the first image to obtain a first feature; and obtaining a detection result based on the first feature, the detection result indicating a position of the target object in the first image.

Description

Image detection method, model training method and device

Technical Field

The present disclosure relates to the field of artificial intelligence technologies, in particular to the fields of image processing, computer vision, deep learning, and the like, and more particularly to an image detection method, a training method and apparatus for an image detection model, an electronic device, a computer-readable storage medium, and a computer program product.

Background

Artificial intelligence is the discipline that studies how to make computers simulate certain human thought processes and intelligent behaviors (such as learning, reasoning, thinking, and planning), and it involves both hardware-level and software-level technologies. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, and the like; artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, knowledge graph technologies, and the like.

Image detection based on artificial intelligence acquires an image and performs detection on it to obtain the category and position of objects in the image. How to improve the accuracy and precision of the detection result remains a problem of continuing interest.

The approaches described in this section are not necessarily approaches that have been previously conceived or pursued. Unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section. Similarly, unless otherwise indicated, the problems mentioned in this section should not be considered as having been acknowledged in any prior art.

Disclosure of Invention

The disclosure provides an image detection method, a model training method, an image detection device, an electronic device, a computer readable storage medium and a computer program product.

According to an aspect of the present disclosure, there is provided an image detection method including: acquiring a first image, wherein the first image comprises a first region corresponding to a target object; performing feature extraction on the first image to obtain a first feature whose similarity to a second feature is greater than a preset threshold, the second feature being obtained by feature extraction on a second image, the second image being obtained based on the first image and having a second region corresponding to the first region, and a contrast between the second region and another region different from the second region in the second image being greater than a contrast between the first region and another region different from the first region in the first image; and obtaining a detection result based on the first feature, the detection result indicating a position of the target object in the first image.

According to another aspect of the present disclosure, there is provided a training method of an image detection model, including: obtaining a sample image including a sample region corresponding to a target object; obtaining a training image based on the sample image, a contrast between a training image region corresponding to the sample region and another region different from the training image region in the training image being greater than a contrast between the sample region and another region different from the sample region in the sample image; inputting the sample image to the image detection model and the training image to a trained first model; obtaining a first feature extracted by the image detection model based on the sample image and a second feature extracted by the first model based on the training image; obtaining a second loss based on the first feature and the second feature; and adjusting parameters of the image detection model based on the second loss.

According to another aspect of the present disclosure, there is provided an image detection apparatus including: an image acquisition unit configured to acquire a first image including a first region corresponding to a target object; a feature extraction unit configured to perform feature extraction on the first image to obtain a first feature whose similarity to a second feature is greater than a preset threshold, the second feature being obtained by feature extraction on a second image, the second image being obtained based on the first image and having a second region corresponding to the first region, and a contrast between the second region and another region different from the second region in the second image being greater than a contrast between the first region and another region different from the first region in the first image; and a detection result acquisition unit configured to obtain, based on the first feature, a detection result indicating a position of the target object in the first image.

According to another aspect of the present disclosure, there is provided a training apparatus for an image detection model, including: a sample image acquisition unit configured to obtain a sample image including a sample region corresponding to a target object; a training image acquisition unit configured to obtain, based on the sample image, a training image in which a contrast between a training image region corresponding to the sample region and another region different from the training image region is greater than a contrast between the sample region and another region different from the sample region in the sample image; an image input unit configured to input the sample image to the image detection model and input the training image to a trained first model; a feature input unit configured to obtain a first feature extracted by the image detection model based on the sample image and a second feature extracted by the first model based on the training image; a loss calculation unit configured to obtain a second loss based on the first feature and the second feature; and a parameter adjusting unit configured to adjust a parameter of the image detection model based on the second loss.

According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method according to embodiments of the present disclosure.

According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method according to the embodiments of the present disclosure.

According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program, wherein the computer program realizes the method according to embodiments of the present disclosure when executed by a processor.

According to one or more embodiments of the present disclosure, the accuracy of a detection result obtained after image detection is performed on a first image can be improved.

It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments and, together with the description, serve to explain exemplary implementations of the embodiments. The illustrated embodiments are for purposes of illustration only and do not limit the scope of the claims. Throughout the drawings, identical reference numbers designate similar, but not necessarily identical, elements.

FIG. 1 illustrates a schematic diagram of an exemplary system in which various methods described herein may be implemented, according to an embodiment of the present disclosure;

FIG. 2 shows a flow diagram of an image detection method according to an embodiment of the present disclosure;

FIG. 3 shows a flow diagram of a method of training an image detection model according to an embodiment of the present disclosure;

FIG. 4 shows a flowchart of a process of obtaining a training image based on the sample image in a training method of an image detection model according to an embodiment of the present disclosure;

FIG. 5 shows a block diagram of the structure of an image detection apparatus according to an embodiment of the present disclosure;

FIG. 6 shows a block diagram of a training apparatus for an image detection model according to an embodiment of the present disclosure; and

FIG. 7 illustrates a block diagram of an exemplary electronic device that can be used to implement embodiments of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of embodiments of the present disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

In the present disclosure, unless otherwise specified, the use of the terms "first", "second", and the like to describe various elements is not intended to limit the positional relationship, the temporal relationship, or the importance relationship of the elements, and such terms are used only to distinguish one element from another. In some examples, a first element and a second element may refer to the same instance of the element, and in some cases, based on the context, they may also refer to different instances.

The terminology used in the description of the various described examples in this disclosure is for the purpose of describing particular examples only and is not intended to be limiting. Unless the context clearly indicates otherwise, if the number of elements is not specifically limited, the elements may be one or more. Furthermore, the term "and/or" as used in this disclosure is intended to encompass any and all possible combinations of the listed items.

Embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings.

Fig. 1 illustrates a schematic diagram of an exemplary system 100 in which various methods and apparatus described herein may be implemented in accordance with embodiments of the present disclosure. Referring to fig. 1, the system 100 includes one or more client devices 101, 102, 103, 104, 105, and 106, a server 120, and one or more communication networks 110 coupling the one or more client devices to the server 120.

Client devices 101, 102, 103, 104, 105, and 106 may be configured to execute one or more applications.

In embodiments of the present disclosure, the server 120 may run one or more services or software applications that enable the image detection method to be performed.

In some embodiments, the server 120 may also provide other services or software applications, which may include non-virtual environments and virtual environments. In some embodiments, these services may be provided as web-based services or cloud services, for example, provided to users of client devices 101, 102, 103, 104, 105, and/or 106 under a software as a service (SaaS) model.

In the configuration shown in fig. 1, server 120 may include one or more components that implement the functions performed by server 120. These components may include software components, hardware components, or a combination thereof, which may be executed by one or more processors. A user operating a client device 101, 102, 103, 104, 105, and/or 106 may, in turn, utilize one or more client applications to interact with the server 120 to take advantage of the services provided by these components. It should be understood that a variety of different system configurations are possible, which may differ from system 100. Accordingly, fig. 1 is one example of a system for implementing the various methods described herein, and is not intended to be limiting.

The user may receive the obtained detection results using client devices 101, 102, 103, 104, 105, and/or 106. The client device may provide an interface that enables a user of the client device to interact with the client device. The client device may also output information to the user via the interface. Although fig. 1 depicts only six client devices, those skilled in the art will appreciate that any number of client devices may be supported by the present disclosure.

Client devices 101, 102, 103, 104, 105, and/or 106 may include various types of computer devices, such as portable handheld devices, general purpose computers (such as personal computers and laptop computers), workstation computers, wearable devices, smart screen devices, self-service terminal devices, service robots, gaming systems, thin clients, various messaging devices, sensors or other sensing devices, and so forth. These computer devices may run various types and versions of software applications and operating systems, such as MICROSOFT Windows, APPLE iOS, UNIX-like operating systems, Linux, or Linux-like operating systems (e.g., GOOGLE Chrome OS); or include various mobile operating systems such as MICROSOFT Windows Mobile OS, iOS, Windows Phone, and Android. Portable handheld devices may include cellular telephones, smart phones, tablets, Personal Digital Assistants (PDAs), and the like. Wearable devices may include head-mounted displays (such as smart glasses) and other devices. The gaming system may include a variety of handheld gaming devices, internet-enabled gaming devices, and the like. The client device is capable of executing a variety of different applications, such as various Internet-related applications, communication applications (e.g., email applications), and Short Message Service (SMS) applications, and may use a variety of communication protocols.

Network 110 may be any type of network known to those skilled in the art that may support data communications using any of a variety of available protocols, including but not limited to TCP/IP, SNA, IPX, etc. By way of example only, one or more networks 110 may be a Local Area Network (LAN), an Ethernet-based network, a token ring, a Wide Area Network (WAN), the Internet, a virtual network, a Virtual Private Network (VPN), an intranet, an extranet, a blockchain network, a Public Switched Telephone Network (PSTN), an infrared network, a wireless network (e.g., Bluetooth, Wi-Fi), and/or any combination of these and/or other networks.

The server 120 may include one or more general purpose computers, special purpose server computers (e.g., PC (personal computer) servers, UNIX servers, mid-end servers), blade servers, mainframe computers, server clusters, or any other suitable arrangement and/or combination. The server 120 may include one or more virtual machines running a virtual operating system, or other computing architecture involving virtualization (e.g., one or more flexible pools of logical storage that may be virtualized to maintain virtual storage for the server). In various embodiments, the server 120 may run one or more services or software applications that provide the functionality described below.

The computing units in server 120 may run one or more operating systems including any of the operating systems described above, as well as any commercially available server operating systems. The server 120 may also run any of a variety of additional server applications and/or middle tier applications, including HTTP servers, FTP servers, CGI servers, JAVA servers, database servers, and the like.

In some implementations, the server 120 may include one or more applications to analyze and consolidate data feeds and/or event updates received from users of the client devices 101, 102, 103, 104, 105, and/or 106. Server 120 may also include one or more applications to display data feeds and/or real-time events via one or more display devices of client devices 101, 102, 103, 104, 105, and/or 106.

In some embodiments, the server 120 may be a server of a distributed system, or a server incorporating a blockchain. The server 120 may also be a cloud server, or a smart cloud computing server or a smart cloud host with artificial intelligence technology. A cloud server is a host product in a cloud computing service system that addresses the drawbacks of high management difficulty and weak service scalability in traditional physical host and Virtual Private Server (VPS) services.

The system 100 may also include one or more databases 130. In some embodiments, these databases may be used to store data and other information. For example, one or more of the databases 130 may be used to store information such as audio files and video files. The database 130 may reside in various locations. For example, the database used by the server 120 may be local to the server 120, or may be remote from the server 120 and may communicate with the server 120 via a network-based or dedicated connection. The database 130 may be of different types. In certain embodiments, the database used by the server 120 may be, for example, a relational database. One or more of these databases may store, update, and retrieve data in response to commands.

In some embodiments, one or more of the databases 130 may also be used by applications to store application data. The databases used by the application may be different types of databases, such as key-value stores, object stores, or regular stores supported by a file system.

The system 100 of fig. 1 may be configured and operated in various ways to enable application of the various methods and apparatus described in accordance with the present disclosure.

According to an aspect of the present disclosure, an image detection method is provided. As shown in fig. 2, an image detection method 200 according to some embodiments of the present disclosure includes:

step S210: acquiring a first image, wherein the first image comprises a first area corresponding to a target object;

step S220: performing feature extraction on the first image to obtain a first feature whose similarity to a second feature is greater than a preset threshold, the second feature being obtained by feature extraction on a second image, the second image being obtained based on the first image and having a second region corresponding to the first region, and a contrast between the second region and another region different from the second region in the second image being greater than a contrast between the first region and another region different from the first region in the first image; and

step S230: based on the first feature, a detection result is obtained, the detection result indicating a position of the target object in the first image.

In the related art, an image to be detected is input to a trained image detection model, and the model obtains a detection result based on image features of that image. Such a model is obtained by supervised training using training samples and their labels, and its generalization capability is poor. When the image to be detected is blurry, the model often cannot obtain an accurate detection result from the features extracted from the image.

In embodiments according to the present disclosure, the first feature is obtained by performing feature extraction on the first image, and its similarity to the second feature obtained from the second image is greater than the preset threshold. Because the contrast between the region of the target object and other regions is greater in the second image than in the first image, the target object is easier to distinguish in the second image. Image features close to those of the second image, in which the target object is easier to distinguish, can thus be obtained from the first image, which makes an accurate detection result easier to obtain.
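The similarity condition on the two features can be illustrated with a small sketch. The disclosure does not name a similarity measure, so cosine similarity over flattened feature vectors is an assumption here, and the function names and the threshold value 0.9 are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors.

    Assumption: features are flattened to plain lists of floats. The
    disclosure does not specify the similarity measure.
    """
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch only
    return dot / (norm_a * norm_b)


def features_match(first_feature, second_feature, threshold=0.9):
    """True if the similarity exceeds the preset threshold (0.9 is hypothetical)."""
    return cosine_similarity(first_feature, second_feature) > threshold
```

In this reading, a first feature extracted from the low-contrast first image satisfies the condition when `features_match` returns true against the second feature extracted from the higher-contrast second image.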

In some embodiments, the first image may be an image captured by any camera device.

In some embodiments, the target object may be any object to be detected contained in the first image.

In some embodiments, the first image is an image captured by a vehicle-mounted camera, and the target object includes at least one of: lane lines, vehicles, and traffic cones.

In some embodiments, the performing feature extraction on the first image, obtaining a first feature comprises:

inputting the first image to a first model, and obtaining the first feature based on a feature extraction network of the first model, wherein,

the first model is obtained by training based on a second model, wherein the first model has a sample image as an input, and the second model has a training image as an input, wherein the sample image contains a sample region corresponding to the target object, the training image is obtained based on the sample image, and a contrast between a training image region corresponding to the sample region and another region different from the training image region in the training image is larger than a contrast between the sample region and another region different from the sample region in the sample image.

In the process of training the first model based on the second model, the sample image is input to the first model, and the training image, which is obtained based on the sample image and in which the contrast between the region where the target object is located and other regions is greater than in the sample image, is input to the second model. After being trained under the guidance of the second model, the first model can therefore extract, from the sample image, the image features that the second model extracts from the training image. That is, from an image with lower contrast between the region where the target object is located and other regions, the first model can extract the image features of an image with higher contrast between those regions. The target object can be separated from the image more easily based on such features, so the obtained detection result is more accurate.

In some embodiments, the number of parameters of the first model is the same as the number of parameters of the second model.

In other embodiments, the number of parameters of the first model is less than the number of parameters of the second model.

Because the second model has a larger number of parameters, the features it obtains from the training image have high precision. After the first model is trained under the guidance of the second model, the features obtained by the first model can attain the precision of the features obtained by the second model, so that a more accurate detection result can be obtained.

In some embodiments, the training image may be a highlighted image obtained by processing the sample image to highlight the sample region.

The highlighted image does not substantially change the sample image; only the brightness of the region where the target object is located changes, so that the target object is highlighted and is easier to recognize and detect.

In some embodiments, said process of highlighting said sample region in said sample image comprises at least one of:

increasing the brightness of the sample region; and

reducing the brightness of other regions of the sample image that are distinct from the sample region.
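A minimal sketch of this highlighting step follows, assuming a grayscale image stored as nested lists and a binary mask marking the sample region. The mask representation, the function name `highlight_region`, and the `gain` and `dim` factors are all illustrative assumptions; the disclosure does not specify how brightness is changed.

```python
def highlight_region(image, mask, gain=1.5, dim=0.5, max_value=255):
    """Return a copy of a grayscale image with the masked region brightened
    and every other pixel dimmed.

    image: list of rows of pixel intensities in [0, max_value]
    mask:  list of rows of 0/1 flags; 1 marks the sample region
    gain, dim: hypothetical brightness factors, not values from the disclosure
    """
    if len(image) != len(mask) or any(
        len(row) != len(mask_row) for row, mask_row in zip(image, mask)
    ):
        raise ValueError("image and mask must have the same shape")
    highlighted = []
    for row, mask_row in zip(image, mask):
        highlighted.append([
            # Brighten inside the region (clipped), dim everywhere else.
            min(max_value, round(p * gain)) if m else round(p * dim)
            for p, m in zip(row, mask_row)
        ])
    return highlighted
```

Setting `dim=1.0` gives the variant that only increases the brightness of the sample region, and setting `gain=1.0` gives the variant that only reduces the brightness of the other regions.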

In some embodiments, the training images comprise fused images obtained by:

processing the sample image to highlight the sample region to obtain a third image; and

fusing the third image and the sample image.

The training image is obtained by fusing the sample image with the third image, which is obtained by processing the sample image to highlight the sample region, so that the target object is highlighted less strongly in the training image than in the third image. This prevents the image features extracted by the second model from the training image from differing too much from the image features extracted by the first model from the sample image while the first model is trained based on the second model, which could keep the first model from converging. Training of the first model therefore proceeds smoothly and yields a first model with better generalization.

In some embodiments, the fused image comprises a first proportion of the sample image and a second proportion of the third image, the sum of the first proportion and the second proportion being 1.

For example, the pixel value at each position in the fused image is obtained by adding the pixel value at the corresponding position in the sample image multiplied by the first ratio to the pixel value at the corresponding position in the third image multiplied by the second ratio.

In some embodiments, the first ratio ranges from 0.1 to 0.9.

In one example, the first ratio is 0.3 and the second ratio is 0.7.
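The weighted fusion described above can be sketched as a per-pixel blend. This is a minimal illustration assuming grayscale images stored as nested lists of equal shape; the function name `fuse_images` and the parameter name `first_ratio` are hypothetical.

```python
def fuse_images(sample, highlighted, first_ratio=0.3):
    """Blend the sample image with the highlighted (third) image.

    Each output pixel is first_ratio * sample + second_ratio * highlighted,
    where second_ratio = 1 - first_ratio so that the two ratios sum to 1.
    The default 0.3 follows the example in the text.
    """
    if not 0.0 <= first_ratio <= 1.0:
        raise ValueError("first_ratio must lie in [0, 1]")
    if len(sample) != len(highlighted) or any(
        len(s_row) != len(h_row) for s_row, h_row in zip(sample, highlighted)
    ):
        raise ValueError("images must have the same shape")
    second_ratio = 1.0 - first_ratio
    return [
        [first_ratio * s + second_ratio * h for s, h in zip(s_row, h_row)]
        for s_row, h_row in zip(sample, highlighted)
    ]
```

With a first ratio of 0.3, the fused image stays closer to the highlighted image than to the sample image, while still being less strongly highlighted than the third image itself.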

In some embodiments, the detection result is obtained by inputting the first feature into a classification network.

In some embodiments, the detection result further indicates a category of the target object. For example, when a plurality of target objects are included in the first image, the category of each target object can distinguish the target object from the plurality of target objects.

In some embodiments, the detection result is embodied as a segmentation result.
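To illustrate how a segmentation result can indicate both the category and the position of a target object, here is a minimal sketch. It assumes the classification network outputs a list of per-class scores for every pixel; that output layout and the helper names `segmentation_map` and `object_positions` are assumptions, not details from the disclosure.

```python
def segmentation_map(scores):
    """Turn per-pixel class scores into a per-pixel class index map.

    scores[y][x] is a list of scores, one per class (assumed layout).
    """
    return [
        [max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
        for row in scores
    ]


def object_positions(label_map, class_id):
    """Return the (x, y) coordinates of all pixels assigned to class_id."""
    return [
        (x, y)
        for y, row in enumerate(label_map)
        for x, label in enumerate(row)
        if label == class_id
    ]
```

The label map gives the category of each pixel, and the coordinates returned for a given class describe where objects of that category lie in the first image.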

According to another aspect of the present disclosure, there is also provided a training method of an image detection model, as shown in fig. 3, the method 300 includes:

step S310: obtaining a sample image including a sample region corresponding to a target object;

step S320: obtaining a training image based on the sample image, a contrast between a training image region corresponding to the sample region and another region different from the training image region in the training image being greater than a contrast between the sample region and another region different from the sample region in the sample image;

step S330: inputting the sample image to the image detection model and the training image to a trained first model;

step S340: obtaining a first feature extracted by the image detection model based on the sample image and a second feature extracted by the first model based on the training image;

step S350: obtaining a second loss based on the first feature and the second feature; and

step S360: adjusting parameters of the image detection model based on the second loss.
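Steps S330 to S360 can be sketched with a deliberately tiny toy model. Everything concrete below is an assumption made for illustration: each "model" is a per-pixel linear map with parameters `(w, b)`, the feature-based loss is a mean squared error, and the update is plain gradient descent. The disclosure does not specify the architecture, the loss function, or the optimizer.

```python
def extract(params, pixels):
    """Toy feature extractor: feature_i = w * pixel_i + b."""
    w, b = params
    return [w * p + b for p in pixels]


def feature_loss(student_feat, teacher_feat):
    """Mean squared error between two feature lists (assumed loss form)."""
    n = len(student_feat)
    return sum((s - t) ** 2 for s, t in zip(student_feat, teacher_feat)) / n


def distillation_step(student, teacher, sample, training, lr=0.1):
    """One update of the image detection model (student).

    The trained first model (teacher) sees the higher-contrast training
    image and is never updated; the student sees the sample image.
    Returns the updated student parameters and the loss before the update.
    """
    teacher_feat = extract(teacher, training)  # second feature (S340)
    student_feat = extract(student, sample)    # first feature (S340)
    loss = feature_loss(student_feat, teacher_feat)  # S350
    n = len(sample)
    residual = [s - t for s, t in zip(student_feat, teacher_feat)]
    # Analytic gradients of the MSE with respect to w and b.
    grad_w = 2.0 / n * sum(r * x for r, x in zip(residual, sample))
    grad_b = 2.0 / n * sum(residual)
    w, b = student
    return (w - lr * grad_w, b - lr * grad_b), loss  # S360
```

Calling `distillation_step` repeatedly with the same sample and training images pulls the student's features on the sample image toward the teacher's features on the training image, which is the effect the steps above aim for.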

In the related art, an image detection model is usually trained in a supervised manner based on training samples and their sample labels, and the generalization capability of the resulting model is poor. When the image to be detected is blurry, the image detection model often cannot obtain an accurate detection result from the features extracted from the image.

According to an embodiment of the present disclosure, when the image detection model is trained based on the first model, the sample image is input to the image detection model, and the training image, which is obtained based on the sample image and in which the contrast between the region where the target object is located and other regions is greater than in the sample image, is input to the first model, so that the two models receive different input data during training. Because the contrast between the region where the target object is located and other regions is greater in the training image, that is, the target object is easier to recognize and detect in the training image, the first model can more easily obtain an accurate detection result from the image features extracted from the training image.

After the image detection model is trained under the guidance of the first model, the image detection model can extract, from the input sample image, the image features that the first model extracts from the training image. That is, from an image with low contrast between the region where the target object is located and other regions, the image detection model can extract the image features of an image with high contrast between those regions. An accurate detection result is easier to obtain from such features, so the trained image detection model can produce more accurate detection results.

In some embodiments, the sample image may be any image obtained by any camera device.

In some embodiments, the target object may be any object to be detected.

In some embodiments, the sample image is an image captured by an onboard camera, and the target object includes at least one of: lane lines, vehicles, and traffic cones.

In some embodiments, as shown in fig. 4, obtaining a training image based on the sample image comprises:

step S410: processing the sample image to highlight the sample region to obtain a highlighted image; and

step S420: obtaining the training image based on the highlighted image.

The highlighted image does not substantially change the sample image; only the brightness of the region where the target object is located changes, so that the target object is highlighted and is easier to recognize and detect. Because the training image is obtained based on the highlighted image, the detection result obtained from the image features extracted from the training image is more accurate.

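In some embodiments, the processing of highlighting the sample region on the sample image comprises at least one of:

increasing the brightness of the sample region; and

reducing the brightness of other regions of the sample image that are different from the sample region.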
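As a non-limiting illustration, the highlighting processing can be sketched with a boolean mask that marks the sample region. The function name `highlight_region` and the factors `gain` and `dim` are hypothetical; the disclosure does not specify by how much the brightness is changed.

```python
import numpy as np


def highlight_region(image: np.ndarray, mask: np.ndarray,
                     gain: float = 1.5, dim: float = 0.5) -> np.ndarray:
    """Brighten the masked sample region and darken all other pixels.

    image: uint8 array of shape (H, W, 3); mask: bool array of shape (H, W).
    gain and dim are assumed scaling factors, not values from the disclosure.
    """
    out = image.astype(np.float32)
    out[mask] *= gain    # increase the brightness of the sample region
    out[~mask] *= dim    # reduce the brightness of the other regions
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)


# Toy 2x3 RGB image with a single pixel belonging to the target object.
image = np.full((2, 3, 3), 100, dtype=np.uint8)
mask = np.zeros((2, 3), dtype=bool)
mask[0, 1] = True
highlighted = highlight_region(image, mask)
```

Setting `gain` to 1.0 or `dim` to 1.0 applies only one of the two operations, which corresponds to the "at least one of" wording above.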

In some embodiments, the highlighted image is taken as the training image.

In some embodiments, the obtaining the training image based on the highlighted image comprises:

fusing the highlighted image and the sample image to obtain the training image.

Because the training image is obtained by fusing the highlighted image and the sample image, the target object is less prominent in the training image than in the highlighted image. This prevents the image features extracted by the first model from the training image from differing too much from the image features extracted by the image detection model from the sample image during training, a difference that could otherwise keep the image detection model from converging. The training process is therefore smoother, and the resulting image detection model generalizes better.

In some embodiments, the training image comprises a first proportion of the sample image and a second proportion of the highlighted image, the sum of the first proportion and the second proportion being 1.

In some embodiments, the first proportion ranges from 0.1 to 0.9.

In one example, the first proportion is 0.3 and the second proportion is 0.7.
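As a non-limiting illustration, the fusion can be sketched as a per-pixel weighted sum in which the two proportions sum to 1. The function name `fuse_images` and the per-pixel weighting are assumptions; the disclosure states only the proportions.

```python
import numpy as np


def fuse_images(sample: np.ndarray, highlighted: np.ndarray,
                first_proportion: float = 0.3) -> np.ndarray:
    """Blend the sample image and the highlighted image (both uint8).

    first_proportion weights the sample image; the highlighted image
    receives the second proportion, 1 - first_proportion.
    """
    if not 0.1 <= first_proportion <= 0.9:
        raise ValueError("first_proportion is expected to lie in [0.1, 0.9]")
    second_proportion = 1.0 - first_proportion
    fused = (first_proportion * sample.astype(np.float64)
             + second_proportion * highlighted.astype(np.float64))
    return np.clip(np.rint(fused), 0, 255).astype(np.uint8)


# Toy example: a uniform sample image and a brighter highlighted image.
sample = np.full((2, 2), 100, dtype=np.uint8)
highlighted = np.full((2, 2), 200, dtype=np.uint8)
training = fuse_images(sample, highlighted, first_proportion=0.3)
```

With the example proportions of 0.3 and 0.7, each fused pixel is 0.3 × 100 + 0.7 × 200 = 170, which lies between the sample value and the highlighted value.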

In some embodiments, the training method of the image detection model according to the present disclosure further includes:

obtaining an annotation label corresponding to the sample image, and obtaining a prediction result output by the image detection model;

obtaining a second loss based on the annotation label and the prediction result; and

adjusting parameters of the image detection model based on the second loss.

In some embodiments, the annotation label indicates a location of the target object in the sample image or a category of the target object.

In some embodiments, the training method of the image detection model according to the present disclosure further includes:

obtaining a first prediction result output by the image detection model and a second prediction result output by the first model;

obtaining a third loss based on the first prediction result and the second prediction result; and

adjusting parameters of the image detection model based on the third loss.
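As a non-limiting illustration, a loss between the two prediction results can be sketched as a Kullback-Leibler divergence between class probabilities. The disclosure does not name the loss function; the KL divergence, the softmax normalization, and the function names below are assumptions.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def prediction_loss(first_prediction: np.ndarray,
                    second_prediction: np.ndarray) -> float:
    """Mean KL divergence from the first model's output to the detection model's.

    first_prediction: logits output by the image detection model.
    second_prediction: logits output by the first model (used as the target).
    """
    p_target = softmax(second_prediction)
    p_model = softmax(first_prediction)
    kl = np.sum(p_target * (np.log(p_target) - np.log(p_model)), axis=-1)
    return float(np.mean(kl))


second_prediction = np.array([[2.0, 1.0, 0.1]])
first_prediction = np.array([[0.1, 1.0, 2.0]])
loss_when_different = prediction_loss(first_prediction, second_prediction)
loss_when_equal = prediction_loss(second_prediction, second_prediction)
```

The loss is zero when the two prediction results agree and grows as they diverge, so minimizing it pulls the output of the image detection model toward that of the first model.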

In some embodiments, the number of parameters of the image detection model is the same as the number of parameters of the first model.

In other embodiments, the number of parameters of the image detection model is less than the number of parameters of the first model.

Because the first model has a larger number of parameters, the features it obtains from the training image are more precise. After the image detection model is trained under the guidance of the first model, the features obtained by the image detection model approach the precision of the features obtained by the first model, so that a more accurate detection result can be obtained.

In some embodiments, in the training method of the image detection model according to the present disclosure, the first model is obtained by training using highlighted images.

According to another aspect of the present disclosure, there is provided an image detection apparatus. As shown in fig. 5, the apparatus 500 includes: an image acquisition unit 510 configured to acquire a first image including a first region corresponding to a target object; a feature extraction unit 520 configured to perform feature extraction on the first image to obtain a first feature, wherein a similarity between the first feature and a second feature is greater than a preset threshold, the second feature being obtained by performing feature extraction on a second image, the second image being obtained based on the first image and having a second region corresponding to the first region, and a contrast between the second region and another region different from the second region in the second image being greater than a contrast between the first region and another region different from the first region in the first image; and a detection result obtaining unit 530 configured to obtain a detection result based on the first feature, the detection result indicating a position of the target object in the first image.
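As a non-limiting illustration, the condition that the similarity between the first feature and the second feature exceeds a preset threshold can be sketched as follows. The disclosure does not specify the similarity measure or the threshold; cosine similarity, the value of `PRESET_THRESHOLD`, and the toy feature vectors are assumptions.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two flattened feature tensors."""
    a = a.ravel()
    b = b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


PRESET_THRESHOLD = 0.9  # hypothetical value, not taken from the disclosure

# Toy features: first_feature from the first image, second_feature from
# the higher-contrast second image.
first_feature = np.array([1.0, 2.0, 3.0])
second_feature = np.array([1.1, 1.9, 3.2])

similarity = cosine_similarity(first_feature, second_feature)
meets_threshold = similarity > PRESET_THRESHOLD
```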

In some embodiments, the feature extraction unit includes: an image input unit configured to input the first image to a first model and obtain the first feature based on a feature extraction network of the first model, wherein the first model is obtained by training based on a second model, the first model taking a sample image as input and the second model taking a training image as input during the training, wherein the sample image contains a sample region corresponding to the target object, the training image is obtained based on the sample image, and a contrast between a training image region corresponding to the sample region and another region different from the training image region in the training image is larger than a contrast between the sample region and another region different from the sample region in the sample image.

In some embodiments, the training image comprises a fused image obtained by: processing the sample image to highlight the sample region to obtain a third image; and fusing the third image and the sample image.

In some embodiments, the processing of highlighting the sample region on the sample image comprises at least one of: increasing the brightness of the sample region; and reducing the brightness of other regions of the sample image that are different from the sample region.

In some embodiments, the fused image comprises a first proportion of the sample image and a second proportion of the third image, the sum of the first proportion and the second proportion being 1.

In some embodiments, the first proportion is in the range of 0.1 to 0.9.

In some embodiments, the number of parameters of the second model is not less than the number of parameters of the first model.

In some embodiments, the first image comprises an image obtained by an onboard camera, and the target object comprises at least one of a lane line, a traffic cone, and the like.

According to another aspect of the present disclosure, there is also provided an apparatus for training an image detection model. As shown in fig. 6, the apparatus 600 includes: a sample image acquisition unit 610 configured to obtain a sample image including a sample region corresponding to a target object; a training image acquisition unit 620 configured to obtain, based on the sample image, a training image in which a contrast between a training image region corresponding to the sample region and another region different from the training image region is larger than a contrast between the sample region and another region different from the sample region in the sample image; an image input unit 630 configured to input the sample image to the image detection model and the training image to the trained first model; a feature input unit 640 configured to obtain a first feature extracted by the image detection model based on the sample image and a second feature extracted by the first model based on the training image; a loss calculating unit 650 configured to obtain a second loss based on the first feature and the second feature; and a parameter adjusting unit 660 configured to adjust a parameter of the image detection model based on the second loss.
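As a non-limiting illustration, the cooperation of the units above can be sketched as a feature-guided training loop. Everything concrete here is an assumption made for the example: both models are reduced to linear feature extractors, the training images are a scaled copy of the sample images standing in for highlighting and fusion, and the loss between the first feature and the second feature is a mean squared error optimized by plain gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 8 flattened sample images and their higher-contrast counterparts.
sample_images = rng.normal(size=(8, 16))
training_images = sample_images * 1.5  # placeholder for highlight + fusion

# Trained first model: frozen linear feature extractor (assumed form).
first_model_weights = rng.normal(size=(16, 4))
# Image detection model: trainable linear feature extractor (assumed form).
detection_model_weights = np.zeros((16, 4))

learning_rate = 0.2
losses = []
for _ in range(200):
    # First feature: detection model on sample images.
    first_feature = sample_images @ detection_model_weights
    # Second feature: first model on training images (no parameter update).
    second_feature = training_images @ first_model_weights
    diff = first_feature - second_feature
    # Loss between the two features (assumed mean squared error).
    losses.append(float(np.mean(diff ** 2)))
    # Gradient of the loss with respect to the detection model weights.
    grad = 2.0 * sample_images.T @ diff / diff.size
    detection_model_weights -= learning_rate * grad
```

Only the parameters of the image detection model are adjusted; the first model stays fixed and serves purely as the source of target features.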

In some embodiments, the training image acquisition unit comprises: a highlight processing unit configured to perform processing of highlighting the sample region on the sample image to obtain a highlighted image; and a training image acquisition subunit configured to obtain the training image based on the highlighted image.

In some embodiments, the process of highlighting the sample region on the sample image comprises at least one of: increasing the brightness of the sample region; and reducing the brightness of other regions of the sample image that are distinct from the sample region.

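In some embodiments, the training image acquisition subunit includes: a fusion unit configured to fuse the highlighted image and the sample image to obtain the training image.

In some embodiments, the training image comprises a first proportion of the sample image and a second proportion of the highlighted image, the sum of the first proportion and the second proportion being 1.

In some embodiments, the first proportion ranges from 0.1 to 0.9.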

In some embodiments, the sample image comprises an image obtained by an in-vehicle camera, and the target object comprises at least one of: lane lines, traffic cones, and vehicles.

According to an embodiment of the present disclosure, there is also provided an electronic device, a readable storage medium, and a computer program product.

Referring to fig. 7, a block diagram of the structure of an electronic device 700 will now be described. The electronic device 700 may be a server or a client of the present disclosure and is an example of a hardware device that may be applied to aspects of the present disclosure. Electronic devices are intended to represent various forms of digital electronic computer devices, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. Electronic devices may also represent various forms of mobile devices, such as personal digital processing devices, cellular telephones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only, and are not intended to limit implementations of the disclosure described and/or claimed herein.

As shown in fig. 7, the electronic device 700 includes a computing unit 701, which may perform various appropriate actions and processes according to a computer program stored in a Read-Only Memory (ROM) 702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. The RAM 703 can also store various programs and data required for the operation of the electronic device 700. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other by a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.

A number of components in the electronic device 700 are connected to the I/O interface 705, including: an input unit 706, an output unit 707, a storage unit 708, and a communication unit 709. The input unit 706 may be any type of device capable of inputting information to the electronic device 700. The input unit 706 may receive input numeric or character information and generate key signal inputs related to user settings and/or function controls of the electronic device, and may include, but is not limited to, a mouse, a keyboard, a touch screen, a track pad, a track ball, a joystick, a microphone, and/or a remote controller. The output unit 707 may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, a video/audio output terminal, a vibrator, and/or a printer. The storage unit 708 may include, but is not limited to, magnetic or optical disks. The communication unit 709 allows the electronic device 700 to exchange information/data with other devices via a computer network, such as the Internet, and/or various telecommunications networks, and may include, but is not limited to, modems, network cards, infrared communication devices, wireless communication transceivers, and/or chipsets, such as Bluetooth™ devices, 802.11 devices, Wi-Fi devices, WiMAX devices, cellular communication devices, and/or the like.

The computing unit 701 may be any of a variety of general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 701 performs the various methods and processes described above, such as the method 200. For example, in some embodiments, the method 200 may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 708. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 700 via the ROM 702 and/or the communication unit 709. When the computer program is loaded into the RAM 703 and executed by the computing unit 701, one or more steps of the method 200 described above may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured to perform the method 200 by any other suitable means (e.g., by means of firmware).

Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SoCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user may provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.

The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.

It should be understood that the various forms of flows shown above may be used with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be performed in parallel, sequentially, or in a different order, and no limitation is imposed herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.

Although embodiments or examples of the present disclosure have been described with reference to the accompanying drawings, it is to be understood that the above-described methods, systems, and apparatus are merely exemplary embodiments or examples, and that the scope of the present invention is not limited by these embodiments or examples, but only by the granted claims and their equivalents. Various elements in the embodiments or examples may be omitted or may be replaced with equivalents thereof. Further, the steps may be performed in an order different from that described in the present disclosure. Further, various elements in the embodiments or examples may be combined in various ways. Importantly, as technology evolves, many of the elements described herein may be replaced with equivalent elements that appear after the present disclosure.

Claims

1. An image detection method, comprising:

acquiring a first image, wherein the first image comprises a first area corresponding to a target object;

performing feature extraction on the first image to obtain a first feature, wherein a similarity between the first feature and a second feature is greater than a preset threshold, the second feature being obtained by performing feature extraction on a second image, the second image being obtained based on the first image and having a second region corresponding to the first region, and a contrast between the second region and another region different from the second region in the second image being greater than a contrast between the first region and another region different from the first region in the first image; and

obtaining a detection result based on the first feature, the detection result indicating a position of the target object in the first image.

2. The method of claim 1, wherein the performing feature extraction on the first image to obtain first features comprises:

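inputting the first image into a first model, and obtaining the first feature based on a feature extraction network of the first model, wherein

the first model is obtained by training based on a second model, the first model taking a sample image as input and the second model taking a training image as input during the training, wherein the sample image contains a sample region corresponding to the target object, the training image is obtained based on the sample image, and a contrast between a training image region corresponding to the sample region and another region different from the training image region in the training image is larger than a contrast between the sample region and another region different from the sample region in the sample image.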

3. The method of claim 2, wherein the training image comprises a fused image obtained by:

processing the sample image to highlight the sample region to obtain a third image; and

fusing the third image and the sample image.

4. The method of claim 3, wherein the processing of the sample image to highlight the sample region comprises at least one of:

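increasing the brightness of the sample region; and

reducing the brightness of other regions of the sample image that are different from the sample region.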

5. The method of claim 3, wherein the fused image comprises a first proportion of the sample image and a second proportion of the third image, the first proportion and the second proportion summing to 1.

6. The method of claim 1, wherein the number of parameters of the second model is not less than the number of parameters of the first model.

7. The method of claim 1, wherein the first image comprises an image obtained by an onboard camera, the target object comprising: lane lines, vehicles, or traffic cones.

8. A training method of an image detection model, comprising:

obtaining a sample image including a sample region corresponding to a target object;

obtaining a training image based on the sample image, a contrast between a training image region corresponding to the sample region and another region different from the training image region in the training image being larger than a contrast between the sample region and another region different from the sample region in the sample image;

inputting the sample image to the image detection model and the training image to a trained first model;

obtaining a first feature extracted by the image detection model based on the sample image and a second feature extracted by the first model based on the training image;

obtaining a second loss based on the first feature and the second feature; and

adjusting parameters of the image detection model based on the second loss.

9. The method of claim 8, wherein the obtaining a training image based on the sample image comprises:

processing the sample image to highlight the sample region to obtain a highlighted image; and

obtaining the training image based on the highlighted image.

10. The method of claim 9, wherein the processing of the sample image to highlight the sample region comprises at least one of:

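increasing the brightness of the sample region; and

reducing the brightness of other regions of the sample image that are different from the sample region.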

11. The method of claim 9, wherein the obtaining the training image based on the highlighted image comprises:

fusing the highlighted image and the sample image to obtain the training image.

12. The method of claim 11, wherein the training image comprises a first proportion of the sample image and a second proportion of the highlighted image, the first proportion and the second proportion summing to 1.

13. The method of claim 8, wherein the sample image is an image captured by a vehicle-mounted camera, and the target object comprises: lane lines, vehicles, or traffic cones.

14. An image detection apparatus comprising:

an image acquisition unit configured to acquire a first image including a first region corresponding to a target object;

a feature extraction unit configured to perform feature extraction on the first image to obtain a first feature, wherein a similarity between the first feature and a second feature is greater than a preset threshold, the second feature being obtained by performing feature extraction on a second image, the second image being obtained based on the first image and having a second region corresponding to the first region, and a contrast between the second region and another region different from the second region in the second image being greater than a contrast between the first region and another region different from the first region in the first image; and

a detection result acquisition unit configured to obtain a detection result indicating a position of the target object in the first image based on the first feature.

15. The apparatus of claim 14, wherein the feature extraction unit comprises:

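an image input unit configured to input the first image to a first model and obtain the first feature based on a feature extraction network of the first model, wherein

the first model is obtained by training based on a second model, the first model taking a sample image as input and the second model taking a training image as input during the training, wherein the sample image contains a sample region corresponding to the target object, the training image is obtained based on the sample image, and a contrast between a training image region corresponding to the sample region and another region different from the training image region in the training image is larger than a contrast between the sample region and another region different from the sample region in the sample image.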

16. The apparatus of claim 15, wherein the training image comprises a fused image obtained by:

processing the sample image to highlight the sample region to obtain a third image; and

fusing the third image and the sample image.

17. The apparatus of claim 16, wherein the processing of the sample image to highlight the sample region comprises at least one of:

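increasing the brightness of the sample region; and

reducing the brightness of other regions of the sample image that are different from the sample region.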

18. The apparatus of claim 17, wherein the fused image comprises a first proportion of the sample image and a second proportion of the third image, a sum of the first proportion and the second proportion being 1.

19. The apparatus of claim 15, wherein the number of parameters of the second model is not less than the number of parameters of the first model.

20. The apparatus of claim 14, wherein the first image comprises an image obtained by an in-vehicle camera, the target object comprising: lane lines, traffic cones, or vehicles.

21. An apparatus for training an image detection model, comprising:

a sample image acquisition unit configured to obtain a sample image including a sample region corresponding to a target object;

a training image acquisition unit configured to obtain, based on the sample image, a training image in which a contrast between a training image region corresponding to the sample region and another region different from the training image region is larger than a contrast between the sample region and another region different from the sample region in the sample image;

an image input unit configured to input the sample image to the image detection model and input the training image to a trained first model;

a feature input unit configured to obtain a first feature extracted by the image detection model based on the sample image, and a second feature extracted by the first model based on the training image;

a loss calculation unit configured to obtain a second loss based on the first feature and the second feature; and

a parameter adjusting unit configured to adjust a parameter of the image detection model based on the second loss.

22. The apparatus of claim 21, wherein the training image acquisition unit comprises:

a highlight processing unit configured to perform processing of highlighting the sample region on the sample image to obtain a highlighted image; and

a training image acquisition subunit configured to obtain the training image based on the highlighted image.

23. The apparatus of claim 22, wherein the processing of the sample image to highlight the sample region comprises at least one of:

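increasing the brightness of the sample region; and

reducing the brightness of other regions of the sample image that are different from the sample region.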

24. The apparatus of claim 21, wherein the training image acquisition subunit comprises:

a fusion unit configured to fuse the highlighted image and the sample image to obtain the training image.

25. The apparatus of claim 24, wherein the training image comprises a first proportion of the sample image and a second proportion of the highlighted image, a sum of the first proportion and the second proportion being 1.

26. The apparatus of claim 21, wherein the sample image comprises an image obtained by an onboard camera, the target object comprising: lane lines, traffic cones, or vehicles.

27. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-13.

28. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-13.

29. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the method of any one of claims 1-13.