CN111553283A - Method and device for generating model - Google Patents

Method and device for generating model

Info

Publication number
CN111553283A
Authority
CN
China
Prior art keywords
sample
scene image
image
network
scene
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010357239.XA
Other languages
Chinese (zh)
Other versions
CN111553283B (en)
Inventor
蒋旻悦
杨喜鹏
谭啸
孙昊
章宏武
文石磊
丁二锐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010357239.XA priority Critical patent/CN111553283B/en
Publication of CN111553283A publication Critical patent/CN111553283A/en
Application granted granted Critical
Publication of CN111553283B publication Critical patent/CN111553283B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Abstract

Embodiments of the present disclosure disclose a method and an apparatus for generating a model. One embodiment of the method comprises: acquiring a plurality of sample scene images and a sample non-scene image corresponding to each of the plurality of sample scene images; extracting a pre-established generative adversarial network; and, using a machine learning method, training the generation network and the discrimination network by taking each of the plurality of sample scene images as an input of the generation network and taking the image output by the generation network together with the sample non-scene image corresponding to the sample scene image input to the generation network as inputs of the discrimination network, and determining the trained generation network as a non-scene image generation model. The embodiment improves the accuracy of identifying a target object across scenes.

Description

Method and device for generating model
Technical Field
The embodiment of the disclosure relates to the technical field of image processing, in particular to a method and a device for generating a model.
Background
Target object recognition refers to the process of distinguishing a particular object of interest (or type of object of interest) from other objects (or other types of objects). The target object may be, for example, a vehicle or a pedestrian.
Disclosure of Invention
The embodiment of the disclosure provides a method and a device for generating a model.
In a first aspect, an embodiment of the present disclosure provides a method for generating a model, the method including: acquiring a plurality of sample scene images and a sample non-scene image corresponding to each of the plurality of sample scene images, wherein the sample scene images and the sample non-scene images contain object images of the same target object; extracting a pre-established generative adversarial network, wherein the generative adversarial network comprises a generation network and a discrimination network, the generation network is used for generating a non-scene image from an input scene image, and the discrimination network is used for determining whether an image input to the discrimination network is an image output by the generation network; and, using a machine learning method, training the generation network and the discrimination network by taking each of the plurality of sample scene images as an input of the generation network and taking the image output by the generation network together with the sample non-scene image corresponding to the sample scene image input to the generation network as inputs of the discrimination network, and determining the trained generation network as a non-scene image generation model.
In some embodiments, the sample scene images are acquired by: determining at least one target object in a sample image; and, for each target object of the at least one target object, acquiring at least one sample scene image for each of at least one scene in which the target object appears.
In some embodiments, training the generation network and the discrimination network and determining the trained generation network as the non-scene image generation model includes performing the following training steps: fixing the parameters of the generation network, taking each sample scene image in the plurality of sample scene images as the input of the generation network, taking the image output by the generation network and the sample non-scene image corresponding to the sample scene image input to the generation network as the input of the discrimination network, and training the discrimination network using a machine learning method; fixing the parameters of the trained discrimination network, taking each sample scene image in the plurality of sample scene images as the input of the generation network, and training the generation network using a machine learning method, a cross entropy loss function, and a triplet loss function; and determining the accuracy of the discrimination results output by the trained discrimination network, and, in response to determining that the accuracy is within a preset numerical range, determining the most recently trained generation network as the non-scene image generation model.
In some embodiments, training the generation network and the discrimination network and determining the trained generation network as the non-scene image generation model further includes: in response to determining that the accuracy is outside the preset numerical range, re-performing the training steps using the most recently trained generation network and discrimination network.
In some embodiments, for each of the plurality of sample scene images, the sample scene image includes object image features that are at least partially the same as object image features included in the sample non-scene image corresponding to the sample scene image.
In some embodiments, the above method further comprises: acquiring a reference target object image and a scene image to be processed; inquiring a reference scene image of the reference target object image; inputting the reference scene image into the non-scene image generation model to obtain at least one non-scene image corresponding to the reference scene image; in response to the existence of a target non-scene image corresponding to the scene image to be processed in the at least one non-scene image, acquiring target characteristic information corresponding to the reference target object image in the target non-scene image; and identifying a target object image from the scene image to be processed according to the target characteristic information.
In a second aspect, an embodiment of the present disclosure provides an apparatus for generating a model, the apparatus including: a sample image acquisition unit configured to acquire a plurality of sample scene images and a sample non-scene image corresponding to each of the plurality of sample scene images, wherein the sample scene images and the sample non-scene images contain object images of the same target object; an extraction unit configured to extract a pre-established generative adversarial network, wherein the generative adversarial network includes a generation network for generating a non-scene image from an input scene image and a discrimination network for determining whether an image input to the discrimination network is an image output by the generation network; and a model generation unit configured to, using a machine learning method, train the generation network and the discrimination network by taking each of the plurality of sample scene images as an input of the generation network and taking the image output by the generation network together with the sample non-scene image corresponding to the sample scene image input to the generation network as inputs of the discrimination network, and to determine the trained generation network as the non-scene image generation model.
In some embodiments, the apparatus comprises a sample determination unit configured to determine a sample scene image, the sample determination unit comprising: a target object determination subunit configured to determine at least one target object in the sample image; a sample scene image determining subunit configured to acquire, for a target object of the at least one target object, at least one sample scene image of each scene of the at least one scene of the target object.
In some embodiments, the model generating unit includes: a model training subunit configured to fix parameters of a generation network, to take each of the plurality of sample scene images as an input of the generation network, to take an image output by the generation network and a sample non-scene image corresponding to the sample scene image input to the generation network as an input of a discrimination network, and to train the discrimination network by a machine learning method; fixing the parameters of the trained discrimination network, taking each sample scene image in the plurality of sample scene images as the input of a generation network, and training the generation network by utilizing a machine learning method, a cross entropy loss function and a triplet loss function; and determining the accuracy of the discrimination result output by the trained discrimination network, and determining the generation network trained most recently as a non-scene image generation model in response to the determination accuracy being within a preset numerical range.
In some embodiments, the model generating unit further includes: a model determining subunit, responsive to determining that the accuracy is outside the preset range of values, configured to re-perform the training steps using the most recently trained generating network and discriminating network.
In some embodiments, for each of the plurality of sample scene images, the sample scene image includes object image features that are at least partially the same as object image features included in the sample non-scene image corresponding to the sample scene image.
In some embodiments, the above apparatus further comprises: an image acquisition unit configured to acquire a reference target object image and a scene image to be processed; a reference scene image search unit configured to search for a reference scene image of the reference target object image; a non-scene image generation unit configured to input the reference scene image to the non-scene image generation model, and obtain at least one non-scene image corresponding to the reference scene image; a target feature information acquiring unit configured to acquire target feature information corresponding to the reference target object image in the target non-scene image in response to a presence of a target non-scene image corresponding to the to-be-processed scene image in the at least one non-scene image; and the identification unit is configured to identify a target object image from the scene image to be processed according to the target characteristic information.
In a third aspect, an embodiment of the present disclosure provides an electronic device/terminal/server, including: one or more processors; memory having one or more programs stored thereon, which when executed by the one or more processors, cause the one or more processors to perform the method for generating a model of the first aspect.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable medium, on which a computer program is stored, which is characterized in that the program, when executed by a processor, implements the method for generating a model of the first aspect described above.
According to the method and the apparatus for generating a model, sample scene images and sample non-scene images containing a target object are first obtained, where the sample scene images are images obtained in a variety of different scenes, so that the target object can be identified in different scenes; a generative adversarial network is then trained on the sample scene images and the sample non-scene images to obtain a non-scene image generation model. The method and the apparatus thereby improve the accuracy of identifying the target object across scenes.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is a schematic diagram according to a first embodiment of the present application;
FIG. 2 is a schematic diagram according to a second embodiment of the present application;
FIG. 3 is a schematic illustration according to a third embodiment of the present application;
FIG. 4 is a schematic illustration according to a fourth embodiment of the present application;
FIG. 5 is a schematic diagram of an apparatus for generating a model according to an embodiment of the present application;
FIG. 6 is a schematic diagram of an electronic device suitable for use in implementing embodiments of the present disclosure.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments to aid understanding, and these details are to be considered exemplary only. Those of ordinary skill in the art will therefore recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Likewise, descriptions of well-known functions and constructions are omitted from the following description for clarity and conciseness.
Fig. 1 illustrates an exemplary system architecture 100 of a method for generating a model or an apparatus for generating a model to which embodiments of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The terminal devices 101, 102, 103 interact with a server 105 via a network 104 to receive or send messages or the like. Various image applications, such as an image capture application, an image editing application, a video capture application, etc., may be installed on the terminal devices 101, 102, 103.
The terminal apparatuses 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices equipped with cameras and supporting image capture, including but not limited to surveillance cameras, smart cameras, traffic cameras, vehicle-mounted cameras, and so on. When the terminal apparatuses 101, 102, 103 are software, they may be installed in the electronic apparatuses listed above. They may be implemented as a plurality of software programs or software modules (for example, for providing distributed services), or as a single software program or software module, which is not specifically limited herein.
The server 105 may be a server that provides various services, such as an identification server that provides support for images sent from the terminal apparatuses 101, 102, 103. The recognition server may perform processing such as analysis on the received data such as the image, and feed back the processing result (e.g., the image in which the target object image is marked) to the terminal device.
It should be noted that the method for generating the model provided by the embodiment of the present disclosure is generally performed by the server 105, and accordingly, the apparatus for generating the model is generally disposed in the server 105.
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or may be implemented as a single server. When the server is software, it may be implemented as a plurality of software or software modules (for example, to provide distributed services), or may be implemented as a single software or software module, and is not limited specifically herein.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method for generating a model according to the present disclosure is shown. The method for generating a model comprises the following steps:
step 201, acquiring a plurality of sample scene images and a sample non-scene image corresponding to each sample scene image in the plurality of sample scene images.
In this embodiment, the executing subject (for example, the server 105 shown in fig. 1) of the method for generating the model may obtain the sample scene images and the sample non-scene images from the terminal devices 101, 102, and 103 through a wired connection or a wireless connection. It should be noted that the wireless connection may include, but is not limited to, a 3G/4G connection, a WiFi connection, a Bluetooth connection, a WiMAX connection, a Zigbee connection, a UWB (ultra-wideband) connection, and other wireless connections now known or developed in the future.
In the prior art, a target object is generally identified only in the scene in which it was originally captured; when the target object appears in images of other scenes, it is often not easily recognized, resulting in low recognition accuracy of the target object.
In order to accurately identify a target object in different scenes, the executing subject of the present application first acquires a plurality of sample scene images and a sample non-scene image corresponding to each sample scene image in the plurality of sample scene images. A sample scene image may be an image normally acquired by the terminal devices 101, 102, 103. For example, a sample scene image may be an image containing the target vehicle at an intersection, in a parking lot, or while driving, where the intersection, the parking lot, and driving can be regarded as the scenes in which the target vehicle is located. A sample non-scene image may be an image obtained by adjusting the corresponding sample scene image, for example by changing the brightness, contrast, or color of the image. That is, the sample scene image and the sample non-scene image contain object images of the same target object; the sample scene image has a correspondence with the object image, and the sample non-scene image has a correspondence with the object image.
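For illustration only, the following Python sketch shows one way such a paired sample non-scene image could be derived from a sample scene image by perturbing its brightness, contrast, and color. The use of PIL and the specific adjustment ranges are assumptions for the sketch, not details specified in this disclosure.

```python
# Minimal sketch, assuming PIL-based adjustment of brightness, contrast and
# colour to derive a sample non-scene image paired with a sample scene image.
from PIL import Image, ImageEnhance
import random

def make_sample_non_scene_image(scene_image: Image.Image) -> Image.Image:
    """Perturb brightness, contrast and colour to obtain a paired non-scene image."""
    img = scene_image
    img = ImageEnhance.Brightness(img).enhance(random.uniform(0.6, 1.4))
    img = ImageEnhance.Contrast(img).enhance(random.uniform(0.6, 1.4))
    img = ImageEnhance.Color(img).enhance(random.uniform(0.6, 1.4))
    return img

# Usage: pair each sample scene image with a derived non-scene counterpart.
# scene = Image.open("vehicle_a_intersection.jpg")   # hypothetical file name
# non_scene = make_sample_non_scene_image(scene)
```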
In step 202, a pre-established generative adversarial network is extracted.
In order to train recognition of target objects in different scenes, the executing subject may extract a pre-established generative adversarial network. The generative adversarial network may include a generation network and a discrimination network; the generation network may be configured to generate a non-scene image from an input scene image, and the discrimination network may be configured to determine whether an image input to the discrimination network is an image output by the generation network.
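A minimal PyTorch sketch of the two sub-networks is given below for illustration. The encoder-decoder generation network and the convolutional discrimination network shown here are assumed architectures; the disclosure does not fix any particular layer configuration.

```python
# Illustrative sketch of the generation network (scene image -> non-scene image)
# and the discrimination network (scores whether an image was generated).
import torch
import torch.nn as nn

class GenerationNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(  # simple encoder-decoder on RGB images
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, scene_image):
        return self.net(scene_image)  # non-scene image with the same spatial size

class DiscriminationNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(128, 1)  # probability that the input is a real (non-generated) image

    def forward(self, image):
        h = self.features(image).flatten(1)
        return torch.sigmoid(self.classifier(h))
```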
Step 203, using a machine learning method, using each of the plurality of sample scene images as an input of a generation network, using an image output by the generation network and a sample non-scene image corresponding to the sample scene image input to the generation network as an input of a discrimination network, training the generation network and the discrimination network, and determining the trained generation network as a non-scene image generation model.
The execution subject can apply the sample scene images and the sample non-scene images to the generation network and the discrimination network contained in the generative adversarial network through various learning methods to obtain a non-scene image generation model. The non-scene image generation model can be used for generating output images of an object in other scenes from input images of the object in a set scene. For example, suppose the total number of scenes of a vehicle is 10, and the execution subject stores images of vehicle A in 4 of those scenes. When vehicle A appears in the other 6 scenes, images of vehicle A in those 6 scenes can be simulated by the non-scene image generation model, and vehicle A can then be identified in those scenes. That is, the non-scene image generation model of the present application can control generation of an output image corresponding to a scene. Images in different scenes can therefore be generated by the non-scene image generation model, which improves the accuracy of object identification across scenes.
With continued reference to FIG. 3, a flow 300 of one embodiment of a method for generating a model according to the present disclosure is shown. The method for generating a model comprises the following steps:
at step 301, at least one target object in the sample image is determined.
The sample image may be an image acquired by the terminal devices 101, 102, 103, and includes object information and scene information describing the scene in which the object is located. For example, the scene information may indicate an intersection, a station, and the like. The execution subject therefore needs to first determine the target object from the sample image. For example, the execution subject acquires a plurality of sample images including a vehicle A and a vehicle B; the execution subject may mark vehicle A and vehicle B as target objects.
At step 302, for a target object of the at least one target object, at least one sample scene image of each scene of the at least one scene of the target object is obtained.
After the target object is determined, the execution subject may divide the sample image containing the target object by scene. Therefore, the sample scene image of the target object in each scene is obtained, and the target object can be identified according to the scene.
On this basis, the execution subject can perform various kinds of image processing on the sample scene images to obtain the sample non-scene images.
Step 303, acquiring a plurality of sample scene images and a sample non-scene image corresponding to each sample scene image in the plurality of sample scene images.
The content of step 303 is the same as that of step 201, and is not described in detail here. For each sample scene image in the plurality of sample scene images, the object image features included in the sample scene image are at least partially the same as the object image features included in the sample non-scene image corresponding to the sample scene image. When the non-scene image generation model is trained, the input image and the output image of the non-scene image generation model are both images related to the same object in different scenes.
Step 304, extracting the pre-established generative adversarial network.
The content of step 304 is the same as that of step 202, and is not described in detail here.
Step 305, using a machine learning method, using each of the plurality of sample scene images as an input of a generation network, using an image output by the generation network and a sample non-scene image corresponding to the sample scene image input to the generation network as an input of a discrimination network, training the generation network and the discrimination network, and determining the generation network after training as a non-scene image generation model.
The content of step 305 is the same as that of step 203, and is not described in detail here.
In some optional implementations of this embodiment, training the generation network and the discrimination network and determining the trained generation network as the non-scene image generation model may include:
In the first step, the execution subject may fix the parameters of the generation network, take each sample scene image in the plurality of sample scene images as the input of the generation network, take the image output by the generation network and the sample non-scene image corresponding to the sample scene image input to the generation network as the input of the discrimination network, and train the discrimination network using a machine learning method.
In the second step, the execution subject may fix the parameters of the trained discrimination network, take each sample scene image in the plurality of sample scene images as the input of the generation network, and train the generation network using a machine learning method, a cross entropy loss function, and a triplet loss function. It should be understood that the machine learning method, the cross entropy loss function, and the triplet loss function are well-known techniques that are widely studied and applied at present, and are not described here.
In the third step, the execution subject may determine the accuracy of the discrimination results output by the trained discrimination network, and, in response to determining that the accuracy is within a preset numerical range (for example, 45%-55%), determine the most recently trained generation network as the non-scene image generation model. In this case, the non-scene image generation model can generate an output image of an object in another scene from an input image of the object in the set scene. Further, in response to determining that the accuracy is outside the preset numerical range, the training steps are re-performed using the most recently trained generation network and discrimination network.
Continuing the example above, suppose a vehicle can appear in 10 scenes in total, and the execution subject stores images of vehicle A in 4 of those scenes. During training, the sample scene images from the 4 scenes are adjusted toward images of the other 6 scenes; through its generation network and discrimination network, the non-scene image generation model then learns to accurately generate images of the other 6 scenes from images of the 4 scenes. Correspondingly, when a sample non-scene image is derived from a sample scene image, it should be adjusted toward images corresponding to the other 6 scenes as much as possible. In this way, the volume of image data the execution subject needs to process is reduced, and the target object can still be accurately identified across scenes.
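The alternating procedure described above can be condensed into the following sketch, reusing the GenerationNetwork and DiscriminationNetwork sketches from earlier. The optimizer handling, the triplet margin, and the choice of anchor/positive/negative inputs for the triplet loss are illustrative assumptions only, not settings taken from this disclosure.

```python
# Sketch of one alternating training iteration, assuming batched tensors
# `scene` and `non_scene` and separately constructed optimisers opt_g, opt_d.
import torch
import torch.nn as nn

bce = nn.BCELoss()                         # cross entropy on real/generated labels
triplet = nn.TripletMarginLoss(margin=1.0) # margin value is an assumption

def train_step(gen, disc, scene, non_scene, opt_g, opt_d):
    # Step 1: fix the generation network, train the discrimination network.
    for p in gen.parameters():
        p.requires_grad_(False)
    fake = gen(scene)
    real_labels = torch.ones(scene.size(0), 1)
    fake_labels = torch.zeros(scene.size(0), 1)
    d_loss = bce(disc(non_scene), real_labels) + bce(disc(fake.detach()), fake_labels)
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Step 2: fix the discrimination network, train the generation network
    # with cross entropy plus a triplet loss (anchor/positive/negative choice
    # here is an illustrative assumption).
    for p in gen.parameters():
        p.requires_grad_(True)
    for p in disc.parameters():
        p.requires_grad_(False)
    fake = gen(scene)
    g_loss = bce(disc(fake), real_labels) + \
             triplet(fake.flatten(1), non_scene.flatten(1), scene.flatten(1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    for p in disc.parameters():
        p.requires_grad_(True)

    # Step 3: report discriminator accuracy; training stops once it falls
    # inside a preset range such as 45%-55%.
    with torch.no_grad():
        acc = ((disc(gen(scene)) < 0.5).float().mean() +
               (disc(non_scene) >= 0.5).float().mean()) / 2
    return acc.item()
```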
With further reference to fig. 4, a flow 400 of yet another embodiment of a method for identifying an object is shown. The method for identifying an object is based on the embodiments described above and corresponding to fig. 2 and 3. The process 400 of the method for identifying an object includes the steps of:
step 401, acquiring a reference target object image and a scene image to be processed.
In the present embodiment, the execution subject (for example, the server 105 shown in fig. 1) of the method for identifying an object may acquire the reference target object image and the to-be-processed scene image through a wired or wireless connection. The reference target object image is an image of a target object that the execution subject already possesses. The to-be-processed scene image is an image, acquired in a scene other than the set scenes, that contains the target object. Continuing the example above, the execution subject currently holds images of vehicle A in 4 scenes, and the to-be-processed scene image is an image of vehicle A appearing in one of the other 6 scenes.
Step 402, querying a reference scene image of the reference target object image.
The reference scene image here may refer to an image of vehicle A in one of the 4 scenes, that is, an image already stored by the execution subject.
Step 403, inputting the reference scene image into the non-scene image generation model, and obtaining at least one non-scene image corresponding to the reference scene image.
As is apparent from the above description, the non-scene image generation model is capable of generating images of the object in scenes different from that of the input image. Therefore, the execution subject may input the reference scene image to the non-scene image generation model and obtain at least one non-scene image corresponding to the reference scene image. It should be noted that the at least one non-scene image is to be understood as images of the reference target object in all possible scenes; in the running example, the at least one non-scene image comprises the possible images of vehicle A in the other 6 scenes. This improves the accuracy of identifying the reference target object across scenes.
Step 404, in response to that a target non-scene image corresponding to the to-be-processed scene image exists in the at least one non-scene image, obtaining target characteristic information corresponding to the reference target object image in the target non-scene image.
When a target non-scene image corresponding to the to-be-processed scene image exists, the execution subject may further acquire the target feature information corresponding to the reference target object image in that target non-scene image. The target feature information is the feature information with which the reference target object may appear in the to-be-processed scene image.
Step 405, identifying a target object image from the to-be-processed scene image according to the target characteristic information.
Finally, the execution subject can recognize the target object image from the to-be-processed scene image according to the target feature information. This improves the accuracy of identifying the reference target object across scenes.
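A condensed sketch of the identification flow of steps 401 to 405 is given below. The helper callables query_reference_scene_images, match_scene, extract_features, and locate_object are hypothetical placeholders for the retrieval, matching, feature-extraction, and detection components, which this disclosure leaves unspecified.

```python
# High-level sketch of steps 401-405; all helper callables are hypothetical.
from typing import Any, Callable, Optional, Sequence

def identify_target(
    reference_object_image: Any,
    scene_image_to_process: Any,
    generator: Callable[[Any], Any],                              # trained non-scene image generation model
    query_reference_scene_images: Callable[[Any], Sequence[Any]], # hypothetical retrieval component
    match_scene: Callable[[Any, Any], bool],                      # hypothetical scene-matching component
    extract_features: Callable[[Any, Any], Any],                  # hypothetical feature extractor
    locate_object: Callable[[Any, Any], Any],                     # hypothetical detector
) -> Optional[Any]:
    # Step 402: look up the stored scene images of the reference target object.
    reference_scenes = query_reference_scene_images(reference_object_image)

    # Step 403: generate candidate non-scene images from each reference scene image.
    candidates = [generator(ref) for ref in reference_scenes]

    # Step 404: if a generated image corresponds to the scene to be processed,
    # take the target feature information of the reference object from it.
    for candidate in candidates:
        if match_scene(candidate, scene_image_to_process):
            target_features = extract_features(candidate, reference_object_image)
            # Step 405: identify the target object image in the scene to be processed.
            return locate_object(scene_image_to_process, target_features)
    return None
```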
With further reference to fig. 5, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of an apparatus for generating a model, which corresponds to the method embodiment shown in fig. 2, and which is particularly applicable to various electronic devices.
As shown in fig. 5, the apparatus 500 for generating a model of the present embodiment may include: a sample image acquisition unit 501, an extraction unit 502, and a model generation unit 503. The sample image acquisition unit 501 is configured to acquire a plurality of sample scene images and a sample non-scene image corresponding to each of the plurality of sample scene images, where the sample scene images and the sample non-scene images contain object images of the same target object, the sample scene images have a correspondence with the object images, and the sample non-scene images have a correspondence with the object images. The extraction unit 502 is configured to extract a pre-established generative adversarial network, which includes a generation network for generating a non-scene image from an input scene image and a discrimination network for determining whether an image input to the discrimination network is an image output by the generation network. The model generation unit 503 is configured to, using a machine learning method, train the generation network and the discrimination network by taking each of the plurality of sample scene images as an input of the generation network and taking the image output by the generation network together with the sample non-scene image corresponding to the sample scene image input to the generation network as inputs of the discrimination network, and to determine the trained generation network as the non-scene image generation model.
In some optional implementations of the present embodiment, the apparatus 500 for generating a model may include a sample determining unit (not shown) configured to determine a sample scene image, and the sample determining unit may include: a target object determining subunit (not shown) and a sample scene image determining subunit (not shown). Wherein the target object determination subunit is configured to determine at least one target object in the sample image; a sample scene image determining subunit configured to acquire, for a target object of the at least one target object, at least one sample scene image of each scene of the at least one scene of the target object.
In some optional implementations of this embodiment, the model generating unit 503 may include: a model training subunit (not shown in the figure) configured to fix parameters of a generation network, take each of the plurality of sample scene images as an input of the generation network, take an image output by the generation network and a sample non-scene image corresponding to the sample scene image input to the generation network as an input of a discrimination network, and train the discrimination network by using a machine learning method; fixing the parameters of the trained discrimination network, taking each sample scene image in the plurality of sample scene images as the input of a generation network, and training the generation network by utilizing a machine learning method, a cross entropy loss function and a triplet loss function; and determining the accuracy of the discrimination result output by the trained discrimination network, and determining the generation network trained most recently as a non-scene image generation model in response to the determination accuracy being within a preset numerical range.
In some optional implementation manners of this embodiment, the model generating unit 503 may further include: a model determination subunit (not shown) configured to re-perform the training steps using the last trained generation network and discrimination network in response to determining that the accuracy is outside the preset range of values.
In some optional implementations of the embodiment, for each sample scene image in the plurality of sample scene images, the object image features included in the sample scene image are at least partially the same as the object image features included in the sample non-scene image corresponding to the sample scene image.
In some optional implementations of this embodiment, the apparatus 500 for generating a model may further include: an image acquisition unit (not shown), a reference scene image inquiry unit (not shown), a non-scene image generation unit (not shown), an object feature information acquisition unit (not shown), and a recognition unit (not shown). The image acquisition unit is configured to acquire a reference target object image and a scene image to be processed; a reference scene image search unit configured to search for a reference scene image of the reference target object image; a non-scene image generation unit configured to input the reference scene image to the non-scene image generation model, and obtain at least one non-scene image corresponding to the reference scene image; a target feature information acquiring unit configured to acquire target feature information corresponding to the reference target object image in the target non-scene image in response to a presence of a target non-scene image corresponding to the to-be-processed scene image in the at least one non-scene image; and the identification unit is configured to identify a target object image from the scene image to be processed according to the target characteristic information.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
FIG. 6 is a block diagram of an electronic device for the method for generating a model according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown here, their connections and relationships, and their functions are meant to be examples only, and are not meant to limit implementations of the present application described and/or claimed herein.
As shown in fig. 6, the electronic apparatus includes: one or more processors 601, a memory 602, and interfaces for connecting the various components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing a portion of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 6, one processor 601 is taken as an example.
The memory 602 is a non-transitory computer readable storage medium as provided herein. The memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method for generating a model provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the method for generating a model provided herein.
The memory 602, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the method for generating a model in the embodiment of the present application (for example, the sample image acquisition unit 501, the extraction unit 502, and the model generation unit 503 shown in fig. 5). The processor 601 executes various functional applications of the server and data processing by executing non-transitory software programs, instructions, and modules stored in the memory 602, that is, implements the method for generating a model in the above method embodiments.
The memory 602 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the electronic device for generating the model, and the like. Further, the memory 602 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, memory 602 optionally includes memory located remotely from processor 601, which may be connected to electronics for generating models over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the method for generating a model may further include: an input device 603 and an output device 604. The processor 601, the memory 602, the input device 603 and the output device 604 may be connected by a bus or other means, and fig. 6 illustrates the connection by a bus as an example.
The input device 603 may receive input numeric or character information and generate key signal inputs related to user settings and function controls of the electronic apparatus used to generate the model, such as a touch screen, keypad, mouse, track pad, touch pad, pointer stick, one or more mouse buttons, track ball, joystick, or other input device. The output devices 604 may include a display device, auxiliary lighting devices (e.g., LEDs), and tactile feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application specific ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the technical solution of the embodiments of the present application, sample scene images and sample non-scene images containing a target object are obtained first, where the sample scene images are images obtained in a variety of different scenes, so that the target object can be identified in different scenes; a generative adversarial network is then trained on the sample scene images and the sample non-scene images to obtain a non-scene image generation model. The method and the apparatus thereby improve the accuracy of identifying the target object across scenes.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, and the present invention is not limited thereto as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (14)

1. A method for generating a model, comprising:
obtaining a plurality of sample scene images and a sample non-scene image corresponding to each of the plurality of sample scene images, wherein the sample scene images and the sample non-scene images contain object images of a same target object;
extracting a pre-established generative adversarial network, wherein the generative adversarial network comprises a generation network and a discrimination network, the generation network is used for generating a non-scene image from an input scene image, and the discrimination network is used for determining whether an image input to the discrimination network is an image output by the generation network;
and, using a machine learning method, training the generation network and the discrimination network by taking each sample scene image in the plurality of sample scene images as an input of the generation network and taking the image output by the generation network together with the sample non-scene image corresponding to the sample scene image input to the generation network as inputs of the discrimination network, and determining the trained generation network as a non-scene image generation model.
2. The method of claim 1, wherein the sample scene image is acquired by:
determining at least one target object in the sample image;
for a target object of the at least one target object, at least one sample scene image of each of at least one scene of the target object is acquired.
3. The method of claim 1, wherein training the generation network and the discrimination network, and determining the trained generation network as the non-scene image generation model comprises:
the following training steps are performed: fixing parameters of a generation network, taking each sample scene image in the plurality of sample scene images as the input of the generation network, taking an image output by the generation network and a sample non-scene image corresponding to the sample scene image input to the generation network as the input of a discrimination network, and training the discrimination network by using a machine learning method; fixing the parameters of the trained discrimination network, taking each sample scene image in the plurality of sample scene images as the input of a generation network, and training the generation network by utilizing a machine learning method, a cross entropy loss function and a triplet loss function; and determining the accuracy of the discrimination result output by the trained discrimination network, and determining the generation network trained most recently as a non-scene image generation model in response to the determination accuracy being within a preset numerical range.
4. The method of claim 3, wherein the training the generation network and the discriminant network, and determining the trained generation network as the non-scene image generation model further comprises:
in response to determining that the accuracy is outside the preset range of values, re-performing the training step using the most recently trained generation network and discrimination network.
5. The method of claim 1, wherein, for each sample scene image of the plurality of sample scene images, the sample scene image contains object image features that are at least partially identical to object image features contained in a sample non-scene image corresponding to the sample scene image.
6. The method of any of claims 1 to 5, wherein the method further comprises:
acquiring a reference target object image and a scene image to be processed;
inquiring a reference scene image of the reference target object image;
inputting the reference scene image into the non-scene image generation model to obtain at least one non-scene image corresponding to the reference scene image;
in response to the existence of a target non-scene image corresponding to the scene image to be processed in the at least one non-scene image, acquiring target characteristic information corresponding to the reference target object image in the target non-scene image;
and identifying a target object image from the scene image to be processed according to the target characteristic information.
7. An apparatus for generating a model, comprising:
a sample image acquisition unit configured to acquire a plurality of sample scene images and a sample non-scene image corresponding to each of the plurality of sample scene images, wherein the sample scene images and the sample non-scene images contain object images of the same target object;
an extraction unit configured to extract a pre-established generative adversarial network, wherein the generative adversarial network includes a generation network for generating a non-scene image from an input scene image and a discrimination network for determining whether an image input to the discrimination network is an image output by the generation network;
a model generation unit configured to train the generation network and the discrimination network using each of the plurality of sample scene images as an input of the generation network, and a sample non-scene image corresponding to a sample scene image input to the generation network and an image output from the generation network as an input of the discrimination network, and determine the trained generation network as a non-scene image generation model using a machine learning method.
8. The apparatus of claim 7, wherein the apparatus comprises a sample determination unit configured to determine a sample scene image, the sample determination unit comprising:
a target object determination subunit configured to determine at least one target object in the sample image;
a sample scene image determination subunit configured to, for a target object of the at least one target object, acquire at least one sample scene image of each scene of the at least one scene of the target object.
9. The apparatus of claim 7, wherein the model generation unit comprises:
a model training subunit configured to fix parameters of a generation network, take each of the plurality of sample scene images as an input of the generation network, take an image output by the generation network and a sample non-scene image corresponding to the sample scene image input to the generation network as an input of a discrimination network, and train the discrimination network by using a machine learning method; fixing the parameters of the trained discrimination network, taking each sample scene image in the plurality of sample scene images as the input of a generation network, and training the generation network by utilizing a machine learning method, a cross entropy loss function and a triplet loss function; and determining the accuracy of the discrimination result output by the trained discrimination network, and determining the generation network trained most recently as a non-scene image generation model in response to the determination accuracy being within a preset numerical range.
10. The apparatus of claim 9, wherein the model generation unit further comprises:
a model determination subunit, responsive to determining that the accuracy is outside the preset range of values, configured to re-perform the training step using the most recently trained generation network and discrimination network.
11. The apparatus of claim 7, wherein, for each sample scene image of the plurality of sample scene images, the sample scene image contains object image features that are at least partially identical to object image features contained in a sample non-scene image corresponding to the sample scene image.
12. The apparatus of any of claims 7 to 11, wherein the apparatus further comprises:
an image acquisition unit configured to acquire a reference target object image and a scene image to be processed;
a reference scene image query unit configured to query a reference scene image of the reference target object image;
a non-scene image generation unit configured to input the reference scene image to the non-scene image generation model, resulting in at least one non-scene image corresponding to the reference scene image;
a target feature information acquisition unit configured to, in response to a target non-scene image corresponding to the scene image to be processed existing in the at least one non-scene image, acquire target feature information corresponding to the reference target object image in the target non-scene image;
an identification unit configured to identify a target object image from the scene image to be processed according to the target feature information.
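For the apparatus of claim 12, a thin composition sketch is given below; each attribute is a callable standing in for one configured unit (hypothetical names), wired in the same order as the recognition flow sketched after claim 6.

# Hypothetical wiring of the units recited in claim 12.
class RecognitionApparatus:
    def __init__(self, image_acquirer, reference_scene_query, non_scene_generator,
                 feature_acquirer, identifier):
        self.image_acquirer = image_acquirer                  # image acquisition unit
        self.reference_scene_query = reference_scene_query    # reference scene image query unit
        self.non_scene_generator = non_scene_generator        # non-scene image generation unit
        self.feature_acquirer = feature_acquirer              # target feature information acquisition unit
        self.identifier = identifier                          # identification unit

    def process(self):
        reference_obj, scene_to_process = self.image_acquirer()
        reference_scene = self.reference_scene_query(reference_obj)
        non_scene_images = self.non_scene_generator(reference_scene)
        features = self.feature_acquirer(non_scene_images, scene_to_process, reference_obj)
        return self.identifier(scene_to_process, features)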
13. An electronic device, comprising:
one or more processors;
a memory having one or more programs stored thereon,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any one of claims 1 to 6.
14. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 6.
CN202010357239.XA 2020-04-29 2020-04-29 Method and device for generating model Active CN111553283B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010357239.XA CN111553283B (en) 2020-04-29 2020-04-29 Method and device for generating model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010357239.XA CN111553283B (en) 2020-04-29 2020-04-29 Method and device for generating model

Publications (2)

Publication Number Publication Date
CN111553283A true CN111553283A (en) 2020-08-18
CN111553283B CN111553283B (en) 2023-08-25

Family

ID=72001793

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010357239.XA Active CN111553283B (en) 2020-04-29 2020-04-29 Method and device for generating model

Country Status (1)

Country Link
CN (1) CN111553283B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080240517A1 (en) * 2007-03-27 2008-10-02 Sanyo Electric Co., Ltd. Image processing apparatus
US8872910B1 (en) * 2009-06-04 2014-10-28 Masoud Vaziri Method and apparatus for a compact and high resolution eye-view recorder
CN106845549A (en) * 2017-01-22 2017-06-13 珠海习悦信息技术有限公司 A kind of method and device of the scene based on multi-task learning and target identification
CN106952239A (en) * 2017-03-28 2017-07-14 厦门幻世网络科技有限公司 image generating method and device
US20190213420A1 (en) * 2018-01-09 2019-07-11 Qualcomm Incorporated Adaptive object detection and recognition
CN108460337A (en) * 2018-01-30 2018-08-28 李家菊 Dense fog scene aerial image fusion identification method based on adaptive cloud model
CN108364029A (en) * 2018-03-19 2018-08-03 百度在线网络技术(北京)有限公司 Method and apparatus for generating model
CN109816589A (en) * 2019-01-30 2019-05-28 北京字节跳动网络技术有限公司 Method and apparatus for generating cartoon style transformation model
CN110956128A (en) * 2019-11-28 2020-04-03 重庆中星微人工智能芯片技术有限公司 Method, apparatus, electronic device, and medium for generating lane line image

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Lu Qinglin et al.: "Survey of Applications of Generative Adversarial Networks for SAR Image Processing" *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113781540A (en) * 2021-09-15 2021-12-10 京东鲲鹏(江苏)科技有限公司 Network generation method and device, electronic equipment and computer readable medium

Also Published As

Publication number Publication date
CN111553283B (en) 2023-08-25

Similar Documents

Publication Publication Date Title
CN110910658B (en) Traffic signal control method, traffic signal control device, computer equipment and storage medium
CN111626202B (en) Method and device for identifying video
CN112132113A (en) Vehicle re-identification method and device, training method and electronic equipment
CN111723768B (en) Method, device, equipment and storage medium for vehicle re-identification
CN110910665B (en) Signal lamp control method and device and computer equipment
CN112241764A (en) Image recognition method and device, electronic equipment and storage medium
CN111833340A (en) Image detection method, image detection device, electronic equipment and storage medium
CN111292531B (en) Tracking method, device and equipment of traffic signal lamp and storage medium
CN112149636A (en) Method, apparatus, electronic device and storage medium for detecting target object
CN112001180A (en) Multi-mode pre-training model acquisition method and device, electronic equipment and storage medium
CN110703732B (en) Correlation detection method, device, equipment and computer readable storage medium
CN112668586B (en) Model training method, picture processing device, storage medium, and program product
CN110659600A (en) Object detection method, device and equipment
CN110675635A (en) Method and device for acquiring external parameters of camera, electronic equipment and storage medium
CN111582477A (en) Training method and device of neural network model
CN111275011A (en) Mobile traffic light detection method and device, electronic equipment and storage medium
CN111783639A (en) Image detection method and device, electronic equipment and readable storage medium
CN111488821A (en) Method and device for identifying traffic signal lamp countdown information
CN113673281B (en) Speed limit information determining method, device, equipment and storage medium
CN111601013B (en) Method and apparatus for processing video frames
CN110796191B (en) Trajectory classification method and device
CN112507833A (en) Face recognition and model training method, device, equipment and storage medium
CN111553283B (en) Method and device for generating model
CN113361303B (en) Temporary traffic sign board identification method, device and equipment
CN110798681B (en) Monitoring method and device of imaging equipment and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant