CN110188833B - Method and apparatus for training a model - Google Patents


Info

Publication number: CN110188833B
Authority: CN (China)
Prior art keywords: model, target image, training, sub, discrimination
Legal status: Active (granted)
Application number: CN201910480683.8A
Other languages: Chinese (zh)
Other versions: CN110188833A
Inventor: 陈日伟
Current Assignee: Beijing Volcano Engine Technology Co Ltd
Original Assignee: Beijing ByteDance Network Technology Co Ltd
Application filed by Beijing ByteDance Network Technology Co Ltd; priority to CN201910480683.8A
Publication of application CN110188833A, followed by publication of granted patent CN110188833B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Abstract

Embodiments of the present disclosure disclose methods and apparatus for training models, and methods and apparatus for generating information. One embodiment of the method for training a model comprises: acquiring a training sample set, where each training sample in the set comprises a sample image together with discrimination information indicating whether that sample image contains an animal object and position information indicating the position of the animal object in the sample image; and training, by a machine learning algorithm, a recognition model that takes the sample images included in the training samples as input data and the discrimination information and position information corresponding to the input sample images as expected output data. The embodiment enriches the ways in which a model can be trained, and adopting the trained model helps to improve image processing efficiency.

Description

Method and apparatus for training a model
Technical Field
Embodiments of the present disclosure relate to the field of computer technology, and in particular, to a method and apparatus for training a model, and a method and apparatus for generating information.
Background
In the prior art, when a convolutional neural network is used to determine the position of a target object in an image, the image is usually input into a convolutional neural network model for localization, so as to obtain the position information of the target object in the image.
However, for electronic devices such as computers it is often impossible to know in advance, before localization is performed, whether the image contains the target object; in other words, the image input into the convolutional neural network for localization may or may not contain the target object.
Disclosure of Invention
The present disclosure presents methods and apparatus for training a model, and methods and apparatus for generating information.
In a first aspect, an embodiment of the present disclosure provides a method for training a model, the method including: acquiring a training sample set, wherein training samples in the training sample set comprise sample images, discrimination information and position information, the discrimination information corresponds to the sample images and is used for indicating whether the corresponding sample images contain animal objects, and the position information is used for indicating the positions of the animal objects in the sample images; and training to obtain a recognition model by using a machine learning algorithm and taking the sample images included in the training samples in the training sample set as input data and taking the discrimination information and the position information corresponding to the input sample images as expected output data, wherein the recognition model comprises a discrimination model and a positioning model, the expected output data of the discrimination model is the discrimination information, and the expected output data of the positioning model is the position information.
In some embodiments, training, by using a machine learning algorithm, a recognition model using sample images included in training samples in a training sample set as input data and using discrimination information and position information corresponding to the input sample images as expected output data includes: acquiring an initial model, wherein the initial model comprises a first submodel, a second submodel and a third submodel; by utilizing a machine learning algorithm, taking sample images included in training samples in a training sample set as input data of a first sub-model to obtain actual output data of the first sub-model, taking the actual output data of the first sub-model as input data of a second sub-model and a third sub-model to respectively obtain actual output data of the second sub-model and the third sub-model, and adjusting parameters of the initial model based on the actual output data and expected output data of the second sub-model and the third sub-model to obtain a trained initial model, wherein the expected output data of the second sub-model is discrimination information corresponding to the input sample images, and the expected output data of the third sub-model is position information corresponding to the input sample images; and determining the first sub-model and the second sub-model included in the trained initial model as discrimination models, and determining the third sub-model included in the trained initial model as a positioning model.
In some embodiments, the training samples in the set of training samples include sample images that are any one of: an image containing a cat object, an image containing a dog object; and the animal subject comprises at least one of: cat subjects, dog subjects.
In a second aspect, an embodiment of the present disclosure provides a method for generating information, the method including: acquiring a target image, and executing the following processing steps: inputting the target image into a pre-trained discrimination model, and generating discrimination information for indicating whether the input target image contains an animal object; in response to determining that the generated discrimination information indicates that the input target image contains an animal subject, generating position information of the animal subject in the target image based on a pre-trained localization model, and taking the generated position information as output information generated for the target image; wherein the discriminant model and the positioning model are trained by the method according to any one of the embodiments of the method for training a model in the first aspect.
In some embodiments, the method further comprises: in response to determining that the generated discrimination information indicates that the input target image does not contain an animal subject, the discrimination information is taken as output information generated for the input target image.
In some embodiments, the discriminant model includes a feature extraction layer; and inputting the target image into a pre-trained discrimination model, and generating discrimination information indicating whether the input target image includes an animal object, the method including: inputting a target image to a feature extraction layer included in a pre-trained discrimination model, and generating feature data of the input target image; based on the generated feature data, discrimination information indicating whether or not the input target image contains an animal subject is generated.
In some embodiments, generating positional information of the animal subject in the target image based on a pre-trained localization model comprises: the generated feature data is input to a positioning model trained in advance, and position information of the animal object in the input target image is generated.
In some embodiments, acquiring a target image comprises: selecting a video frame from the acquired video as a target image; and the method further comprises: in response to determining that the generated discrimination information indicates that the input target image does not contain an animal subject, selecting an unselected video frame from the video as a new target image, and continuing to perform the processing step based on the new target image.
In a third aspect, an embodiment of the present disclosure provides an apparatus for training a model, the apparatus including: a first acquisition unit configured to acquire a training sample set, wherein training samples in the training sample set include sample images, discrimination information corresponding to the sample images, and position information, the discrimination information indicating whether the corresponding sample images contain an animal object, the position information being information indicating a position of the animal object in the sample images; and a training unit configured to train a recognition model using a machine learning algorithm with sample images included in training samples in a training sample set as input data and discrimination information and position information corresponding to the input sample images as expected output data, wherein the recognition model includes a discrimination model and a positioning model, the expected output data of the discrimination model is the discrimination information, and the expected output data of the positioning model is the position information.
In some embodiments, the training unit comprises: an obtaining module configured to obtain an initial model, wherein the initial model comprises a first submodel, a second submodel and a third submodel; the training module is configured to use a machine learning algorithm to take sample images included in training samples in a training sample set as input data of a first sub-model to obtain actual output data of the first sub-model, take the actual output data of the first sub-model as input data of a second sub-model and a third sub-model to respectively obtain actual output data of the second sub-model and the third sub-model, and adjust parameters of the initial model based on the actual output data and expected output data of the second sub-model and the third sub-model to obtain a trained initial model, wherein the expected output data of the second sub-model is discrimination information corresponding to the input sample images, and the expected output data of the third sub-model is position information corresponding to the input sample images; a determining module configured to determine a first sub-model and a second sub-model included in the trained initial model as discriminant models, and determine a third sub-model included in the trained initial model as a positioning model.
In some embodiments, the training samples in the set of training samples include sample images that are any one of: an image containing a cat object, an image containing a dog object; and the animal subject comprises at least one of: cat subjects, dog subjects.
In a fourth aspect, an embodiment of the present disclosure provides an apparatus for generating information, the apparatus including: a second acquisition unit configured to acquire a target image, and to perform the following processing steps: inputting the target image into a pre-trained discrimination model, and generating discrimination information for indicating whether the input target image contains an animal object; in response to determining that the generated discrimination information indicates that the input target image contains an animal subject, generating position information of the animal subject in the target image based on a pre-trained localization model, and taking the generated position information as output information generated for the target image; wherein the discriminant model and the positioning model are trained by the method according to any one of the embodiments of the method for training a model in the first aspect.
In some embodiments, the apparatus further comprises: a determination unit configured to determine the discrimination information as output information generated for the input target image in response to determining that the generated discrimination information indicates that the input target image does not contain an animal subject.
In some embodiments, the discrimination model includes a feature extraction layer; and the inputting of the target image into the pre-trained discrimination model and the generating of discrimination information indicating whether the input target image includes an animal object include: inputting the target image to the feature extraction layer included in the pre-trained discrimination model, and generating feature data of the input target image; and generating, based on the generated feature data, discrimination information indicating whether or not the input target image contains an animal object.
In some embodiments, the second acquisition unit comprises: an input module configured to input the generated feature data to a pre-trained positioning model, generating position information of the animal object in the input target image.
In some embodiments, the second acquisition unit comprises: a selecting module configured to select a video frame from the acquired video as a target image; and the apparatus further comprises: an execution unit configured to select, in response to a determination that the generated discrimination information indicates that the input target image does not contain an animal subject, a video frame that has not been selected from the video as a new target image, and to continue executing the processing step based on the new target image.
In a fifth aspect, embodiments of the present disclosure provide an electronic device for training a model, including: one or more processors; a storage device having one or more programs stored thereon, which when executed by the one or more processors, cause the one or more processors to implement a method as in any one of the embodiments of the method for training a model as in the first aspect above.
In a sixth aspect, embodiments of the present disclosure provide a computer-readable medium for training a model, on which a computer program is stored, which when executed by a processor, implements the method as described above in any of the embodiments of the method for training a model in the first aspect.
The method and apparatus for training a model provided by the embodiments of the present disclosure enrich the ways in which a model can be trained. A training sample set is first obtained, in which each training sample comprises a sample image together with discrimination information indicating whether that sample image contains an animal object and position information indicating the position of the animal object in the sample image. A recognition model is then trained by a machine learning algorithm, taking the sample images included in the training samples as input data and the discrimination information and position information corresponding to the input sample images as expected output data, where the recognition model comprises a discrimination model whose expected output data is the discrimination information and a positioning model whose expected output data is the position information. Adopting the model obtained by such training helps to improve image processing efficiency.
Drawings
Other features, objects and advantages of the disclosure will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present disclosure may be applied;
FIG. 2 is a flow diagram of one embodiment of a method for training a model according to the present disclosure;
FIG. 3 is a schematic diagram of one application scenario of a method for training a model according to the present disclosure;
FIG. 4 is a flow diagram for one embodiment of a method for generating information, according to the present disclosure;
FIG. 5 is a flow diagram of yet another embodiment of a method for generating information according to the present disclosure;
FIG. 6 is a schematic block diagram of one embodiment of an apparatus for training models according to the present disclosure;
FIG. 7 is a schematic block diagram illustrating one embodiment of an apparatus for generating information according to the present disclosure;
FIG. 8 is a schematic structural diagram of a computer system suitable for use with the electronic device used to implement embodiments of the present disclosure.
Detailed Description
The present disclosure is described in further detail below with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 illustrates an exemplary system architecture 100 of an embodiment of a method for training a model or an apparatus for training a model, or a method for generating information or an apparatus for generating information, to which embodiments of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 over the network 104 to receive or transmit data (e.g., training samples), etc. The terminal devices 101, 102, 103 may have various client applications installed thereon, such as video playing software, news information applications, image processing applications, web browser applications, shopping applications, search applications, instant messaging tools, mailbox clients, social platform software, and the like.
The terminal apparatuses 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices having a display screen and supporting image presentation, including but not limited to smart phones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III, mpeg compression standard Audio Layer 3), MP4 players (Moving Picture Experts Group Audio Layer IV, mpeg compression standard Audio Layer 4), laptop portable computers, desktop computers, and the like. When the terminal apparatuses 101, 102, 103 are software, they can be installed in the electronic apparatuses listed above. It may be implemented as multiple pieces of software or software modules (e.g., software or software modules used to provide distributed services) or as a single piece of software or software module. And is not particularly limited herein.
The server 105 may be a server that provides various services, such as a background server that processes images transmitted by the terminal devices 101, 102, 103 (e.g., generates position information of an animal object in the images). The background server may generate position information of the animal object in the received image (e.g., a target image). Optionally, the background server may also feed the generated position information back to the terminal device. As an example, the server 105 may be a cloud server or a physical server.
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or may be implemented as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (e.g., software or software modules used to provide distributed services), or as a single piece of software or software module. And is not particularly limited herein.
It should be further noted that the method for training the model provided by the embodiment of the present disclosure may be executed by a server, may also be executed by a terminal device, and may also be executed by the server and the terminal device in cooperation with each other. Accordingly, each part (for example, each unit, sub-unit, module, sub-module) included in the apparatus for training a model may be entirely disposed in the server, may be entirely disposed in the terminal device, and may be disposed in the server and the terminal device, respectively. In addition, the method for generating information provided by the embodiment of the disclosure may be executed by the server, the terminal device, or the server and the terminal device in cooperation with each other. Accordingly, each part (for example, each unit, sub-unit, module, sub-module) included in the apparatus for generating information may be entirely provided in the server, may be entirely provided in the terminal device, and may be provided in the server and the terminal device, respectively.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. As an example, when the electronic device on which the method for training the model runs does not require data transmission with other electronic devices in performing the method, the system architecture may include only the electronic device (e.g., a server or a terminal device) on which the method for training the model runs.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method for training a model according to the present disclosure is shown. The method for training the model comprises the following steps:
step 201, a training sample set is obtained.
In this embodiment, an executing subject (for example, a server or a terminal device shown in fig. 1) of the method for training the model may obtain the training sample set from other electronic devices or locally through a wired connection manner or a wireless connection manner. The training samples in the training sample set comprise sample images, and discrimination information and position information corresponding to the sample images. The discrimination information is used to indicate whether or not the corresponding sample image contains an animal subject. The position information is information indicating the position of the animal subject in the sample image.
Here, the sample image may be any of various images. As an example, the sample image may be an image containing an animal object or an image not containing an animal object, where an animal object is the depiction of a photographed animal that appears in an image obtained by photographing that animal. For instance, the sample image may be an image containing a cat object, an image containing a goldfish object, an image containing a lion object, or the like. The sample image may also be obtained by photographing a scene that contains no animal, in which case the sample image usually contains no animal object. Correspondingly, a cat object, a goldfish object, or a lion object is the depiction of the photographed cat, goldfish, or lion that appears in an image obtained by photographing that cat, goldfish, or lion.
It is understood that, when the sample image is an image containing an animal object, the discrimination information corresponding to the sample image may be used to indicate that the sample image contains the animal object, and the position information corresponding to the sample image may be used to indicate the position of the animal object contained in the sample image; when the sample image is an image that does not include an animal object, the discrimination information corresponding to the sample image may be used to indicate that the sample image does not include an animal object, and the position information corresponding to the sample image may be characterized by predetermined data, for example, the position information corresponding to the sample image may be "null", "0", or the like. The position information may be coordinates of 4 corner points of a rectangular frame containing the animal subject in the image, or may be information indicating a position of a center point of the animal subject in the image.
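For concreteness, a training sample as just described can be represented as a small data structure. The following Python sketch is illustrative only: the field names, the use of NumPy arrays, and the corner-coordinate box format are assumptions rather than part of the disclosure, and None stands in for the predetermined data such as "null" or "0" used when no animal object is present.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

import numpy as np

@dataclass
class TrainingSample:
    # Pixel data of the sample image (height x width x channels).
    image: np.ndarray
    # Discrimination information: True if the image contains an animal object.
    contains_animal: bool
    # Position information: corner coordinates (x_min, y_min, x_max, y_max) of the
    # rectangular box around the animal object; None stands in for "null"/"0"
    # when the image contains no animal object.
    box: Optional[Tuple[float, float, float, float]]

# A positive sample (animal present, box given) and a negative sample (no animal).
positive = TrainingSample(np.zeros((224, 224, 3), np.uint8), True, (12.0, 30.0, 180.0, 210.0))
negative = TrainingSample(np.zeros((224, 224, 3), np.uint8), False, None)
```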
In some optional implementations of this embodiment, the training samples in the training sample set include sample images that are any one of: an image containing a cat object, an image containing a dog object. The animal object comprises at least one of: a cat object, a dog object. A dog object is the depiction of a photographed dog that appears in an image obtained by photographing the dog.
It can be understood that, since cats and dogs are usually kept as pets, this optional implementation can, through the subsequent steps and based on sample images containing pet objects, train a recognition model for determining whether an image contains a pet object and where the pet object is located in the image, thereby facilitating the capture of images that meet specific requirements (for example, that the pet is at a predetermined position in the image).
Step 202, training to obtain an identification model by using a machine learning algorithm and using the sample images included in the training samples in the training sample set as input data and using the discrimination information and the position information corresponding to the input sample images as expected output data.
In this embodiment, the executing agent may train the sample images included in the training samples in the training sample set acquired in step 201 as input data and the discrimination information and the position information corresponding to the input sample images as expected output data by using a machine learning algorithm to obtain the recognition model. The identification model comprises a discrimination model and a positioning model, expected output data of the discrimination model is discrimination information, and expected output data of the positioning model is position information.
Here, the input data is a sample image included in a training sample. The expected output data is the discrimination information and position information included in that training sample. The actual output data is the data that the model (for example, the trained recognition model, or the initial model used to train the recognition model) actually outputs after the input data is fed into it.
In some optional implementations of this embodiment, the executing main body may execute the step 202 by using the following steps:
first, an initial model is obtained. Wherein the initial model comprises a first submodel, a second submodel and a third submodel.
Here, the initial model may be a model to be trained, or may be a model that has been trained but does not yet satisfy the training end condition. Here, the initial model may be a convolutional neural network. In practice, the first submodel, the second submodel, and the third submodel may each include, but are not limited to, at least one of the following model structures: AlexNet, ZFNet, VGGNet, GoogLeNet, or ResNet.
And secondly, by utilizing a machine learning algorithm, taking sample images included by training samples in the training sample set as input data of the first submodel to obtain actual output data of the first submodel, taking the actual output data of the first submodel as input data of the second submodel and the third submodel to respectively obtain actual output data of the second submodel and the third submodel, and adjusting parameters of the initial model based on the actual output data and expected output data of the second submodel and the third submodel to obtain the trained initial model. The desired output data of the second submodel is the discrimination information corresponding to the input sample image, and the desired output data of the third submodel is the position information corresponding to the input sample image.
Specifically, the executing entity may determine, before adjusting parameters of the initial model each time, whether a current initial model (the initial model obtained in step one, or a new initial model obtained after parameter adjustment has been performed on the initial model in step one) meets a predetermined training end condition, and if the current initial model meets the training end condition, determine the current initial model as the initial model after training; if the training end condition is not met, the parameters of the current initial model are adjusted by adopting a gradient descent method or other algorithms. Wherein, the parameters of the initial model may include, but are not limited to, at least one of the following: weights, step sizes, bias terms, etc. The end-of-training conditions may include, but are not limited to, at least one of the following: the training times reach the preset times; the training time reaches the preset time length; or the function value of a predetermined loss function calculated based on the actual output data and the expected output data of the second submodel and the third submodel is smaller than a preset threshold value.
And thirdly, determining the first sub-model and the second sub-model included in the trained initial model as the discrimination model, and determining the third sub-model included in the trained initial model as the positioning model.
It can be understood that the recognition model trained by this alternative implementation comprises a relatively independent discrimination model and positioning model, so that whether an image contains an animal object and where that animal object is located can be determined relatively independently when the recognition model is used. In particular, the discrimination model may first be used to determine whether the image contains an animal object; if so, the positioning model is then used to determine the position of the animal object in the image; if not, the image is not processed with the positioning model at all. In the prior art, when the position of a target object in an image is determined with a convolutional neural network, the image is generally input into the convolutional neural network model for localization regardless of whether the image actually contains the target object. Because the discrimination model of this alternative implementation is a binary classification model with far fewer parameters than a convolutional neural network model for localization, it processes an image faster and consumes fewer computing resources. Therefore, when the image contains no animal object, this alternative implementation finishes processing the image more quickly than the prior art and reduces the consumption of computing resources.
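The three-sub-model procedure described in the steps above can be sketched in code. The following is a minimal, illustrative PyTorch sketch; the framework, layer sizes, cross-entropy and smooth-L1 losses, and the SGD optimizer are assumptions made for illustration, whereas the disclosure only requires a machine learning algorithm (such as gradient descent) and a predetermined loss function. The first sub-model is a shared feature extractor, the second sub-model outputs discrimination information, and the third sub-model outputs position information.

```python
import torch
import torch.nn as nn

class InitialModel(nn.Module):
    """Initial model with a first (shared backbone), second (discrimination),
    and third (localization) sub-model, as in the steps above."""
    def __init__(self):
        super().__init__()
        # First sub-model: a small convolutional feature extractor standing in
        # for the AlexNet/VGGNet/ResNet-style structures mentioned above.
        self.first = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Second sub-model: binary discrimination (animal object present or not).
        self.second = nn.Linear(32, 2)
        # Third sub-model: position regression (x_min, y_min, x_max, y_max).
        self.third = nn.Linear(32, 4)

    def forward(self, images):
        features = self.first(images)  # actual output data of the first sub-model
        return self.second(features), self.third(features)

model = InitialModel()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)  # gradient descent
cls_loss, box_loss = nn.CrossEntropyLoss(), nn.SmoothL1Loss()

def train_step(images, labels, boxes):
    """One parameter adjustment based on actual versus expected output data."""
    logits, pred_boxes = model(images)
    # Joint loss: the position term only applies to samples that contain an animal.
    mask = labels.bool()
    loss = cls_loss(logits, labels)
    if mask.any():
        loss = loss + box_loss(pred_boxes[mask], boxes[mask])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy call with random data: 4 images, their discrimination labels and boxes.
images = torch.randn(4, 3, 64, 64)
labels = torch.tensor([1, 0, 1, 1])
boxes = torch.rand(4, 4)
print(train_step(images, labels, boxes))
# After training, the first and second sub-models form the discrimination model
# and the third sub-model forms the positioning model (third step above).
```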
Optionally, the executing main body may also execute the step 202 by using the following steps:
first, an initial model is obtained. Wherein the initial model comprises a first submodel, a second submodel and a third submodel.
Here, the initial model may be a model to be trained, or may be a model that has been trained but does not yet satisfy the training end condition. Here, the initial model may be a convolutional neural network. In practice, the first submodel, the second submodel, and the third submodel may each include, but are not limited to, at least one of the following model structures: AlexNet, ZFNet, VGGNet, GoogLeNet, or ResNet.
And secondly, by utilizing a machine learning algorithm, taking sample images included by training samples in the training sample set as input data of the first submodel to obtain actual output data of the first submodel, taking the actual output data of the first submodel as input data of the second submodel and the third submodel to respectively obtain actual output data of the second submodel and the third submodel, and adjusting parameters of the initial model based on the actual output data and expected output data of the second submodel and the third submodel to obtain the trained initial model. The desired output data of the second submodel is the discrimination information corresponding to the input sample image, and the desired output data of the third submodel is the position information corresponding to the input sample image.
Specifically, the executing entity may determine, before adjusting parameters of the initial model each time, whether a current initial model (the initial model obtained in step one, or a new initial model obtained after parameter adjustment has been performed on the initial model in step one) meets a predetermined training end condition, and if the current initial model meets the training end condition, determine the current initial model as the initial model after training; if the training end condition is not met, the parameters of the current initial model are adjusted by adopting a gradient descent method or other algorithms. Wherein, the parameters of the initial model may include, but are not limited to, at least one of the following: weights, step sizes, bias terms, etc. The end-of-training conditions may include, but are not limited to, at least one of the following: the training times reach the preset times; the training time reaches the preset time length; or the function value of a predetermined loss function calculated based on the actual output data and the expected output data of the second submodel and the third submodel is smaller than a preset threshold value.
And thirdly, determining the first sub-model and the second sub-model included in the trained initial model as the discrimination model, and determining the first sub-model and the third sub-model included in the trained initial model as the positioning model.
With continued reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of the method for training a model according to the present embodiment. In the application scenario of fig. 3, the server 301 first obtains a training sample set 3011. The training samples in the training sample set 3011 include sample images, and discrimination information and position information corresponding to the sample images. The discrimination information is used to indicate whether or not the corresponding sample image contains an animal subject. The position information is information indicating the position of the animal subject in the sample image. Then, the server 301 trains the recognition model 3012 using a machine learning algorithm, with the sample images included in the training samples in the training sample set 3011 as input data, and with the discrimination information and the position information corresponding to the input sample images as expected output data. The recognition model 3012 includes a discrimination model and a positioning model, where expected output data of the discrimination model is discrimination information and expected output data of the positioning model is position information.
The method for training a model according to the above embodiment of the present disclosure enriches the ways in which a model can be trained. A training sample set is obtained, in which each training sample comprises a sample image together with discrimination information indicating whether that sample image contains an animal object and position information indicating the position of the animal object in the sample image. A recognition model is then trained by a machine learning algorithm, taking the sample images included in the training samples as input data and the discrimination information and position information corresponding to the input sample images as expected output data, where the recognition model comprises a discrimination model whose expected output data is the discrimination information and a positioning model whose expected output data is the position information. Adopting the model obtained by such training helps to improve image processing efficiency.
With further reference to FIG. 4, a flow 400 of one embodiment of a method for generating information is shown. The flow 400 of the method for generating information comprises the steps of:
step 401, a target image is acquired.
In this embodiment, an execution subject of the method for generating information (e.g., a server or a terminal device shown in fig. 1) may acquire the target image from other electronic devices or locally by a wired connection manner or a wireless connection manner.
Here, the target image may be any of various images. As an example, the target image may be an image containing an animal object or an image not containing an animal object, where an animal object is the depiction of a photographed animal that appears in an image obtained by photographing that animal.
Step 402, inputting the target image into a pre-trained discrimination model, and generating discrimination information indicating whether the input target image includes an animal object.
In this embodiment, the executing body may input the target image acquired in step 401 to a discrimination model trained in advance, and generate discrimination information indicating whether the input target image includes an animal subject.
Step 403, in response to determining that the generated discrimination information indicates that the input target image contains an animal object, generating position information of the animal object in the target image based on a pre-trained positioning model, and using the generated position information as output information generated for the target image.
In this embodiment, in a case where it is determined that the generated discrimination information indicates that the input target image contains an animal subject, the execution body may generate position information of the animal subject in the target image based on a positioning model trained in advance, and use the generated position information as output information generated for the target image.
In this embodiment, the discriminant model and the positioning model are obtained by training using the method of any one of the embodiments of the method for training a model corresponding to fig. 2.
As an example, when the discrimination model includes a first sub-model, the execution subject may input data output by the first sub-model included in the discrimination model to a positioning model trained in advance, thereby generating position information of the animal subject in the target image.
As yet another example, when the input data of the positioning model is an image, the execution subject may input the target image to a positioning model trained in advance, thereby generating the position information of the animal subject in the target image. The positioning model may be a model obtained by training based on a training sample including a sample image and position information in the sample image by using a machine learning algorithm.
In some cases, after taking the discrimination information as the output information generated for the input target image, the execution body may output that output information.
It is understood that, in general, the method for generating information in the present embodiment may be used to determine the position of a target object in an image. Thus, when the execution body generates position information of the animal object in the target image based on the pre-trained positioning model and takes the generated position information as the output information generated for the target image, the execution body generates output information characterizing the position of the animal object.
In some optional implementation manners of this embodiment, the executing main body may further perform the following steps: in a case where it is determined that the generated discrimination information indicates that the input target image does not include the animal subject, the discrimination information is taken as output information generated for the input target image.
It is to be understood that, when the method for generating information is used to determine the position of a target object in an image, and the execution body takes the discrimination information as the output information generated for the input target image upon determining that the generated discrimination information indicates that the input target image does not contain an animal object, the execution body may generate and output information characterizing that the image contains no animal object.
In the case where it is determined that the generated discrimination information indicates that the input target image does not contain an animal object, neither the execution body nor an electronic device communicatively connected to it needs to determine the position of an animal object in the image. In the prior art, when the position of a target object in an image is determined with a convolutional neural network, the image is generally input into the convolutional neural network model for localization regardless of whether the image contains the target object. Therefore, when the image contains no animal object, this alternative implementation finishes processing the image more quickly than the prior art and reduces the consumption of computing resources.
In some optional implementations of this embodiment, the discriminant model includes a feature extraction layer. Thus, the executing body may further execute the step 402 by:
step one, inputting a target image to a feature extraction layer included in a pre-trained discrimination model, and generating feature data of the input target image.
Here, the feature extraction layer may include one or more convolution layers. The feature data may be data for characterizing one or more of texture, contour, and color of the target image. Here, when the initial model includes a first submodel, a second submodel, and a third submodel, the feature extraction layer may be the first submodel included in the discriminant model, may be one or more convolution layers in the first submodel included in the discriminant model, and may further include the first submodel and one or more convolution layers in the second submodel.
And secondly, generating discrimination information indicating whether the input target image contains an animal object or not based on the generated feature data.
As an example, the execution body may input the generated feature data into a model structure subsequent to the feature extraction layer in the discrimination model, thereby generating discrimination information indicating whether the input target image contains an animal subject.
As still another example, the executing body may search a table or a database in which feature data and discrimination information are stored in association with each other in advance for feature data having the highest similarity to the generated feature data, and determine the discrimination information stored in association with the searched feature data as the discrimination information generated in step two. Wherein, the similarity may include but is not limited to: euclidean distance, cosine similarity, etc.
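As a rough illustration of this table-lookup alternative, the sketch below selects the stored feature vector with the highest cosine similarity to the generated feature data and returns its associated discrimination information. The table contents, vector sizes, and function names are hypothetical.

```python
import numpy as np

def lookup_discrimination(feature, table):
    """Return the discrimination information stored with the feature vector in
    `table` that is most similar (here: cosine similarity) to `feature`."""
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    best_feature, best_info = max(table, key=lambda row: cosine(feature, row[0]))
    return best_info

# Each table row pairs a stored feature vector with its discrimination information.
table = [
    (np.array([0.9, 0.1, 0.3]), "contains animal object"),
    (np.array([0.1, 0.8, 0.7]), "does not contain animal object"),
]
print(lookup_discrimination(np.array([0.85, 0.2, 0.25]), table))
```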
In some optional implementations of this embodiment, the executing main body may further adopt the following steps to execute the step 403: the generated feature data is input to a positioning model trained in advance, and position information of the animal object in the input target image is generated.
It is to be understood that, in this alternative implementation, in a case where it is determined that the generated discrimination information indicates that the input target image contains an animal object, the execution subject may determine the position information of the animal object in the input target image based on the positioning model included in the recognition model, thereby improving the accuracy of determining the position of the animal object in the image.
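A minimal sketch of this shared-feature inference path follows, again assuming PyTorch and toy layer sizes; the actual feature extraction layer, discrimination head, and positioning model would be the trained sub-models discussed above, and treating class index 1 as "contains an animal object" is an assumption.

```python
import torch
import torch.nn as nn

# Minimal stand-ins; in practice these are the trained first/second/third sub-models.
feature_extraction_layer = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                                          nn.AdaptiveAvgPool2d(1), nn.Flatten())
discrimination_head = nn.Linear(8, 2)
localization_model = nn.Linear(8, 4)

def generate_information(target_image):
    """Run the feature extraction layer once and reuse its feature data for both
    discrimination and, only when needed, localization."""
    with torch.no_grad():
        features = feature_extraction_layer(target_image)        # feature data
        contains_animal = discrimination_head(features).argmax(1).item() == 1
        if not contains_animal:
            return "target image does not contain an animal object"
        return localization_model(features).squeeze(0).tolist()  # position information

print(generate_information(torch.zeros(1, 3, 64, 64)))
```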
It should be noted that, in addition to the above-mentioned contents, the present embodiment may further include the same or similar features and effects as those of the embodiment corresponding to fig. 2, and details are not repeated herein.
The method for generating information provided by the above embodiment of the present disclosure acquires a target image and performs the following processing steps: the target image is input into a pre-trained discrimination model to generate discrimination information indicating whether the input target image contains an animal object; then, in response to determining that the generated discrimination information indicates that the input target image contains an animal object, position information of the animal object in the target image is generated based on a pre-trained positioning model and taken as the output information generated for the target image. The discrimination model and the positioning model are trained by the method of any one of the embodiments of the method for training a model described in the first aspect. This enriches the ways of generating position information of an animal object in a target image and improves the accuracy of the generated position information.
With further reference to fig. 5, a flow 500 of yet another embodiment of a method for generating information is shown. The flow 500 of the method for generating information includes the steps of:
step 501, selecting a video frame from the acquired video as a target image. Thereafter, execution continues at step 502.
In this embodiment, an execution subject of the method for generating information (for example, a server or a terminal device shown in fig. 1) may acquire a video from other electronic devices or locally through a wired connection manner or a wireless connection manner, and then select a video frame from the acquired video as a target image.
The video may be various videos. The video may be a video obtained by shooting an animal, or a video obtained by shooting an object other than an animal, for example. It is understood that when the video is a video obtained by shooting an animal, all or part of the video frames in the video may contain an animal object.
It is understood that a video frame in a video is an image.
Here, the execution subject may randomly select a video frame from the acquired video as a target image, or may select a currently presented video frame from the acquired video as a target image.
Step 502, inputting the target image into a pre-trained discrimination model, and generating discrimination information indicating whether the input target image includes an animal object. Thereafter, execution continues at step 503.
In this embodiment, step 502 is substantially the same as step 402 in the embodiment corresponding to fig. 4, and is not described here again.
Step 503, determining whether the generated discrimination information indicates that the input target image contains an animal object. If yes, go on to step 504; if not, go to step 505.
In this embodiment, the execution subject described above may determine whether the generated discrimination information indicates that the input target image contains an animal subject.
Step 504 is to generate position information of the animal subject in the target image based on the pre-trained positioning model, and to use the generated position information as output information generated for the target image. Thereafter, execution continues at step 505.
In this embodiment, step 504 is substantially the same as step 403 in the embodiment corresponding to fig. 4, and is not described here again.
And 505, selecting unselected video frames from the video as new target images. Thereafter, execution continues at step 502.
Here, the execution body may randomly select a video frame from the video frames of the video that have not yet been selected as the new target image, may select the currently presented video frame from the video frames that have not yet been selected as the new target image, or may select a video frame following the currently selected target image (for example, the video frame immediately after the currently selected target image, or the video frame two frames after it) as the new target image.
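The frame-by-frame flow of steps 501 to 505 can be sketched as follows. The helper names and the stand-in discriminate/localize callables are hypothetical, standing in for the discrimination model and positioning model described above.

```python
def locate_animals_in_video(video_frames, discriminate, localize):
    """Walk through the video's frames in order (each frame is one target image).
    Frames whose discrimination information is negative are skipped without ever
    invoking the positioning model; positive frames get position information."""
    positions = {}
    for index, frame in enumerate(video_frames):
        if discriminate(frame):                 # discrimination model: animal present?
            positions[index] = localize(frame)  # positioning model: where is it?
        # Otherwise: select the next unselected frame and repeat the processing step.
    return positions

# Toy usage with stand-in models: even-numbered frames "contain" an animal at a fixed box.
frames = list(range(6))
print(locate_animals_in_video(frames,
                              discriminate=lambda f: f % 2 == 0,
                              localize=lambda f: (10, 10, 50, 50)))
```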
It should be noted that, in addition to the contents described above, the present embodiment may also include the same or similar features and effects as those of the embodiment corresponding to fig. 4, and details are not repeated herein.
As can be seen from fig. 5, compared with the embodiment corresponding to fig. 4, the flow 500 of the method for generating information in the present embodiment highlights the step of locating an animal object in all or some of the video frames of a video. The scheme described in this embodiment therefore enriches the ways of generating position information of an animal object in the video frames of a video and improves the accuracy of the generated position information. When a video frame contains no animal object, the position of an animal object in that frame does not need to be determined, which improves image processing efficiency and reduces the computing resources consumed during image processing.
With further reference to fig. 6, as an implementation of the method shown in fig. 2 described above, the present disclosure provides an embodiment of an apparatus for training a model, the embodiment of the apparatus corresponding to the embodiment of the method shown in fig. 2, the embodiment of the apparatus may further include the same or corresponding features as the embodiment of the method shown in fig. 2, and produce the same or corresponding effects as the embodiment of the method shown in fig. 2, in addition to the features described below. The device can be applied to various electronic equipment.
As shown in fig. 6, the apparatus 600 for training a model of the present embodiment includes: a first acquisition unit 601 and a training unit 602. The first obtaining unit 601 is configured to obtain a training sample set, where training samples in the training sample set include sample images, discrimination information corresponding to the sample images, and position information, where the discrimination information is used to indicate whether the corresponding sample images contain an animal object, and the position information is information used to indicate a position of the animal object in the sample images; the training unit 602 is configured to train a recognition model using a machine learning algorithm, with sample images included in training samples in a training sample set as input data, and discrimination information and position information corresponding to the input sample images as expected output data, wherein the recognition model includes a discrimination model and a positioning model, the expected output data of the discrimination model is the discrimination information, and the expected output data of the positioning model is the position information.
In this embodiment, the first obtaining unit 601 of the apparatus 600 for training a model may obtain a training sample set. The training samples in the training sample set comprise sample images, discrimination information and position information, wherein the discrimination information corresponds to the sample images and is used for indicating whether the corresponding sample images contain animal objects, and the position information is used for indicating the positions of the animal objects in the sample images.
In this embodiment, the training unit 602 may train the sample images included in the training samples in the training sample set acquired by the first acquisition unit as input data and the discrimination information and the position information corresponding to the input sample images as expected output data by using a machine learning algorithm to obtain the recognition model. The identification model comprises a discrimination model and a positioning model, expected output data of the discrimination model is discrimination information, and expected output data of the positioning model is position information.
In some optional implementations of this embodiment, the training unit 602 includes: an obtaining module (not shown in the figures) is configured to obtain an initial model, wherein the initial model comprises a first submodel, a second submodel and a third submodel. The training module (not shown in the figure) is configured to use a machine learning algorithm to use sample images included in training samples in a training sample set as input data of a first sub-model to obtain actual output data of the first sub-model, use the actual output data of the first sub-model as input data of a second sub-model and a third sub-model to obtain actual output data of the second sub-model and the third sub-model respectively, and adjust parameters of the initial model based on the actual output data and expected output data of the second sub-model and the third sub-model to obtain a trained initial model, wherein the expected output data of the second sub-model is discrimination information corresponding to the input sample images, and the expected output data of the third sub-model is position information corresponding to the input sample images. The determination module (not shown in the figures) is configured to determine a first sub-model and a second sub-model comprised by the trained initial model as discriminant models and a third sub-model comprised by the trained initial model as positioning models.
In some optional implementations of this embodiment, the training samples in the training sample set include sample images that are any one of: an image containing a cat object, an image containing a dog object. The animal subject comprises at least one of: cat subjects, dog subjects.
The apparatus for training a model according to the above embodiment of the present disclosure acquires, through the first acquiring unit 601, a training sample set in which the training samples include sample images, discrimination information corresponding to the sample images, and position information, where the discrimination information is used to indicate whether the corresponding sample image contains an animal object and the position information is used to indicate the position of the animal object in the sample image. The training unit 602 then trains the recognition model by using a machine learning algorithm, with the sample images included in the training samples in the training sample set as input data and the discrimination information and the position information corresponding to the input sample images as expected output data, where the recognition model includes a discrimination model and a positioning model, the expected output data of the discrimination model is the discrimination information, and the expected output data of the positioning model is the position information. This enriches the training modes of the model, and the trained model is beneficial to improving image processing efficiency.
With further reference to fig. 7, as an implementation of the method shown in fig. 4 described above, the present disclosure provides an embodiment of an apparatus for generating information. The apparatus embodiment corresponds to the method embodiment shown in fig. 4 and, in addition to the features described below, may include the same or corresponding features as, and produce the same or corresponding effects as, that method embodiment. The apparatus can be applied to various electronic devices.
As shown in fig. 7, the apparatus 700 for generating information of the present embodiment includes: a second acquisition unit 701. Wherein the second acquiring unit 701 is configured to acquire a target image, and to perform the following processing steps: inputting the target image into a pre-trained discrimination model, and generating discrimination information for indicating whether the input target image contains an animal object; in response to determining that the generated discrimination information indicates that the input target image contains an animal subject, generating position information of the animal subject in the target image based on a pre-trained localization model, and taking the generated position information as output information generated for the target image; the discriminant model and the positioning model are obtained by training by using the method of any one embodiment of the method for training the model corresponding to fig. 2.
In the present embodiment, the second acquisition unit 701 of the apparatus 700 for generating information may acquire a target image and perform the following processing steps: inputting the target image into a pre-trained discrimination model, and generating discrimination information for indicating whether the input target image contains an animal object; in response to determining that the generated discrimination information indicates that the input target image contains an animal object, generating position information of the animal object in the target image based on a pre-trained localization model, and taking the generated position information as output information generated for the target image; the discrimination model and the localization model are obtained by training with the method of any one embodiment of the method for training a model corresponding to fig. 2.
In some optional implementations of this embodiment, the apparatus 700 further includes: the determination unit (not shown in the figures) is configured to determine the discrimination information as output information generated for the input target image in response to determining that the generated discrimination information indicates that the input target image does not contain an animal subject.
In some optional implementations of this embodiment, the discrimination model includes a feature extraction layer. Accordingly, inputting the target image into the pre-trained discrimination model and generating discrimination information indicating whether the input target image contains an animal object includes: inputting the target image to the feature extraction layer included in the pre-trained discrimination model to generate feature data of the input target image; and generating, based on the generated feature data, discrimination information indicating whether the input target image contains an animal object.
In some optional implementation manners of this embodiment, the second obtaining unit 701 includes: an input module (not shown in the figures) is configured to input the generated feature data to a pre-trained positioning model, generating position information of the animal object in the input target image.
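Taken together, the two implementations above describe an inference flow in which the feature data produced by the feature extraction layer is reused by the positioning model. Below is a minimal sketch of that flow, assuming the hypothetical backbone and heads from the training sketch above and a 0.5 decision threshold chosen purely for illustration.

import torch

def generate_output(target_image, backbone, discrimination_head, localization_head):
    # target_image is assumed to be a single preprocessed image tensor of shape (1, C, H, W)
    with torch.no_grad():
        features = backbone(target_image)                        # feature data of the input target image
        probabilities = torch.softmax(discrimination_head(features), dim=-1)
        contains_animal = probabilities[0, 1].item() > 0.5       # discrimination information
        if contains_animal:
            position = localization_head(features)               # position information of the animal object
            return {"contains_animal": True, "position": position.squeeze(0).tolist()}
        return {"contains_animal": False, "position": None}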
In some optional implementation manners of this embodiment, the second obtaining unit 701 includes: a selecting module (not shown in the figure) is configured to select a video frame from the acquired video as a target image. Thus, the apparatus 700 further comprises: an execution unit (not shown in the figures) is configured to select, in response to determining that the generated discrimination information indicates that the input target image does not contain an animal subject, a video frame that has not been selected from the video as a new target image, and to continue to execute the processing steps based on the new target image.
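The frame-selection behaviour described above can be sketched as a simple loop that keeps taking not-yet-selected video frames until a frame containing an animal object is found; generate_output here is the hypothetical inference routine sketched earlier, not a function defined by the disclosure.

def process_video(frames, backbone, discrimination_head, localization_head):
    for frame in frames:                    # each iteration selects a video frame that has not been selected yet
        result = generate_output(frame, backbone, discrimination_head, localization_head)
        if result["contains_animal"]:
            return result                   # output information generated for the first frame containing an animal object
    return None                             # no frame in the video contains an animal object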
The apparatus for generating information according to the above embodiment of the present disclosure acquires the target image through the second acquiring unit 701 and performs the following processing steps: inputting the target image into a pre-trained discrimination model to generate discrimination information indicating whether the input target image contains an animal object; and, in response to determining that the generated discrimination information indicates that the input target image contains an animal object, generating position information of the animal object in the target image based on a pre-trained positioning model and taking the generated position information as output information generated for the target image. The discrimination model and the positioning model are obtained by training with the method of any embodiment of the method for training a model corresponding to fig. 2 above. This enriches the ways of generating the position information of the animal object in the target image and improves the accuracy of the position information generated for the target image.
Referring now to fig. 8, a schematic diagram of an electronic device (e.g., a server or terminal device of fig. 1) 800 suitable for use in implementing embodiments of the present disclosure is shown. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a fixed terminal such as a digital TV, a desktop computer, and the like. The terminal device/server shown in fig. 8 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 8, an electronic device 800 may include a processing means (e.g., central processing unit, graphics processor, etc.) 801 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 802 or a program loaded from a storage means 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data necessary for the operation of the electronic device 800 are also stored. The processing apparatus 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to bus 804.
Generally, the following devices may be connected to the I/O interface 805: input devices 806 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 807 including, for example, a Liquid Crystal Display (LCD), speakers, vibrators, and the like; storage 808 including, for example, magnetic tape, hard disk, etc.; and a communication device 809. The communication means 809 may allow the electronic device 800 to communicate wirelessly or by wire with other devices to exchange data. While fig. 8 illustrates an electronic device 800 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 8 may represent one device or may represent multiple devices as desired.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication means 809, or installed from the storage means 808, or installed from the ROM 802. The computer program, when executed by the processing apparatus 801, performs the above-described functions defined in the methods of the embodiments of the present disclosure.
It should be noted that the computer readable medium described in the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In embodiments of the present disclosure, however, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring a training sample set, wherein training samples in the training sample set comprise sample images, discrimination information and position information, the discrimination information corresponds to the sample images and is used for indicating whether the corresponding sample images contain animal objects, and the position information is used for indicating the positions of the animal objects in the sample images; and training to obtain a recognition model by using a machine learning algorithm and taking the sample images included in the training samples in the training sample set as input data and taking the discrimination information and the position information corresponding to the input sample images as expected output data, wherein the recognition model comprises a discrimination model and a positioning model, the expected output data of the discrimination model is the discrimination information, and the expected output data of the positioning model is the position information. Or, causing the electronic device to: acquiring a target image, and executing the following processing steps: inputting the target image into a pre-trained discrimination model, and generating discrimination information for indicating whether the input target image contains an animal object; in response to determining that the generated discrimination information indicates that the input target image contains an animal subject, generating position information of the animal subject in the target image based on a pre-trained localization model, and taking the generated position information as output information generated for the target image; the discriminant model and the positioning model are obtained by training by using the method of any one embodiment of the method for training the model corresponding to fig. 2.
Computer program code for carrying out operations for embodiments of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk or C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. The described units may also be provided in a processor, which may, for example, be described as: a processor including a first acquisition unit and a training unit. Alternatively, it can be described as: a processor including a second acquisition unit. The names of these units do not, in some cases, constitute a limitation on the units themselves; for example, the first acquisition unit may also be described as "a unit that acquires a training sample set".
The foregoing description is only exemplary of the preferred embodiments of the present disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention in the present disclosure is not limited to the specific combination of the above-mentioned features, but also encompasses other technical solutions formed by any combination of the above-mentioned features or their equivalents without departing from the inventive concept defined above, for example, technical solutions formed by replacing the above features with (but not limited to) features having similar functions disclosed in the present disclosure.

Claims (14)

1. A method for generating information, comprising:
acquiring a target image, and executing the following processing steps:
inputting the target image into a pre-trained discrimination model, and generating discrimination information for indicating whether the input target image contains an animal object;
in response to determining that the generated discrimination information indicates that the input target image contains an animal subject, generating position information of the animal subject in the target image based on a pre-trained localization model, and taking the generated position information as output information generated for the target image;
the discriminant model and the positioning model are obtained by training by adopting the following method: acquiring a training sample set, wherein training samples in the training sample set comprise sample images, discrimination information and position information, the discrimination information corresponds to the sample images and is used for indicating whether the corresponding sample images contain animal objects, and the position information is used for indicating the positions of the animal objects in the sample images; and training a convolutional neural network to obtain a recognition model by using a machine learning algorithm and taking the sample images included in the training samples in the training sample set as input data and taking discrimination information and position information corresponding to the input sample images as expected output data, wherein the recognition model comprises a discrimination model and a positioning model, the expected output data of the discrimination model is discrimination information, and the expected output data of the positioning model is position information.
2. The method of claim 1, wherein the method further comprises:
in response to determining that the generated discrimination information indicates that the input target image does not contain an animal subject, the discrimination information is taken as output information generated for the input target image.
3. The method of claim 1, wherein the discriminant model comprises a feature extraction layer; and
the inputting the target image into the pre-trained discrimination model and generating discrimination information for indicating whether the input target image contains an animal object comprises:
inputting a target image to a feature extraction layer included in a pre-trained discrimination model, and generating feature data of the input target image;
based on the generated feature data, discrimination information indicating whether or not the input target image contains an animal subject is generated.
4. The method of claim 3, wherein generating positional information of the animal subject in the target image based on the pre-trained localization model comprises:
the generated feature data is input to a positioning model trained in advance, and position information of the animal object in the input target image is generated.
5. The method of one of claims 1 to 4, wherein said acquiring a target image comprises:
selecting a video frame from the acquired video as a target image; and
the method further comprises the following steps:
in response to determining that the generated discrimination information indicates that the input target image does not contain an animal subject, selecting an unselected video frame from the video as a new target image, and continuing to perform the processing step based on the new target image.
6. The method according to claim 1, wherein training a convolutional neural network to obtain a recognition model by using a machine learning algorithm and using a sample image included in a training sample in the training sample set as input data and using discrimination information and position information corresponding to the input sample image as expected output data comprises:
acquiring an initial model, wherein the initial model comprises a first submodel, a second submodel and a third submodel;
using a machine learning algorithm, taking sample images included in training samples in the training sample set as input data of a first sub-model to obtain actual output data of the first sub-model, taking the actual output data of the first sub-model as input data of a second sub-model and a third sub-model to respectively obtain actual output data of the second sub-model and the third sub-model, and adjusting parameters of the initial model based on the actual output data and expected output data of the second sub-model and the third sub-model to obtain a trained initial model, wherein the expected output data of the second sub-model is discrimination information corresponding to the input sample images, and the expected output data of the third sub-model is position information corresponding to the input sample images;
and determining the first sub-model and the second sub-model included in the trained initial model as discrimination models, and determining the third sub-model included in the trained initial model as a positioning model.
7. The method of claim 1 or 6, wherein training samples in the set of training samples comprise sample images of any one of:
an image containing a cat object, an image containing a dog object; and
the animal subject comprises at least one of:
cat subjects, dog subjects.
8. An apparatus for generating information, comprising:
a second acquisition unit configured to acquire a target image, and to perform the following processing steps:
inputting the target image into a pre-trained discrimination model, and generating discrimination information for indicating whether the input target image contains an animal object;
in response to determining that the generated discrimination information indicates that the input target image contains an animal subject, generating position information of the animal subject in the target image based on a pre-trained localization model, and taking the generated position information as output information generated for the target image;
the distinguishing model and the positioning model are obtained by adopting the following devices for training: a first acquisition unit configured to acquire a training sample set, wherein training samples in the training sample set include sample images, discrimination information corresponding to the sample images, and position information, the discrimination information indicating whether the corresponding sample images contain an animal object, the position information being information indicating a position of the animal object in the sample images; and the training unit is configured to train the convolutional neural network to obtain a recognition model by using a machine learning algorithm and taking the sample images included in the training samples in the training sample set as input data and taking discrimination information and position information corresponding to the input sample images as expected output data, wherein the recognition model comprises a discrimination model and a positioning model, the expected output data of the discrimination model is discrimination information, and the expected output data of the positioning model is position information.
9. The apparatus of claim 8, wherein the apparatus further comprises:
a determination unit configured to determine the discrimination information as output information generated for the input target image in response to determining that the generated discrimination information indicates that the input target image does not contain an animal subject.
10. The apparatus according to one of claims 8-9, wherein the second obtaining unit comprises:
a selecting module configured to select a video frame from the acquired video as a target image; and
the device further comprises:
an execution unit configured to select, in response to a determination that the generated discrimination information indicates that the input target image does not contain an animal subject, a video frame that has not been selected from the video as a new target image, and to continue executing the processing step based on the new target image.
11. The apparatus of claim 8, wherein the training unit comprises:
an obtaining module configured to obtain an initial model, wherein the initial model comprises a first submodel, a second submodel and a third submodel;
a training module configured to, by using a machine learning algorithm, take sample images included in training samples in the training sample set as input data of the first sub-model to obtain actual output data of the first sub-model, take the actual output data of the first sub-model as input data of the second sub-model and the third sub-model to respectively obtain actual output data of the second sub-model and the third sub-model, and adjust parameters of the initial model based on the actual output data and the expected output data of the second sub-model and the third sub-model to obtain a trained initial model, wherein the expected output data of the second sub-model is discrimination information corresponding to the input sample images, and the expected output data of the third sub-model is position information corresponding to the input sample images;
a determining module configured to determine a first sub-model and a second sub-model included in the trained initial model as discriminant models, and determine a third sub-model included in the trained initial model as a positioning model.
12. The apparatus according to claim 8 or 11, wherein the training samples in the set of training samples comprise sample images of any one of:
an image containing a cat object, an image containing a dog object; and
the animal subject comprises at least one of:
cat subjects, dog subjects.
13. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.
14. A computer-readable medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any one of claims 1-7.
CN201910480683.8A 2019-06-04 2019-06-04 Method and apparatus for training a model Active CN110188833B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910480683.8A CN110188833B (en) 2019-06-04 2019-06-04 Method and apparatus for training a model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910480683.8A CN110188833B (en) 2019-06-04 2019-06-04 Method and apparatus for training a model

Publications (2)

Publication Number Publication Date
CN110188833A CN110188833A (en) 2019-08-30
CN110188833B true CN110188833B (en) 2021-06-18

Family

ID=67720143

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910480683.8A Active CN110188833B (en) 2019-06-04 2019-06-04 Method and apparatus for training a model

Country Status (1)

Country Link
CN (1) CN110188833B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110738132B (en) * 2019-09-23 2022-06-03 中国海洋大学 Target detection quality blind evaluation method with discriminant perception capability

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106096531A (en) * 2016-05-31 2016-11-09 安徽省云力信息技术有限公司 Multi-type vehicle detection method for traffic images based on deep learning
CN108491823A (en) * 2018-03-30 2018-09-04 百度在线网络技术(北京)有限公司 Method and apparatus for generating eye recognition model
CN109187579A (en) * 2018-09-05 2019-01-11 深圳灵图慧视科技有限公司 Fabric defect detection method and device, computer equipment and computer-readable medium
CN109215022A (en) * 2018-09-05 2019-01-15 深圳灵图慧视科技有限公司 Cloth inspection method, device, terminal device, server, storage medium and system
CN109344908A (en) * 2018-10-30 2019-02-15 北京字节跳动网络技术有限公司 Method and apparatus for generating model
CN109447168A (en) * 2018-11-05 2019-03-08 江苏德劭信息科技有限公司 Safety helmet wearing detection method based on deep features and video object detection

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10373313B2 (en) * 2017-03-02 2019-08-06 Siemens Healthcare Gmbh Spatially consistent multi-scale anatomical landmark detection in incomplete 3D-CT data
US10646999B2 (en) * 2017-07-20 2020-05-12 Tata Consultancy Services Limited Systems and methods for detecting grasp poses for handling target objects
CN109800735A (en) * 2019-01-31 2019-05-24 中国人民解放军国防科技大学 Accurate detection and segmentation method for ship target

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106096531A (en) * 2016-05-31 2016-11-09 安徽省云力信息技术有限公司 Multi-type vehicle detection method for traffic images based on deep learning
CN108491823A (en) * 2018-03-30 2018-09-04 百度在线网络技术(北京)有限公司 Method and apparatus for generating eye recognition model
CN109187579A (en) * 2018-09-05 2019-01-11 深圳灵图慧视科技有限公司 Fabric defect detection method and device, computer equipment and computer-readable medium
CN109215022A (en) * 2018-09-05 2019-01-15 深圳灵图慧视科技有限公司 Cloth inspection method, device, terminal device, server, storage medium and system
CN109344908A (en) * 2018-10-30 2019-02-15 北京字节跳动网络技术有限公司 Method and apparatus for generating model
CN109447168A (en) * 2018-11-05 2019-03-08 江苏德劭信息科技有限公司 Safety helmet wearing detection method based on deep features and video object detection

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Fast R-CNN; Ross Girshick et al.; https://arxiv.org/abs/1504.08083; 2015-09-27; pp. 1-9 *
Vehicle type recognition analysis with Faster-RCNN; Sang Jun et al.; Journal of Chongqing University; July 2017; Vol. 40, No. 7; pp. 32-36 *

Also Published As

Publication number Publication date
CN110188833A (en) 2019-08-30

Similar Documents

Publication Publication Date Title
CN109858445B (en) Method and apparatus for generating a model
CN108830235B (en) Method and apparatus for generating information
CN107578017B (en) Method and apparatus for generating image
CN110188719B (en) Target tracking method and device
CN109993150B (en) Method and device for identifying age
CN107609506B (en) Method and apparatus for generating image
CN109829432B (en) Method and apparatus for generating information
CN110059623B (en) Method and apparatus for generating information
CN109961032B (en) Method and apparatus for generating classification model
CN110009059B (en) Method and apparatus for generating a model
CN110516678B (en) Image processing method and device
CN109862100B (en) Method and device for pushing information
CN110033423B (en) Method and apparatus for processing image
CN109934142B (en) Method and apparatus for generating feature vectors of video
CN110209658B (en) Data cleaning method and device
CN111897950A (en) Method and apparatus for generating information
CN111314620B (en) Photographing method and apparatus
CN113395538B (en) Sound effect rendering method and device, computer readable medium and electronic equipment
CN110046571B (en) Method and device for identifying age
CN115937033A (en) Image generation method and device and electronic equipment
CN109919220B (en) Method and apparatus for generating feature vectors of video
CN110008926B (en) Method and device for identifying age
CN109977905B (en) Method and apparatus for processing fundus images
CN111797266B (en) Image processing method and apparatus, storage medium, and electronic device
CN110188833B (en) Method and apparatus for training a model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Patentee after: Douyin Vision Co.,Ltd.

Address before: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Patentee before: Tiktok vision (Beijing) Co.,Ltd.

Address after: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Patentee after: Tiktok vision (Beijing) Co.,Ltd.

Address before: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Patentee before: BEIJING BYTEDANCE NETWORK TECHNOLOGY Co.,Ltd.

CP01 Change in the name or title of a patent holder
TR01 Transfer of patent right

Effective date of registration: 20230707

Address after: 100190 1309, 13th floor, building 4, Zijin Digital Park, Haidian District, Beijing

Patentee after: Beijing volcano Engine Technology Co.,Ltd.

Address before: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Patentee before: Douyin Vision Co.,Ltd.

TR01 Transfer of patent right