CN108038473B - Method and apparatus for outputting information

Method and apparatus for outputting information

Info

Publication number
CN108038473B
CN108038473B (application CN201711455262.7A)
Authority
CN
China
Prior art keywords
image
target object
recognition
local
detected
Prior art date
Legal status
Active
Application number
CN201711455262.7A
Other languages
Chinese (zh)
Other versions
CN108038473A (en)
Inventor
Pang Wenjie (庞文杰)
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201711455262.7A
Publication of CN108038473A
Application granted
Publication of CN108038473B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification

Abstract

The embodiment of the application discloses a method and apparatus for outputting information. One embodiment of the method comprises: acquiring an image sequence to be detected; for each image in the image sequence to be detected, inputting the image into a plurality of pre-trained recognition models to obtain a recognition result corresponding to each model, and determining whether a target object is present in the image based on the obtained recognition results; and taking each image in the image sequence to be detected that presents the target object as a target image, and outputting identification information containing the target image. This embodiment increases the flexibility of target object recognition.

Description

Method and apparatus for outputting information
Technical Field
The embodiments of the application relate to the field of computer technology, in particular to the field of internet technology, and specifically to a method and apparatus for outputting information.
Background
Image recognition refers to the use of computers to process, analyze, and understand images in order to recognize targets and objects of various patterns. In typical industrial applications, an industrial camera captures images, and software then further identifies and processes them based on grayscale differences. Image recognition techniques can therefore be used to track a target object (e.g., a person).
In conventional methods, face recognition is usually performed on an image using only a face recognition model; if the face of the target object is recognized, the target object is determined to be present in the image.
Disclosure of Invention
The embodiment of the application provides a method and a device for outputting information.
In a first aspect, an embodiment of the present application provides a method for outputting information, the method comprising: acquiring an image sequence to be detected; for each image in the image sequence to be detected, inputting the image into a plurality of pre-trained recognition models to obtain a recognition result corresponding to each model, and determining whether a target object is present in the image based on the obtained recognition results; and taking an image in the image sequence to be detected that presents the target object as a target image, and outputting identification information containing the target image.
In some embodiments, the plurality of recognition models includes a face recognition model for recognizing a face region of the target object and at least one local recognition model, each of the at least one local recognition model for recognizing a local region of the target object.
In some embodiments, inputting each image in the image sequence to be detected into a plurality of pre-trained recognition models to obtain recognition results corresponding to the respective models includes: for each image in the image sequence to be detected, inputting the image into the face recognition model to obtain a face recognition result, and inputting the image into each local recognition model to obtain a local recognition result corresponding to that local recognition model.
In some embodiments, determining whether the target object is present in the image based on the obtained recognition results comprises: for each image in the image sequence to be detected, determining that the image presents the target object in response to determining that the face recognition result corresponding to the image indicates that the image does not present a face region of the target object, while the local recognition results corresponding to the image indicate that the number of local regions of the target object presented in the image is not less than a preset value.
In some embodiments, determining whether the target object is present in the image based on the obtained recognition results comprises: for each image in the image sequence to be detected, determining that the image presents the target object in response to determining that the face recognition result corresponding to the image indicates that the image presents a face region of the target object.
In some embodiments, determining whether the target object is present in the image based on the obtained recognition results comprises: for each image in the image sequence to be detected, determining that the image does not present the target object in response to determining that the face recognition result corresponding to the image indicates that the image does not present a face region of the target object, and the local recognition results corresponding to the image indicate that the number of local regions of the target object presented in the image is smaller than the preset value.
In some embodiments, the at least one local recognition model comprises at least one of: a clothes color recognition model, a clothes style recognition model, a backpack color recognition model, a backpack style recognition model, a hairstyle recognition model, a hat recognition model, a glasses recognition model, a height recognition model, and a body type recognition model.
In a second aspect, an embodiment of the present application provides an apparatus for outputting information, comprising: an acquisition unit configured to acquire an image sequence to be detected; an input unit configured to input, for each image in the image sequence to be detected, the image into a plurality of pre-trained recognition models to obtain recognition results corresponding to the respective models, and to determine whether a target object is present in the image based on the obtained recognition results; and an output unit configured to take an image in the image sequence to be detected that presents the target object as a target image and output identification information containing the target image.
In some embodiments, the plurality of recognition models includes a face recognition model for recognizing a face region of the target object and at least one local recognition model, each of the at least one local recognition model for recognizing a local region of the target object.
In some embodiments, the input unit is further configured to: for each image in the image sequence to be detected, input the image into the face recognition model to obtain a face recognition result, and input the image into each local recognition model to obtain a local recognition result corresponding to that local recognition model.
In some embodiments, the input unit is further configured to: for each image in the image sequence to be detected, determine that the image presents the target object in response to determining that the face recognition result corresponding to the image indicates that the image does not present a face region of the target object, while the local recognition results corresponding to the image indicate that the number of local regions of the target object presented in the image is not less than a preset value.
In some embodiments, the input unit is further configured to: for each image in the image sequence to be detected, determine that the image presents the target object in response to determining that the face recognition result corresponding to the image indicates that the image presents a face region of the target object.
In some embodiments, the input unit is further configured to: for each image in the image sequence to be detected, determine that the image does not present the target object in response to determining that the face recognition result corresponding to the image indicates that the image does not present a face region of the target object, and the local recognition results corresponding to the image indicate that the number of local regions of the target object presented in the image is smaller than the preset value.
In some embodiments, the at least one local recognition model comprises at least one of: a clothes color recognition model, a clothes style recognition model, a backpack color recognition model, a backpack style recognition model, a hairstyle recognition model, a hat recognition model, a glasses recognition model, a height recognition model, and a body type recognition model.
In a third aspect, an embodiment of the present application provides a server, comprising: one or more processors; and a storage device storing one or more programs that, when executed by the one or more processors, cause the one or more processors to implement the method described in any of the embodiments of the method for outputting information.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium on which a computer program is stored, the program, when executed by a processor, implementing the method described in any of the embodiments of the method for outputting information.
According to the method and apparatus for outputting information provided by the embodiments of the application, an image sequence to be detected is acquired; each image in the sequence is input into a plurality of pre-trained recognition models to obtain a recognition result corresponding to each model, and whether the target object is present in the image is determined based on the obtained results; finally, each image in the sequence that presents the target object is taken as a target image, and identification information containing the target image is output. Whether the target object is present in an image can thus be determined by combining the recognition results of multiple recognition models, which increases the flexibility of target object recognition.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 is a flow diagram for one embodiment of a method for outputting information, in accordance with the present application;
FIG. 3 is a schematic diagram of an application scenario of a method for outputting information according to the present application;
FIG. 4 is a schematic block diagram illustrating one embodiment of an apparatus for outputting information according to the present application;
FIG. 5 is a schematic block diagram of a computer system suitable for use in implementing an electronic device according to embodiments of the present application.
Detailed Description
The present application will be described in further detail below with reference to the drawings and embodiments. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and not restrictive of it. It should also be noted that, for convenience of description, only the portions relevant to the invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 shows an exemplary system architecture 100 to which the method for outputting information or the apparatus for outputting information of the present application may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
Users may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104, for example to receive or send messages. Various communication client applications may be installed on the terminal devices 101, 102, 103, such as camera applications, image processing applications, and search applications. In addition, the terminal devices 101, 102, 103 may be connected to an image capture device such as a camera and acquire the images it captures.
The terminal devices 101, 102, 103 may be various electronic devices having display screens and supporting network communications, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be a server that provides various services, such as an image processing server that processes images uploaded by the terminal apparatuses 101, 102, 103. The image processing server may perform processing such as analysis on the received image to be detected, and feed back a processing result (e.g., an optimized image) to the terminal device.
It should be noted that the method for outputting information provided in the embodiments of the present application is generally executed by the server 105; accordingly, the apparatus for outputting information is generally provided in the server 105.
It should be noted that the server 105 may also store the images to be detected locally, or obtain them directly from an image capture device. In that case, the server 105 may directly extract the locally stored or directly captured images for detection, and the exemplary system architecture 100 need not include the terminal devices 101, 102, 103 and the network 104.
It should also be noted that image processing applications may be installed on the terminal devices 101, 102, 103, which may then perform face detection on the images to be detected based on those applications. In that case, the method for outputting information may also be executed by the terminal devices 101, 102, 103, and the apparatus for outputting information may accordingly be provided in them. The exemplary system architecture 100 then need not include the server 105 and the network 104.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method for outputting information in accordance with the present application is shown. The method for outputting information comprises the following steps:
step 201, acquiring an image sequence to be detected.
In this embodiment, the electronic device on which the method for outputting information runs may acquire an image sequence to be detected, i.e., a sequence of images ordered by capture time. In practice, the images in the sequence may be captured by an image capture device such as a surveillance camera.
Here, the electronic device may be connected to an image capture device (e.g., a camera) and store the images it captures. The electronic device may select the images stored within a certain time period as the images to be detected and assemble them, in order of capture time, into the image sequence to be detected; in this case, the sequence is obtained directly from local storage. Alternatively, the image sequence to be detected may be sent to the electronic device by another electronic device connected to it via a wired or wireless connection; that other device is in turn connected to the image capture device and can acquire the images it captures. The wireless connection may include, but is not limited to, 3G/4G, WiFi, Bluetooth, WiMAX, Zigbee, UWB (Ultra-Wideband), and other wireless connection modes now known or developed in the future.
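For illustration only, assembling such a sequence from locally stored images might look like the following minimal Python sketch. The directory layout and the capture_time_of helper are assumptions made for the example; in practice the capture time would typically come from the camera feed or image metadata rather than file timestamps.

```python
# Minimal sketch: assembling an image sequence to be detected from images
# stored in a local directory. The capture_time_of() helper is hypothetical;
# here the file's modification time stands in for the true capture time.
from pathlib import Path

def capture_time_of(path: Path) -> float:
    return path.stat().st_mtime

def build_image_sequence(image_dir: str, start: float, end: float) -> list[Path]:
    """Select images captured within [start, end] and order them by capture time."""
    candidates = [p for p in Path(image_dir).glob("*.jpg")
                  if start <= capture_time_of(p) <= end]
    return sorted(candidates, key=capture_time_of)
```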
Step 202, for each image in the image sequence to be detected, inputting the image into a plurality of pre-trained recognition models to obtain recognition results corresponding to the recognition models respectively, and determining whether the target object appears in the image based on the obtained recognition results.
In this embodiment, for each image in the image sequence to be detected, the electronic device may first input the image into a plurality of pre-trained recognition models to obtain a recognition result corresponding to each model, and may then determine, based on the obtained results, whether a target object (e.g., a person) is present in the image. The recognition models may be models for recognizing different parts of the target object, for example a head recognition model for the head, a human body recognition model for the body, a clothes recognition model for the clothing, and a shoe recognition model for the shoes. Each recognition model may be obtained by supervised training, using a machine learning algorithm, of a model capable of image recognition (e.g., a Convolutional Neural Network (CNN)) on corresponding training samples. For example, the training samples for the head recognition model may include head images of the target object together with annotations characterizing them as such. A convolutional neural network typically includes convolutional layers, pooling layers, and fully connected layers: the convolutional layers extract image features, the pooling layers down-sample the input, and the fully connected layers output the recognition result. In practice, a CNN is a feed-forward neural network whose artificial neurons respond to surrounding units within a limited receptive field; it performs excellently on image processing and is therefore well suited to image recognition.
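As a concrete illustration of such a model, the following PyTorch sketch shows a small convolutional network with convolutional, pooling, and fully connected layers whose output indicates whether a given part of the target object is present. The layer sizes and the 64x64 input resolution are assumptions made for the example, not parameters from the disclosure.

```python
# Minimal sketch of one part-recognition model: convolutional layers extract
# features, max-pooling layers down-sample, and a fully connected layer
# outputs the recognition result (part present / not present).
import torch
import torch.nn as nn

class PartRecognitionModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),  # extract image features
            nn.ReLU(),
            nn.MaxPool2d(2),                             # down-sample
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 16, 2)     # present / not present

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)        # (N, 32, 16, 16) for a 64x64 input
        x = torch.flatten(x, 1)
        return self.classifier(x)   # recognition result as logits
```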
The electronic device may determine whether the target object is present in the image in various ways based on the obtained recognition results. As an example, if the recognition results indicate that the number of parts of the target object presented in the image is not less than a preset value (e.g., 3, such as the head, shoes, and clothing of the target object), it may be determined that the target object is present in the image; if the recognition results indicate that the number of such parts is smaller than the preset value, it may be determined that the target object is not present in the image.
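Expressed as code, that example decision rule might look like the following sketch; the boolean-list representation of the recognition results is an assumption for illustration.

```python
# Minimal sketch of the counting rule above: each entry of `part_recognized`
# says whether one recognition model found its part of the target object.
def target_present(part_recognized: list[bool], preset_value: int = 3) -> bool:
    # Present if at least `preset_value` parts (e.g. head, shoes,
    # clothing) are recognized in the image.
    return sum(part_recognized) >= preset_value
```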
In some optional implementations of this embodiment, the plurality of recognition models may include a face recognition model and at least one local recognition model, where the face recognition model may be used to recognize a face region of the target object, and each local recognition model may be used to recognize a local region of the target object (for example, clothes color, clothes style, backpack color, backpack style, and the like). It should be noted that the at least one local recognition model may include, but is not limited to, at least one of the following: a clothes color recognition model, a clothes style recognition model, a backpack color recognition model, a backpack style recognition model, a hairstyle recognition model, a hat recognition model, a glasses recognition model, a height recognition model, and a body type recognition model. Each local recognition model may likewise be obtained by supervised training, using a machine learning algorithm, of a model capable of image recognition (e.g., a CNN) on corresponding training samples. In this case, for each image in the image sequence to be detected, the electronic device may determine whether the target object is present as follows: first, the image may be input into the face recognition model to obtain a face recognition result; then, the image may be input into each local recognition model to obtain the corresponding local recognition result. The electronic device may then determine, in various ways, whether the target object is present in the image based on the obtained face recognition result and local recognition results.
In some optional implementations of this embodiment, for each image in the image sequence to be detected, the electronic device may determine that the image presents the target object in response to determining that the face recognition result corresponding to the image indicates that the image does not present a face region of the target object, while the local recognition results indicate that the number of local regions of the target object presented in the image is not less than a preset value (e.g., 3).
In some optional implementations of this embodiment, for each image in the image sequence to be detected, the electronic device may determine that the image presents the target object in response to determining that the face recognition result corresponding to the image indicates that the image presents a face region of the target object.
In some optional implementations of this embodiment, for each image in the image sequence to be detected, the electronic device may determine that the image does not present the target object in response to determining that the face recognition result corresponding to the image indicates that the image does not present a face region of the target object, and the local recognition results indicate that the number of local regions of the target object presented in the image is smaller than the preset value (e.g., 3).
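Taken together, the three optional rules above amount to a face-first decision. A minimal sketch follows, assuming boolean recognition results as in the earlier example; the function and parameter names are illustrative, not part of the disclosure.

```python
# Minimal sketch of the combined decision: a recognized face region decides
# immediately; otherwise the count of recognized local regions decides.
def target_present_combined(face_recognized: bool,
                            local_recognized: list[bool],
                            preset_value: int = 3) -> bool:
    if face_recognized:
        # Face recognition result indicates a face region of the target object.
        return True
    # No face region: count local regions (clothes color, backpack style, ...).
    return sum(local_recognized) >= preset_value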
Step 203, taking the image presenting the target object in the image sequence to be detected as the target image, and outputting the identification information containing the target image.
In this embodiment, the electronic device may take each image in the image sequence to be detected that presents the target object as a target image and output identification information containing the target image. The identification information may further include information such as the shooting time and shooting position of each target image. In practice, the region of the output target image in which the target object appears may be marked with a highlighted border.
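For illustration, marking the target region with a highlighted border could be done with OpenCV as in the sketch below; the bounding-box coordinates are assumed to be supplied by the recognition step, and OpenCV is merely one plausible choice.

```python
# Minimal sketch: draw a highlighted border around the region of the target
# image in which the target object appears, using OpenCV.
import cv2

def mark_target(image_path: str, box: tuple[int, int, int, int], out_path: str) -> None:
    x, y, w, h = box                      # region supplied by recognition
    image = cv2.imread(image_path)
    cv2.rectangle(image, (x, y), (x + w, y + h),
                  color=(0, 255, 0), thickness=3)  # bright green border
    cv2.imwrite(out_path, image)
```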
In some optional implementations of this embodiment, the electronic device may further determine the moving path of the target object based on the shooting time and shooting position of each target image (for example, the position of the image capture device that captured the image may be taken as the shooting position). As an example, suppose the target images in the image sequence to be detected are a first, second, and third target image, captured at 9:00 at a first position, at 9:01 at a second position, and at 9:05 at a third position, respectively; the moving path of the target object is then from the first position to the second position, and from the second position to the third position. After determining the moving path, the electronic device may further output moving path information representing it.
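That worked example reduces to sorting the target images by shooting time and reading off their shooting positions. A minimal sketch follows; the TargetImage record is an assumption introduced for illustration.

```python
# Minimal sketch: derive the moving path from the shooting time and shooting
# position of each target image. Real timestamps should be datetime objects;
# plain strings are used here only to mirror the 9:00/9:01/9:05 example.
from dataclasses import dataclass

@dataclass
class TargetImage:
    shooting_time: str   # e.g. "9:00"
    position: str        # e.g. the capturing camera's location

def moving_path(target_images: list[TargetImage]) -> list[str]:
    ordered = sorted(target_images, key=lambda t: t.shooting_time)
    return [t.position for t in ordered]

# Yields ["first position", "second position", "third position"]:
path = moving_path([
    TargetImage("9:01", "second position"),
    TargetImage("9:00", "first position"),
    TargetImage("9:05", "third position"),
])
```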
With continued reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of the method for outputting information according to this embodiment. In the application scenario of fig. 3, the image processing server may first obtain an image sequence 301 to be detected, formed from images captured by a surveillance camera. Then, for each image in the image sequence 301, the image processing server may obtain a recognition result from each of a plurality of pre-trained recognition models and determine, based on these results, whether a target pedestrian (for example, a pedestrian or criminal who needs to be monitored) is present in the image. Finally, the image processing server may take each image in the sequence that presents the target pedestrian as a target image 302 and output identification information 303 containing the target image 302.
In the method provided by the above embodiment of the application, an image sequence to be detected is acquired; each image in the sequence is input into a plurality of pre-trained recognition models to obtain a recognition result corresponding to each model, and whether the target object is present in the image is determined based on the obtained results; finally, each image in the sequence that presents the target object is taken as a target image, and identification information containing the target image is output. Whether the target object is present in an image can thus be determined by combining the recognition results of multiple recognition models, which increases the flexibility of target object recognition.
With further reference to fig. 4, as an implementation of the methods shown in the above-mentioned figures, the present application provides an embodiment of an apparatus for outputting information, which corresponds to the method embodiment shown in fig. 2, and which is particularly applicable to various electronic devices.
As shown in fig. 4, the apparatus 400 for outputting information of this embodiment includes: an acquisition unit 401 configured to acquire an image sequence to be detected; an input unit 402 configured to input, for each image in the image sequence to be detected, the image into a plurality of pre-trained recognition models to obtain recognition results corresponding to the respective models, and to determine whether a target object is present in the image based on the obtained results; and an output unit 403 configured to take an image in the image sequence to be detected that presents the target object as a target image and output identification information containing the target image.
In some optional implementations of the embodiment, the plurality of recognition models may include a face recognition model and at least one local recognition model, where the face recognition model may be used to recognize a face region of the target object, and each of the at least one local recognition model may be used to recognize a local region of the target object.
In some optional implementation manners of this embodiment, the input unit 402 may be further configured to, for each image in the image sequence to be detected, input the image into the face recognition model to obtain a face recognition result, and input the image into each local recognition model to obtain a local recognition result corresponding to each local recognition model.
In some optional implementations of this embodiment, the input unit 402 may be further configured to, for each image in the image sequence to be detected, determine that the image presents the target object in response to determining that the face recognition result corresponding to the image indicates that the image does not present a face region of the target object, while the local recognition results indicate that the number of local regions of the target object presented in the image is not less than a preset value.
In some optional implementations of this embodiment, the input unit 402 may be further configured to, for each image in the image sequence to be detected, determine that the image presents the target object in response to determining that the face recognition result corresponding to the image indicates that the image presents a face region of the target object.
In some optional implementations of this embodiment, the input unit 402 may be further configured to, for each image in the image sequence to be detected, determine that the image does not present the target object in response to determining that the face recognition result corresponding to the image indicates that the image does not present a face region of the target object, and the local recognition results indicate that the number of local regions of the target object presented in the image is smaller than the preset value.
In some optional implementations of the embodiment, the at least one local recognition model may include at least one of a clothes color recognition model, a clothes style recognition model, a backpack color recognition model, a backpack style recognition model, a hair style recognition model, a hat recognition model, a glasses recognition model, a height recognition model, and a body type recognition model.
In the apparatus provided by the above embodiment of the present application, the acquisition unit 401 acquires an image sequence to be detected; the input unit 402 inputs each image in the sequence into a plurality of pre-trained recognition models to obtain recognition results corresponding to the respective models and determines, based on the obtained results, whether a target object is present in the image; and the output unit 403 takes each image in the sequence that presents the target object as a target image and outputs identification information containing the target image. Whether the target object is present in an image can thus be determined by combining the recognition results of multiple recognition models, which increases the flexibility of target object recognition.
Referring now to FIG. 5, shown is a block diagram of a computer system 500 suitable for use in implementing the electronic device of an embodiment of the present application. The electronic device shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 5, the computer system 500 includes a Central Processing Unit (CPU)501 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)502 or a program loaded from a storage section 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data necessary for the operation of the system 500 are also stored. The CPU 501, ROM 502, and RAM 503 are connected to each other via a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
The following components are connected to the I/O interface 505: an input portion 506 including a touch screen, a touch pad, or the like; an output portion 507 including a display such as a Liquid Crystal Display (LCD) and a speaker; a storage portion 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card or a modem. The communication section 509 performs communication processing via a network such as the internet. The drive 510 is also connected to the I/O interface 505 as necessary. A removable medium 511, such as a semiconductor memory, is mounted on the drive 510 as necessary, so that a computer program read from it can be installed into the storage portion 508 as needed.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 509, and/or installed from the removable medium 511. The computer program performs the above-described functions defined in the method of the present application when executed by the Central Processing Unit (CPU) 501. It should be noted that the computer readable medium described herein can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor includes an acquisition unit, an input unit, and an output unit. The names of these units do not in some cases form a limitation on the unit itself, for example, the acquisition unit may also be described as a "unit acquiring a sequence of images to be detected".
As another aspect, the present application also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments; or may be present separately and not assembled into the device. The computer readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: acquiring an image sequence to be detected; for each image in the image sequence to be detected, inputting the image into a plurality of pre-trained recognition models to obtain recognition results corresponding to the recognition models respectively, and determining whether a target object is present in the image or not based on the obtained recognition results; and taking the image which presents the target object in the image sequence to be detected as a target image, and outputting identification information containing the target image.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (16)

1. A method for outputting information, comprising:
acquiring an image sequence to be detected;
for each image in the image sequence to be detected, inputting the image into a plurality of pre-trained recognition models to obtain recognition results corresponding to the respective recognition models, and determining whether a target object is present in the image based on the obtained recognition results, wherein the plurality of recognition models are used for recognizing a plurality of kinds of information of the target object, the plurality of kinds of information including local regions of the target object, each of the plurality of recognition models is for recognizing a local region of the target object, each recognition result indicates whether a part of the target object is present in the image, the training samples used by each recognition model comprise a local image of the target object and an annotation characterizing it as a local image of the target object, and determining whether the target object is present in the image comprises: determining that the target object is present in the image if the recognition results indicate that the number of local regions of the target object presented in the image is not less than a preset value;
and taking an image which presents the target object in the image sequence to be detected as a target image, and outputting identification information containing the target image.
2. The method for outputting information as claimed in claim 1, wherein the plurality of recognition models includes a face recognition model for recognizing a face region of the target object and at least one local recognition model, each of the at least one local recognition model for recognizing a local region of the target object.
3. The method for outputting information according to claim 2, wherein the inputting of each image in the image sequence to be detected to a plurality of pre-trained recognition models for obtaining recognition results corresponding to the respective recognition models comprises:
and for each image in the image sequence to be detected, inputting the image into the face recognition model to obtain a face recognition result, and inputting the image into each local recognition model to obtain a local recognition result corresponding to each local recognition model.
4. The method for outputting information according to claim 3, wherein the determining whether the target object is present in the image based on the obtained recognition result comprises:
for each image in the image sequence to be detected, determining that the image presents the target object in response to determining that the face recognition result corresponding to the image indicates that the image does not present a face region of the target object and that the local recognition results corresponding to the image indicate that the number of local regions of the target object presented in the image is not less than a preset value.
5. The method for outputting information according to claim 3, wherein the determining whether the target object is present in the image based on the obtained recognition result comprises:
for each image in the image sequence to be detected, determining that the image presents the target object in response to determining that the face recognition result corresponding to the image indicates that the image presents a face region of the target object.
6. The method for outputting information according to claim 3, wherein the determining whether the target object is present in the image based on the obtained recognition result comprises:
for each image in the image sequence to be detected, determining that the image does not present the target object in response to determining that the face recognition result corresponding to the image indicates that the image does not present a face region of the target object and that the local recognition results corresponding to the image indicate that the number of local regions of the target object presented in the image is less than a preset value.
7. The method for outputting information according to any one of claims 2-6, wherein the at least one local recognition model comprises at least one of: a clothes color recognition model, a clothes style recognition model, a backpack color recognition model, a backpack style recognition model, a hairstyle recognition model, a hat recognition model, a glasses recognition model, a height recognition model, and a body type recognition model.
8. An apparatus for outputting information, comprising:
an acquisition unit configured to acquire an image sequence to be detected;
an input unit configured to input, for each image in the image sequence to be detected, the image into a plurality of pre-trained recognition models to obtain recognition results corresponding to the respective recognition models, and to determine whether a target object is present in the image based on the obtained recognition results, wherein the recognition models are used to recognize a plurality of kinds of information of the target object, the information including local regions of the target object, each of the recognition models is used to recognize a local region of the target object, each recognition result indicates whether a local region of the target object is present in the image, the training samples used by each recognition model comprise a local image of the target object and an annotation characterizing it as a local image of the target object, and determining whether the target object is present in the image comprises: determining that the target object is present in the image if the recognition results indicate that the number of local regions of the target object presented in the image is not less than a preset value;
and the output unit is configured to take an image which presents the target object in the image sequence to be detected as a target image and output identification information containing the target image.
9. The apparatus for outputting information as recited in claim 8, wherein the plurality of recognition models includes a face recognition model for recognizing a face region of the target object and at least one local recognition model, each of the at least one local recognition model for recognizing a local region of the target object.
10. The apparatus for outputting information of claim 9, wherein the input unit is further configured to:
and for each image in the image sequence to be detected, inputting the image into the face recognition model to obtain a face recognition result, and inputting the image into each local recognition model to obtain a local recognition result corresponding to each local recognition model.
11. The apparatus for outputting information of claim 10, wherein the input unit is further configured to:
for each image in the image sequence to be detected, determine that the image presents the target object in response to determining that the face recognition result corresponding to the image indicates that the image does not present a face region of the target object and that the local recognition results corresponding to the image indicate that the number of local regions of the target object presented in the image is not less than a preset value.
12. The apparatus for outputting information of claim 10, wherein the input unit is further configured to:
for each image in the image sequence to be detected, determine that the image presents the target object in response to determining that the face recognition result corresponding to the image indicates that the image presents a face region of the target object.
13. The apparatus for outputting information of claim 10, wherein the input unit is further configured to:
for each image in the image sequence to be detected, determine that the image does not present the target object in response to determining that the face recognition result corresponding to the image indicates that the image does not present a face region of the target object and that the local recognition results corresponding to the image indicate that the number of local regions of the target object presented in the image is less than a preset value.
14. The apparatus for outputting information according to any one of claims 9-13, wherein the at least one local recognition model comprises at least one of: a clothes color recognition model, a clothes style recognition model, a backpack color recognition model, a backpack style recognition model, a hairstyle recognition model, a hat recognition model, a glasses recognition model, a height recognition model, and a body type recognition model.
15. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.
16. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, carries out the method according to any one of claims 1-7.
CN201711455262.7A 2017-12-28 2017-12-28 Method and apparatus for outputting information Active CN108038473B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711455262.7A CN108038473B (en) 2017-12-28 2017-12-28 Method and apparatus for outputting information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711455262.7A CN108038473B (en) 2017-12-28 2017-12-28 Method and apparatus for outputting information

Publications (2)

Publication Number Publication Date
CN108038473A CN108038473A (en) 2018-05-15
CN108038473B (en) 2021-04-23

Family

ID=62098093

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711455262.7A Active CN108038473B (en) 2017-12-28 2017-12-28 Method and apparatus for outputting information

Country Status (1)

Country Link
CN (1) CN108038473B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110084306B (en) * 2019-04-30 2022-03-29 北京字节跳动网络技术有限公司 Method and apparatus for generating dynamic image
CN111783515A (en) * 2020-03-18 2020-10-16 北京沃东天骏信息技术有限公司 Behavior action recognition method and device
CN113067966A (en) * 2021-02-03 2021-07-02 深兰科技(上海)有限公司 Target image acquisition equipment and method

Family Cites Families (9)

Publication number Priority date Publication date Assignee Title
US6674877B1 (en) * 2000-02-03 2004-01-06 Microsoft Corporation System and method for visually tracking occluded objects in real time
CN1916931A (en) * 2005-08-19 2007-02-21 上海正电科技发展有限公司 Method of searching specific characteristic portrait in video from monitored street
CN102665054A (en) * 2012-05-10 2012-09-12 江苏友上科技实业有限公司 Network video recorder system for fast face retrieval
CN104794458A (en) * 2015-05-07 2015-07-22 北京丰华联合科技有限公司 Fuzzy video person identifying method
CN105893510A (en) * 2016-03-30 2016-08-24 北京格灵深瞳信息技术有限公司 Video structurization system and target search method thereof
CN106127173B (en) * 2016-06-30 2019-05-07 北京小白世纪网络科技有限公司 A kind of human body attribute recognition approach based on deep learning
CN106650577A (en) * 2016-09-22 2017-05-10 江苏理工学院 Fast retrieval method and fast retrieval system for target person in monitoring video data file
CN106874884B (en) * 2017-03-03 2019-11-12 中国民航大学 Human body recognition methods again based on position segmentation
CN107341445A (en) * 2017-06-07 2017-11-10 武汉大千信息技术有限公司 The panorama of pedestrian target describes method and system under monitoring scene

Non-Patent Citations (2)

Title
"Research and Implementation of an Adaptive Detection and Tracking Algorithm Based on Human Appearance Features" (《基于人体外貌特征的自适应检测与跟踪算法的研究与实现》); Sun Wei (孙伟); China Master's Theses Full-text Database, Information Science and Technology; 2013-06-15; I138-1477 *
"Person Re-identification Based on Local Deep Matching" (《基于局部深度匹配的行人再识别》); Li Shaomei (李邵梅) et al.; Application Research of Computers; 2017-04-30; Vol. 34, No. 4, pp. 1235-1238 *

Also Published As

Publication number Publication date
CN108038473A (en) 2018-05-15

Similar Documents

Publication Publication Date Title
CN107909065B (en) Method and device for detecting face occlusion
US10902245B2 (en) Method and apparatus for facial recognition
US10936919B2 (en) Method and apparatus for detecting human face
CN108416323B (en) Method and device for recognizing human face
CN108416324B (en) Method and apparatus for detecting living body
US10691928B2 (en) Method and apparatus for facial recognition
CN108830235B (en) Method and apparatus for generating information
US10832037B2 (en) Method and apparatus for detecting image type
CN109308490B (en) Method and apparatus for generating information
CN108280413B (en) Face recognition method and device
CN108491823B (en) Method and device for generating human eye recognition model
CN108337505B (en) Information acquisition method and device
CN110516678B (en) Image processing method and device
CN110298850B (en) Segmentation method and device for fundus image
CN108388889B (en) Method and device for analyzing face image
CN108229375B (en) Method and device for detecting face image
CN108038473B (en) Method and apparatus for outputting information
CN110059623B (en) Method and apparatus for generating information
CN108509994B (en) Method and device for clustering character images
CN108399401B (en) Method and device for detecting face image
CN108241855B (en) Image generation method and device
CN114140880A (en) Gait recognition method and device
CN111126159A (en) Method, apparatus, electronic device, and medium for tracking pedestrian in real time
CN108921138B (en) Method and apparatus for generating information
CN108197563B (en) Method and device for acquiring information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant