CN109842738B - Method and apparatus for photographing image

Method and apparatus for photographing image

Info

Publication number
CN109842738B
Authority
CN
China
Prior art keywords
target
image
state
image sequence
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910086008.7A
Other languages
Chinese (zh)
Other versions
CN109842738A (en)
Inventor
李华夏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Douyin Vision Co Ltd
Douyin Vision Beijing Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd
Priority to CN201910086008.7A
Publication of CN109842738A
Application granted
Publication of CN109842738B
Legal status: Active (current)

Abstract

Embodiments of the present disclosure disclose a method and apparatus for capturing images. One embodiment of the method comprises: acquiring a target image sequence that is played on a target interface and obtained by shooting a target person, wherein the target image sequence includes the image currently displayed on the target interface; performing moving-object detection on the target image sequence and determining action state information corresponding to each image in the target image sequence, wherein the action state information characterizes the action state of the target person at the image display time, the action state being either a moving state or a static state; and generating an instruction for controlling a target camera to shoot in response to detecting that the target person has changed from the moving state to the static state at the current time. This embodiment improves the flexibility of controlling a camera to capture images.

Description

Method and apparatus for photographing image
Technical Field
Embodiments of the present disclosure relate to the field of computer technology, and in particular to a method and apparatus for capturing images.
Background
With the development of computer technology, people can photograph one another using various devices such as mobile phones and tablet computers. Existing approaches to triggering image capture typically rely either on manual operation or on setting an automatic shooting timer.
Disclosure of Invention
Embodiments of the present disclosure propose methods and apparatuses for capturing images.
In a first aspect, an embodiment of the present disclosure provides a method for capturing an image, the method including: acquiring a target image sequence that is played on a target interface and obtained by shooting a target person, wherein the target image sequence includes the image currently displayed on the target interface; performing moving-object detection on the target image sequence and determining action state information corresponding to each image in the target image sequence, wherein the action state information characterizes the action state of the target person at the image display time, the action state being either a moving state or a static state; and generating an instruction for controlling a target camera to shoot in response to detecting that the target person has changed from the moving state to the static state at the current time.
In some embodiments, performing moving-object detection on the target image sequence and determining the action state information corresponding to each image in the target image sequence includes: acquiring a speed threshold corresponding to the target image sequence; for each image in the target image sequence, determining a moving speed corresponding to the image; and determining the action state information corresponding to each image in the target image sequence based on the speed threshold and the determined moving speeds.
In some embodiments, the speed threshold is obtained as follows: for an image in the target image sequence, determining a human body image from the image and determining the size of the human body image in the image; determining the human body image size corresponding to the target image sequence according to the determined sizes; and determining the speed threshold corresponding to the target image sequence based on a preset correspondence between human body image sizes and speed thresholds.
In some embodiments, determining the action state information corresponding to each image in the target image sequence based on the speed threshold and the determined moving speeds includes: smoothing the determined moving speeds to obtain a smoothed moving speed corresponding to each image in the target image sequence; and determining the action state information corresponding to each image in the target image sequence based on the speed threshold and the smoothed moving speeds.
In some embodiments, performing moving-object detection on the target image sequence includes detecting moving objects in the target image sequence using at least one of the following methods: the optical flow method, the background segmentation method, and the interframe difference method.
In some embodiments, performing moving-object detection on the target image sequence includes detecting moving objects in the target image sequence by combining the optical flow method and the background segmentation method.
In a second aspect, an embodiment of the present disclosure provides an apparatus for capturing an image, the apparatus including: an acquisition unit configured to acquire a target image sequence that is played on a target interface and obtained by shooting a target person, wherein the target image sequence includes the image currently displayed on the target interface; a determining unit configured to perform moving-object detection on the target image sequence and determine action state information corresponding to each image in the target image sequence, wherein the action state information characterizes the action state of the target person at the image display time, the action state being either a moving state or a static state; and a generating unit configured to generate an instruction for controlling a target camera to shoot in response to detecting that the target person has changed from the moving state to the static state at the current time.
In some embodiments, the determining unit includes: an acquisition module configured to acquire a speed threshold corresponding to the target image sequence; a first determining module configured to determine, for each image in the target image sequence, a moving speed corresponding to the image; and a second determining module configured to determine the action state information corresponding to each image in the target image sequence based on the speed threshold and the determined moving speeds.
In some embodiments, the speed threshold is obtained as follows: for an image in the target image sequence, determining a human body image from the image and determining the size of the human body image in the image; determining the human body image size corresponding to the target image sequence according to the determined sizes; and determining the speed threshold corresponding to the target image sequence based on a preset correspondence between human body image sizes and speed thresholds.
In some embodiments, the second determining module includes: a processing submodule configured to smooth the determined moving speeds to obtain a smoothed moving speed corresponding to each image in the target image sequence; and a determining submodule configured to determine the action state information corresponding to each image in the target image sequence based on the speed threshold and the smoothed moving speeds.
In some embodiments, the determining unit is further configured to detect moving objects in the target image sequence using at least one of the following methods: the optical flow method, the background segmentation method, and the interframe difference method.
In some embodiments, the determining unit is further configured to detect moving objects in the target image sequence by combining the optical flow method and the background segmentation method.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: one or more processors; and a storage device having one or more programs stored thereon, which, when executed by the one or more processors, cause the one or more processors to implement the method as described in any implementation of the first aspect.
In a fourth aspect, embodiments of the present disclosure provide a computer-readable medium on which a computer program is stored, which computer program, when executed by a processor, implements the method as described in any of the implementations of the first aspect.
According to the method and apparatus for capturing images provided by the embodiments of the present disclosure, a target image sequence that is played on a target interface and obtained by shooting a target person is acquired, moving-object detection is performed on the target image sequence to determine the current action state of the target person, and an instruction for controlling a target camera to shoot is generated in response to detecting that the target person has changed from the moving state to the static state at the current time. The camera is thus controlled to capture an image by the recognized action of the person, without manual control, which improves the flexibility of controlling the camera to capture images.
Drawings
Other features, objects and advantages of the present disclosure will become more apparent upon reading of the detailed description of non-limiting embodiments made with reference to the following drawings:
FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present disclosure may be applied;
FIG. 2 is a flow diagram of one embodiment of a method for capturing an image, according to an embodiment of the present disclosure;
FIG. 3 is a schematic illustration of one application scenario of a method for capturing images according to an embodiment of the present disclosure;
FIG. 4 is a flow diagram of yet another embodiment of a method for capturing an image according to an embodiment of the present disclosure;
FIG. 5 is a schematic block diagram of one embodiment of an apparatus for capturing images according to an embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of an electronic device suitable for use in implementing embodiments of the present disclosure.
Detailed Description
The present disclosure is described in further detail below with reference to the accompanying drawings and embodiments. It is to be understood that the specific embodiments described herein merely illustrate the relevant disclosure and do not limit it. It should also be noted that, for convenience of description, only the parts relevant to the disclosure are shown in the drawings.
It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 illustrates an exemplary system architecture 100 of a method for capturing an image or an apparatus for capturing an image to which embodiments of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. Various communication client applications, such as an image processing application, a video playing application, social platform software, and the like, may be installed on the terminal devices 101, 102, and 103.
The terminal devices 101, 102, and 103 may be hardware or software. When they are hardware, they may be various electronic devices. When they are software, they may be installed in the electronic devices described above and may be implemented as multiple pieces of software or software modules (e.g., software or software modules used to provide distributed services) or as a single piece of software or software module. No specific limitation is made here.
The server 105 may be a server that provides various services, such as a background image processing server that processes a sequence of images displayed on the terminal devices 101, 102, 103. The background image processing server may process the acquired sequence of images and generate a processing result (e.g., an instruction for controlling the shooting by the target camera).
It should be noted that the method for capturing an image provided by the embodiments of the present disclosure may be executed by the server 105 or by the terminal devices 101, 102, and 103; accordingly, the apparatus for capturing an image may be disposed in the server 105 or in the terminal devices 101, 102, and 103.
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (e.g., software or software modules used to provide distributed services) or as a single piece of software or software module. No specific limitation is made here.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method for capturing an image according to the present disclosure is shown. The method for capturing images comprises the following steps:
Step 201, acquiring a target image sequence that is played on a target interface and obtained by shooting a target person.
In this embodiment, the execution body of the method for capturing images (for example, the server or a terminal device shown in fig. 1) may acquire, remotely or locally through a wired or wireless connection, the target image sequence that is played on the target interface and obtained by shooting the target person. The target interface may be an interface for displaying images obtained by shooting the target person; for example, it may be the interface of an image-capturing application installed on the execution body. The target person may be the person being photographed; for example, the target person may be a user taking a self-portrait with the execution body. The target image sequence may be a sequence of images on which moving-object detection is to be performed. In general, the target image sequence may be a portion of the sequence of images taken of the target person and includes the image currently displayed on the target interface. As an example, the target image sequence may include a preset number of images, including the image currently displayed on the target interface.
Step 202, performing moving-object detection on the target image sequence and determining action state information corresponding to each image in the target image sequence.
In this embodiment, the execution body may perform moving-object detection on the target image sequence and determine the action state information corresponding to each image in the target image sequence. The action state information characterizes the action state of the target person at the image display time, the action state being either a moving state or a static state. As an example, assume that the target image sequence includes two images; each image then corresponds to one action state. The action state information may include, but is not limited to, information in at least one of the following forms: numbers, words, symbols, and the like. For example, when the action state information is the number "1", the target person is in the moving state; when the action state information is the number "0", the target person is in the static state. The image display time may be the display time of the corresponding image on the target interface.
In general, for an image in the target image sequence, the action state corresponding to the image may be determined from the moving distance, on the target interface, of the region composed of moving pixels in the image relative to a target image preceding it. The target image may be the adjacent image, or an image separated from it by a preset number of images; the moving distance may be the maximum of the per-pixel moving distances within the moving region, or their average. For example, if the moving distance is greater than or equal to a preset distance threshold, the action state corresponding to the image is determined to be the moving state. Alternatively, a moving speed may be determined from the moving distance and the playing time difference between the image and the target image, and if the moving speed is greater than or equal to a preset speed threshold, the action state corresponding to the image is determined to be the moving state.
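To make the decision rule above concrete, the following Python sketch labels one image as moving or still from the matched pixel positions of the moving region; the state encodings, the function name, and the choice of the mean displacement are illustrative assumptions rather than details fixed by the disclosure.

```python
import numpy as np

MOVING, STILL = 1, 0  # illustrative encodings of the two action states

def action_state(prev_pts, cur_pts, play_time_diff_s, speed_threshold):
    """Label one image as moving/still from the displacement of the
    moving pixel region relative to the preceding target image.

    prev_pts, cur_pts: (N, 2) arrays of matched pixel coordinates in
    the target (earlier) image and the current image.
    """
    # Per-pixel moving distances; the text allows either the maximum
    # or the average over the moving region.
    dists = np.linalg.norm(cur_pts - prev_pts, axis=1)
    moving_distance = dists.mean()  # or dists.max()
    moving_speed = moving_distance / play_time_diff_s
    return MOVING if moving_speed >= speed_threshold else STILL
```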
The execution body may perform moving-object detection on the target image sequence in various ways. Optionally, it may use at least one of the following existing methods: the optical flow method, the background segmentation method, the interframe difference method, and the like.
In some optional implementations of this embodiment, the execution body may perform moving-object detection on the target image sequence by combining the optical flow method and the background segmentation method. The optical flow method detects the instantaneous velocity of a spatially moving object as mapped onto the pixels of an image: it uses the temporal changes of pixels in an image sequence, together with the correlation between adjacent images, to compute the motion of objects over the interval between adjacent images. The background segmentation method extracts the moving target region through difference operations between images: typically, the current image is differenced against a continuously updated background image, and the moving target region is extracted from the resulting difference image.
In general, the optical flow method detects the speed of the pixels corresponding to a moving object quickly and accurately. The background segmentation method, because it requires background subtraction, detects a moving object with a certain delay compared with the optical flow method. Combining the two methods can reduce false triggering of the instruction for controlling the target camera to shoot caused by noise in the image or by small-amplitude movement of the target person. The two moving-object detection methods may be combined in various ways. As an example, the combination of pixels detected by the optical flow method as characterizing the moving target may be taken as a foreground image, and the foreground image may then be further examined with the background segmentation method: if the foreground image of the currently displayed image has moved on the target interface relative to the foreground image of the previous image, the action state corresponding to the currently displayed image is determined to be the moving state; otherwise it is the static state. Alternatively, the action state corresponding to the currently displayed image may be determined by the optical flow method and the background segmentation method independently: if both methods detect the moving state, the action state corresponding to the currently displayed image is determined to be the moving state; if both methods detect the static state, it is determined to be the static state.
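A minimal sketch of the "both methods agree" variant is given below, using OpenCV's Farneback dense optical flow and MOG2 background subtractor; the thresholds, the per-frame agreement test, and the choice of these particular OpenCV algorithms are illustrative assumptions, not choices made by the disclosure.

```python
import cv2
import numpy as np

back_sub = cv2.createBackgroundSubtractorMOG2()  # continuously updated background model
FLOW_THRESHOLD = 2.0   # illustrative, pixels per frame
AREA_THRESHOLD = 500   # illustrative, foreground pixels

def frame_shows_motion(prev_gray, gray):
    """Return True only when optical flow and background segmentation
    both report motion, suppressing noise-induced false triggers."""
    # Optical flow: dense per-pixel motion vectors between the frames.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    flow_moving = np.linalg.norm(flow, axis=2).max() >= FLOW_THRESHOLD

    # Background segmentation: difference against the updated background.
    fg_mask = back_sub.apply(gray)
    seg_moving = np.count_nonzero(fg_mask) >= AREA_THRESHOLD

    return flow_moving and seg_moving
```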
Step 203, generating an instruction for controlling the target camera to shoot in response to detecting that the target person has changed from the moving state to the static state at the current time.
In this embodiment, the execution body may generate an instruction for controlling the target camera to shoot in response to detecting that the target person has changed from the moving state to the static state at the current time. Specifically, in response to detecting that the action state corresponding to the image currently displayed on the target interface is the static state while the action state corresponding to the previous image is the moving state, the execution body may determine that the target person has changed from the moving state to the static state at the current time. The instruction for controlling the target camera to shoot may take, but is not limited to, at least one of the following forms: numbers, words, symbols, level signals, and the like. The target camera may be a camera for shooting the target person (for example, capturing images or video). The target camera may be disposed on the execution body, in which case the execution body may use the instruction to control the target camera to shoot (for example, triggering the capture of an image or video when the instruction is generated). The target camera may also be disposed on an electronic device communicatively connected to the execution body, in which case the execution body may send the instruction to that electronic device, which uses the received instruction to control the target camera to shoot (for example, triggering capture upon receiving the instruction).
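The moving-to-still transition test itself reduces to comparing the action states of the last two frames. The sketch below assumes the "1"/"0" encoding described earlier; the camera.capture() call is a hypothetical stand-in for whatever the generated instruction triggers.

```python
def just_came_to_rest(states):
    """states: per-frame action states of the target image sequence in
    display order, 1 = moving, 0 = still. True exactly at the frame
    where the target person changes from the moving to the static state."""
    return len(states) >= 2 and states[-2] == 1 and states[-1] == 0

# Hypothetical usage, with `camera` any object exposing capture():
# if just_came_to_rest(states):
#     camera.capture()
```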
With continued reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of the method for capturing images according to this embodiment. In the application scenario of fig. 3, the electronic device 301 maps the image of the target person 302 onto a target interface (for example, the interface currently displayed on the screen of the electronic device 301) via a camera disposed on it. The electronic device 301 first acquires a target image sequence 303 (for example, a preset number of image frames including the currently displayed image) played on the target interface. Then, the electronic device 301 performs moving-object detection on the target image sequence 303 and determines the action state information corresponding to each image in the target image sequence 303, where the number "1" represents that the target person 302 is in the moving state and the number "0" represents that the target person 302 is in the static state. Finally, in response to detecting that the action state information corresponding to the currently displayed image 3031 is "0" and that corresponding to the adjacent previous image 3032 is "1", the electronic device 301 determines that the target person 302 has changed from the moving state to the static state at the current time, generates an instruction 304 for controlling the target camera to shoot, and controls the camera to take a picture of the target person according to the instruction 304.
According to the method provided by the above embodiment of the present disclosure, a target image sequence that is played on a target interface and obtained by shooting a target person is acquired, moving-object detection is performed on the target image sequence to determine the current action state of the target person, and an instruction for controlling a target camera to shoot is generated in response to detecting that the target person has changed from the moving state to the static state at the current time. The camera is thus controlled to capture an image by the recognized action of the person, without manual control, which improves the flexibility of controlling the camera to capture images.
With further reference to fig. 4, a flow 400 of yet another embodiment of a method for capturing an image is shown. The flow 400 of the method for capturing an image comprises the steps of:
Step 401, acquiring a target image sequence that is played on a target interface and obtained by shooting a target person.
In this embodiment, step 401 is substantially the same as step 201 in the corresponding embodiment of fig. 2, and is not described here again.
Step 402, acquiring a speed threshold corresponding to the target image sequence.
In this embodiment, the execution body of the method for capturing images (for example, the server or a terminal device shown in fig. 1) may acquire, remotely or locally through a wired or wireless connection, the speed threshold corresponding to the target image sequence. The speed threshold, and the correspondence between speed thresholds and target image sequences, may be preset by a technician or determined in advance by the execution body.
In some optional implementations of this embodiment, the speed threshold may be obtained as follows:
first, for an image in a target image sequence, a human body image is determined from the image, and the size of the human body image in the image is determined. Specifically, as an example, the execution body described above may determine a human body image from the image using an existing object detection model. The target detection model may be a model obtained based on training of an existing target detection network (e.g., ssd (single Shot multi box detector), dpm (deformable Part model), etc.). The object detection model may determine the position of the human body image from the image input thereto. In general, the object detection model may output coordinate information that may characterize the position of the human body image in the image. For example, the coordinate information may include two diagonal coordinates of a rectangular frame, and a rectangular region image, that is, a human body image, may be determined in the image by the two diagonal coordinates. The size of the human body image may include, but is not limited to, at least one of the following: the length, width, diagonal length, etc. of the rectangle containing the human body image.
Then, the human body image size corresponding to the target image sequence is determined from the determined sizes. As an example, the execution body may take the average of the determined sizes as the human body image size corresponding to the target image sequence. Alternatively, the execution body may select one of the determined sizes (for example, a randomly selected size, or the size corresponding to the currently displayed image) as the human body image size corresponding to the target image sequence.
Finally, the speed threshold corresponding to the target image sequence is determined based on a preset correspondence between human body image sizes and speed thresholds. This correspondence may be preset, for example as a two-dimensional table. Generally, a larger human body image indicates that the target person is closer to the camera, so the corresponding speed threshold is larger; a smaller human body image indicates that the target person is farther from the camera, so the corresponding speed threshold is smaller. Through this implementation, the target person can control the camera with the same action amplitude at different distances from the camera.
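The correspondence could be stored as a small lookup table like the sketch below; the pixel sizes, thresholds, and the use of body-image height as the size measure are illustrative assumptions, not values given in the disclosure.

```python
# Illustrative correspondence: (minimum body-image height in pixels,
# speed threshold). A larger body image means the person is closer to
# the camera, so the threshold grows and the same physical motion
# amplitude is treated equally at any distance.
SIZE_TO_THRESHOLD = [
    (600, 8.0),  # person close to the camera
    (300, 4.0),
    (0, 2.0),    # person far from the camera
]

def speed_threshold_for(body_height_px):
    for min_height, threshold in SIZE_TO_THRESHOLD:
        if body_height_px >= min_height:
            return threshold
```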
Step 403, determining, for each image in the target image sequence, the moving speed corresponding to the image.
In this embodiment, for each image in the target image sequence, the execution body may determine the moving speed corresponding to the image. Specifically, for a given image, the moving speed may be determined from the moving distance, on the target interface, of the region composed of moving pixels relative to a target image preceding the image (which may be the adjacent image, or an image separated by a preset number of images), and the playing time difference between the image and the target image; the moving distance may be the maximum of the per-pixel moving distances within the moving region, or their average, and the moving speed is the quotient of the moving distance and the playing time difference.
Step 404, determining action state information corresponding to each image in the target image sequence based on the speed threshold and the determined moving speeds.
In this embodiment, the execution body may determine the action state information corresponding to each image in the target image sequence based on the speed threshold and the determined moving speeds. Specifically, for an image in the target image sequence, if the moving speed corresponding to the image is greater than or equal to the speed threshold, the action state information corresponding to the image is determined to be information characterizing the moving state; if the moving speed is less than the speed threshold, it is determined to be information characterizing the static state.
In some optional implementations of this embodiment, the executing body may determine the action state information corresponding to each of the images included in the target image sequence according to the following steps:
First, the determined moving speeds are smoothed to obtain the smoothed moving speed corresponding to each image in the target image sequence. The execution body may smooth the determined moving speeds in various ways, for example using an existing smoothing algorithm such as moving-window least-squares polynomial smoothing or a roughness-penalty algorithm.
As an example, the execution body may smooth the determined moving speeds using an exponential smoothing algorithm. Exponential smoothing relates the current value to all previously generated values: the current smoothed value is determined from the preceding data, with data closer to the present carrying more weight. This suppresses abrupt changes in the moving speed while improving the accuracy of the moving speed determined for each image.
Then, the action state information corresponding to each image in the target image sequence is determined based on the speed threshold and the smoothed moving speeds. Specifically, for an image in the target image sequence, if the smoothed moving speed corresponding to the image is greater than or equal to the speed threshold, the action state information corresponding to the image is determined to be information characterizing the moving state; if it is less than the speed threshold, it is determined to be information characterizing the static state.
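A minimal sketch of this smoothing-then-thresholding step follows; the smoothing factor alpha and the function names are illustrative assumptions.

```python
def smooth_speeds(speeds, alpha=0.4):
    """Simple exponential smoothing: s[t] = alpha*v[t] + (1-alpha)*s[t-1].
    Earlier speeds contribute with geometrically decaying weight, so
    isolated spikes in the raw moving speed are damped."""
    if not speeds:
        return []
    smoothed = [speeds[0]]
    for v in speeds[1:]:
        smoothed.append(alpha * v + (1 - alpha) * smoothed[-1])
    return smoothed

def action_states(speeds, speed_threshold, alpha=0.4):
    """Map each image's smoothed speed to 1 (moving) or 0 (still)."""
    return [1 if s >= speed_threshold else 0
            for s in smooth_speeds(speeds, alpha)]
```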
Step 405, generating an instruction for controlling the target camera to shoot in response to detecting that the target person has changed from the moving state to the static state at the current time.
In this embodiment, step 405 is substantially the same as step 203 in the corresponding embodiment of fig. 2, and is not described herein again.
As can be seen from fig. 4, compared with the embodiment corresponding to fig. 2, the flow 400 of the method for capturing images in this embodiment highlights determining the action state information corresponding to each image in the target image sequence based on the speed threshold and the moving speed corresponding to each image. The scheme described in this embodiment can therefore determine the action state information flexibly according to the speed threshold, which helps improve the accuracy and flexibility of controlling the target camera to shoot according to the action state of the person.
With further reference to fig. 5, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of an apparatus for capturing images, which corresponds to the method embodiment shown in fig. 2 and is particularly applicable to various electronic devices.
As shown in fig. 5, the apparatus 500 for capturing images of this embodiment includes: an obtaining unit 501 configured to obtain a target image sequence that is played on a target interface and obtained by shooting a target person, wherein the target image sequence includes the image currently displayed on the target interface; a determining unit 502 configured to perform moving-object detection on the target image sequence and determine action state information corresponding to each image in the target image sequence, wherein the action state information characterizes the action state of the target person at the image display time, the action state being either a moving state or a static state; and a generating unit 503 configured to generate an instruction for controlling the target camera to shoot in response to detecting that the target person has changed from the moving state to the static state at the current time.
In this embodiment, the obtaining unit 501 may acquire, remotely or locally through a wired or wireless connection, the target image sequence that is played on the target interface and obtained by shooting the target person. The target interface may be an interface for displaying images obtained by shooting the target person; for example, it may be the interface of an image-capturing application installed on the apparatus. The target person may be the person being photographed, for example a user taking a self-portrait. The target image sequence may be a sequence of images on which moving-object detection is to be performed. In general, the target image sequence may be a portion of the sequence of images taken of the target person and includes the image currently displayed on the target interface. As an example, the target image sequence may include a preset number of images, including the image currently displayed on the target interface.
In this embodiment, the determining unit 502 may perform moving-object detection on the target image sequence and determine the action state information corresponding to each image in the target image sequence. The action state information characterizes the action state of the target person at the image display time, the action state being either a moving state or a static state. As an example, assume that the target image sequence includes two images; each image then corresponds to one action state. The action state information may include, but is not limited to, information in at least one of the following forms: numbers, words, symbols, and the like. For example, when the action state information is the number "1", the target person is in the moving state; when the action state information is the number "0", the target person is in the static state. The image display time may be the display time of the corresponding image on the target interface.
In general, for an image in the target image sequence, the action state corresponding to the image may be determined from the moving distance, on the target interface, of the region composed of moving pixels in the image relative to a target image preceding it. The target image may be the adjacent image, or an image separated from it by a preset number of images; the moving distance may be the maximum of the per-pixel moving distances within the moving region, or their average. For example, if the moving distance is greater than or equal to a preset distance threshold, the action state corresponding to the image is determined to be the moving state. Alternatively, a moving speed may be determined from the moving distance and the playing time difference between the image and the target image, and if the moving speed is greater than or equal to a preset speed threshold, the action state corresponding to the image is determined to be the moving state.
The determining unit 502 may perform moving-object detection on the target image sequence in various ways. Optionally, the determining unit 502 may use at least one of the following existing methods: the optical flow method, the background segmentation method, the interframe difference method, and the like.
In this embodiment, the generating unit 503 may generate an instruction for controlling the target camera to shoot in response to detecting that the target person has changed from the moving state to the static state at the current time. Specifically, in response to detecting that the action state corresponding to the image currently displayed on the target interface is the static state while the action state corresponding to the previous image is the moving state, the generating unit 503 may determine that the target person has changed from the moving state to the static state at the current time. The instruction may take, but is not limited to, at least one of the following forms: numbers, words, symbols, level signals, and the like. The target camera may be a camera for shooting the target person (for example, capturing images or video). The target camera may be disposed on the apparatus 500, in which case the apparatus 500 may use the instruction to control the target camera to shoot (for example, triggering the capture of an image or video when the instruction is generated). The target camera may also be disposed on an electronic device communicatively connected to the apparatus 500, in which case the apparatus 500 may send the instruction to that electronic device, which uses the received instruction to control the target camera to shoot (for example, triggering capture upon receiving the instruction).
In some optional implementations of this embodiment, the determining unit 502 may include: an acquisition module (not shown) configured to acquire a speed threshold corresponding to the target image sequence; a first determining module (not shown) configured to determine, for each image in the target image sequence, the moving speed corresponding to the image; and a second determining module (not shown) configured to determine the action state information corresponding to each image in the target image sequence based on the speed threshold and the determined moving speeds.
In some optional implementations of this embodiment, the speed threshold may be obtained as follows: for an image in the target image sequence, determining a human body image from the image, and determining the size of the human body image in the image; determining the size of the human body image corresponding to the target image sequence according to the determined size; and determining a speed threshold corresponding to the target image sequence based on a preset corresponding relation between the size of the human body image and the speed threshold.
In some optional implementations of this embodiment, the second determining module may include: a processing submodule configured to smooth the determined moving speeds to obtain a smoothed moving speed corresponding to each image in the target image sequence; and a determining submodule configured to determine the action state information corresponding to each image in the target image sequence based on the speed threshold and the smoothed moving speeds.
In some optional implementations of this embodiment, the determining unit 502 may be further configured to detect moving objects in the target image sequence using at least one of the following methods: the optical flow method, the background segmentation method, and the interframe difference method.
In some optional implementations of this embodiment, the determining unit 502 may be further configured to detect moving objects in the target image sequence by combining the optical flow method and the background segmentation method.
According to the apparatus provided by the above embodiment of the present disclosure, a target image sequence that is played on a target interface and obtained by shooting a target person is acquired, moving-object detection is performed on the target image sequence to determine the current action state of the target person, and an instruction for controlling a target camera to shoot is generated in response to detecting that the target person has changed from the moving state to the static state at the current time. The camera is thus controlled to capture an image by the recognized action of the person, without manual control, which improves the flexibility of controlling the camera to capture images.
Referring now to fig. 6, a schematic diagram of an electronic device (e.g., the server or terminal device of fig. 1) 600 suitable for use in implementing embodiments of the present disclosure is shown. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a fixed terminal such as a digital TV, a desktop computer, and the like. The electronic device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 6, the electronic device 600 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 601 that may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage device 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data necessary for the operation of the electronic device 600. The processing device 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
Generally, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, and the like; output devices 607 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, and the like; a storage device 608 including, for example, a memory; and a communication device 609. The communication device 609 may allow the electronic device 600 to communicate wirelessly or by wire with other devices to exchange data. Although fig. 6 illustrates an electronic device 600 having various components, it should be understood that not all of the illustrated components need be implemented or provided; more or fewer components may alternatively be implemented or provided. Each block shown in fig. 6 may represent one component or multiple components, as needed.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method illustrated in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication device 609, installed from the storage device 608, or installed from the ROM 602. When executed by the processing device 601, the computer program performs the above-described functions defined in the methods of the embodiments of the present disclosure.

It should be noted that the computer-readable medium described in the embodiments of the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the embodiments of the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device.

A computer-readable signal medium, by contrast, may include a propagated data signal with computer-readable program code embodied therein, for example in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination thereof. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may be separate and not incorporated into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring a target image sequence which is played on a target interface and obtained by shooting a target person, wherein the target image sequence comprises an image displayed on the target interface at present; detecting a moving target of a target image sequence, and determining action state information corresponding to images included in the target image sequence respectively, wherein the action state information is used for representing the action state of a target person in image display time, and the action state comprises a moving state and a static state; and generating an instruction for controlling the shooting of the target camera in response to detecting that the target person is converted from the motion state to the static state at the current time.
Computer program code for carrying out operations of embodiments of the present disclosure may be written in one programming language or any combination of programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the remote-computer case, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. The described units may also be provided in a processor, which may be described as: a processor including an acquisition unit, a determining unit, and a generating unit. The names of these units do not in some cases limit the units themselves; for example, the acquisition unit may also be described as "a unit that acquires a target image sequence, played on the target interface, obtained by shooting a target person".
The foregoing description presents only the preferred embodiments of the present disclosure and an explanation of the technical principles employed. Those skilled in the art will appreciate that the scope of the invention in the embodiments of the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, but also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example a technical solution formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the embodiments of the present disclosure.

Claims (4)

1. A method for capturing an image, comprising:
acquiring a target image sequence that is played on a target interface and obtained by shooting a target person, wherein the target image sequence comprises the image currently displayed on the target interface;
performing moving-object detection on the target image sequence and determining action state information corresponding to each image in the target image sequence, including: determining the combination of pixels detected by an optical flow method as characterizing a moving target to be a foreground image, and further detecting the foreground image by a background segmentation method; and if the foreground image of the currently displayed image has moved on the target interface relative to the foreground image of the previous image, determining the action state corresponding to the currently displayed image to be a moving state, and otherwise a static state; wherein the action state information characterizes the action state of the target person at the image display time, and the action state comprises a moving state and a static state;
and generating an instruction for controlling a target camera to shoot in response to detecting that the target person has changed from the moving state to the static state at the current time.
2. An apparatus for capturing an image, comprising:
an acquisition unit configured to acquire a target image sequence that is played on a target interface and obtained by shooting a target person, wherein the target image sequence comprises the image currently displayed on the target interface;
a determining unit configured to perform moving-object detection on the target image sequence and determine action state information corresponding to each image in the target image sequence, including: determining the combination of pixels detected by an optical flow method as characterizing a moving target to be a foreground image, and further detecting the foreground image by a background segmentation method; and if the foreground image of the currently displayed image has moved on the target interface relative to the foreground image of the previous image, determining the action state corresponding to the currently displayed image to be a moving state, and otherwise a static state; wherein the action state information characterizes the action state of the target person at the image display time, and the action state comprises a moving state and a static state;
a generating unit configured to generate an instruction for controlling a target camera to shoot in response to detecting that the target person has changed from the moving state to the static state at the current time.
3. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-2.
4. A computer-readable medium, on which a computer program is stored, which program, when being executed by a processor, carries out the method of any one of claims 1-2.
CN201910086008.7A 2019-01-29 2019-01-29 Method and apparatus for photographing image Active CN109842738B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910086008.7A CN109842738B (en) 2019-01-29 2019-01-29 Method and apparatus for photographing image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910086008.7A CN109842738B (en) 2019-01-29 2019-01-29 Method and apparatus for photographing image

Publications (2)

Publication Number Publication Date
CN109842738A CN109842738A (en) 2019-06-04
CN109842738B (en) 2022-05-24

Family

ID=66884315

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910086008.7A Active CN109842738B (en) 2019-01-29 2019-01-29 Method and apparatus for photographing image

Country Status (1)

Country Link
CN (1) CN109842738B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110458062A (en) * 2019-07-30 2019-11-15 深圳市商汤科技有限公司 Face identification method and device, electronic equipment and storage medium

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101571955B (en) * 2009-06-02 2011-08-24 山东大学 Precise real-time detection method for micro-variation moving targets
TW201437965A (en) * 2013-03-28 2014-10-01 Novatek Microelectronics Corp Image blurring avoiding method and image processing chip thereof
CN105007404A (en) * 2014-04-26 2015-10-28 维沃移动通信有限公司 System for automatic photographing and implementation method thereof
CN104239865B (en) * 2014-09-16 2017-04-12 宁波熵联信息技术有限公司 Pedestrian detecting and tracking method based on multi-stage detection
CN105430262B (en) * 2015-11-17 2019-08-06 小米科技有限责任公司 Filming control method and device
CN105930786A (en) * 2016-04-18 2016-09-07 西北工业大学 Abnormal behavior detection method for bank self-service hall
CN108520526A (en) * 2017-02-23 2018-09-11 南宁市富久信息技术有限公司 A kind of front side dynamic disorder object detecting method
CN108230362A (en) * 2017-12-29 2018-06-29 北京视觉世界科技有限公司 Environment control method, device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN109842738A (en) 2019-06-04

Similar Documents

Publication Publication Date Title
CN109522910B (en) Key point detection method and device, electronic equipment and storage medium
CN109308469B (en) Method and apparatus for generating information
CN109583391B (en) Key point detection method, device, equipment and readable medium
CN110059623B (en) Method and apparatus for generating information
CN111314626B (en) Method and apparatus for processing video
CN113411642A (en) Screen projection method and device, electronic equipment and storage medium
CN110543849B (en) Detector configuration method and device, electronic equipment and storage medium
CN111436005A (en) Method and apparatus for displaying image
CN113467603A (en) Audio processing method and device, readable medium and electronic equipment
CN110796012B (en) Image processing method and device, electronic equipment and readable storage medium
CN109842738B (en) Method and apparatus for photographing image
CN113038176B (en) Video frame extraction method and device and electronic equipment
CN115409696A (en) Image processing method, image processing device, electronic equipment and storage medium
CN111833459B (en) Image processing method and device, electronic equipment and storage medium
CN110809166B (en) Video data processing method and device and electronic equipment
CN109840059B (en) Method and apparatus for displaying image
US20220245920A1 (en) Object display method and apparatus, electronic device, and computer readable storage medium
CN112492230B (en) Video processing method and device, readable medium and electronic equipment
CN111586295B (en) Image generation method and device and electronic equipment
CN114419298A (en) Virtual object generation method, device, equipment and storage medium
CN112651909B (en) Image synthesis method, device, electronic equipment and computer readable storage medium
CN113744259B (en) Forest fire smoke detection method and equipment based on gray value increasing number sequence
CN111784772B (en) Attitude estimation model training method and device based on domain randomization
CN110047520B (en) Audio playing control method and device, electronic equipment and computer readable storage medium
CN111782050B (en) Image processing method and apparatus, storage medium, and electronic device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Patentee after: Tiktok vision (Beijing) Co.,Ltd.

Address before: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Patentee before: BEIJING BYTEDANCE NETWORK TECHNOLOGY Co.,Ltd.

Address after: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Patentee after: Douyin Vision Co.,Ltd.

Address before: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Patentee before: Tiktok vision (Beijing) Co.,Ltd.