WO2020093866A1 - Photography guiding method and apparatus, mobile terminal and storage medium - Google Patents


Info

Publication number
WO2020093866A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
area
sample
target
neural network
Application number
PCT/CN2019/112566
Other languages
French (fr)
Chinese (zh)
Inventor
张渊
郑文
Original Assignee
北京达佳互联信息技术有限公司 (Beijing Dajia Internet Information Technology Co., Ltd.)
Application filed by 北京达佳互联信息技术有限公司 (Beijing Dajia Internet Information Technology Co., Ltd.)
Publication of WO2020093866A1 publication Critical patent/WO2020093866A1/en


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 23/00: Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N 23/80: Camera processing pipelines; Components thereof
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 23/00: Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N 23/60: Control of cameras or camera modules
    • H04N 23/62: Control of parameters via user interfaces

Definitions

  • the present application relates to the technical field of image processing, and in particular to a shooting guidance method, device, mobile terminal, and storage medium.
  • composition is an important factor in the quality of the captured images.
  • the salient target area is an area containing a saliency target.
  • the saliency target is a target that attracts more human visual attention, such as a car driving on a road, a person on a snow field, or flowers among lush green leaves.
  • the saliency target area of the image is determined based on the gray value of each pixel in the image, that is, the saliency target area of the image is determined based on the gray scale feature of the image.
  • the inventor found that if the gray values of the pixels of the image do not differ significantly, the accuracy of the determined saliency target area is low, which in turn lowers the accuracy of shooting guidance based on that area.
  • the present application provides a shooting guidance method, device, mobile terminal, and storage medium.
  • a shooting guidance method, including:
  • acquiring an indication image containing a target in the shooting scene;
  • determining the saliency target area in the indication image based on a pre-trained convolutional neural network; wherein the convolutional neural network is trained based on multiple sample images and the labeled image corresponding to each sample image, and the labeled image corresponding to each sample image is marked with the saliency target area in that sample image;
  • based on the determined saliency target area, outputting guidance information for guiding the user to compose the picture.
  • a shooting guidance device including:
  • the first acquisition module is configured to acquire the indication image containing the target in the shooting scene
  • the first determining module is configured to determine the saliency target area in the indication image based on the pre-trained convolutional neural network; wherein the convolutional neural network is trained based on multiple sample images and the labeled image corresponding to each sample image, and the labeled image corresponding to each sample image is marked with the saliency target area in that sample image;
  • the instruction module is configured to output guide information for guiding the user to compose a composition based on the determined saliency target area.
  • a mobile terminal including:
  • Memory for storing processor executable instructions
  • the processor is configured to:
  • acquire an indication image containing a target in the shooting scene;
  • determine the saliency target area in the indication image based on the pre-trained convolutional neural network; wherein the convolutional neural network is trained based on multiple sample images and the labeled image corresponding to each sample image, and the labeled image corresponding to each sample image is marked with the saliency target area in that sample image;
  • based on the determined saliency target area, output guidance information for guiding the user to compose the picture.
  • a non-transitory computer-readable storage medium; when instructions in the storage medium are executed by a processor of a mobile terminal, the mobile terminal is enabled to perform the shooting guidance method.
  • a computer program product which, when run on a computer, enables the computer to execute a shooting guidance method.
  • through the convolutional neural network, the saliency target area in an indication image containing a target in the shooting scene is determined, and further, based on the determined saliency target area, guidance information for guiding the user to compose the picture is output. Since this scheme does not need to consider the grayscale differences between pixels, it can ensure the accuracy of determining the saliency target area under various grayscale conditions, and thereby ensure the accuracy of the shooting guidance. Because the scheme applies to various grayscale situations, it is more robust across different situations. In addition, there is no need to extract the saliency target area according to artificially defined features, which makes the applicable scenes broader.
  • Fig. 1 is a flow chart showing a method for shooting guidance according to an exemplary embodiment.
  • Fig. 2 (a) is a schematic diagram showing a detection result of a salient target area according to an exemplary embodiment.
  • Fig. 2 (b) is another schematic diagram showing the detection result of the salient target area according to an exemplary embodiment.
  • Fig. 2 (c) is another schematic diagram showing the detection result of the salient target area according to an exemplary embodiment.
  • Fig. 3 (a) is a schematic diagram of a shooting interface according to an exemplary embodiment.
  • Fig. 3 (b) is a schematic diagram of guiding the main body region to the target shooting region according to an exemplary embodiment.
  • Fig. 3 (c) is another schematic diagram of guiding the body region to the target shooting region according to an exemplary embodiment.
  • Fig. 3 (d) is another schematic diagram of guiding the body area to the target shooting area according to an exemplary embodiment.
  • Fig. 3 (e) is another schematic diagram of guiding the body area to the target shooting area according to an exemplary embodiment.
  • Fig. 4 is a flowchart illustrating training a convolutional neural network according to an exemplary embodiment.
  • Fig. 5 is a schematic structural diagram of a preset convolutional neural network according to an exemplary embodiment.
  • Fig. 6 is a block diagram of a shooting guide device according to an exemplary embodiment.
  • Fig. 7 is a block diagram of a mobile terminal according to an exemplary embodiment.
  • Fig. 8 is a block diagram of a device according to an exemplary embodiment.
  • Fig. 1 is a flowchart of a shooting guidance method according to an exemplary embodiment. As shown in Fig. 1, the shooting guidance method may be used in a mobile terminal, and may include the following steps.
  • step S11 an instruction image containing the target in the shooting scene is acquired.
  • when the user is about to take a picture through the mobile terminal, the mobile terminal obtains an indication image containing the target in the shooting scene. Specifically, the mobile terminal can acquire the indication image in real time. It should be emphasized that, since this solution instructs the user to compose during shooting, the indication image is the framing picture generated by the mobile terminal during the shooting process, that is, the picture presented on the display screen after the user turns on the camera function.
  • the target in the shooting scene may be a target that usually attracts more attention, such as a person, an animal, a vehicle, or a building.
  • the target in the shooting scene is the saliency target
  • the image area where the target in the indication image is located is the saliency target area.
  • the mobile terminal may be a smart phone, a tablet computer, a camera, a video camera, and other devices that have a photo function.
  • step S12 based on the pre-trained convolutional neural network, a saliency target area in the indication image is determined.
  • the convolutional neural network is a network for identifying the salient target area in the image.
  • the convolutional neural network is trained based on multiple sample images and the labeled image corresponding to each sample image, and the marked image corresponding to each sample image is marked with the saliency target area of the sample image.
  • a convolutional neural network is obtained by training in advance based on multiple sample images and the labeled images corresponding to each sample image.
  • the saliency target area of the indication image can be determined based on the trained convolutional neural network.
  • the determined saliency target area may be one or more.
  • the instruction image is input to the trained convolutional neural network to obtain the saliency target area in the instruction image.
  • the saliency target area in the indication image can be embodied in the indication image through the area frame.
  • the shooting scene may include different categories of targets.
  • not only may the saliency target area be marked in the labeled image corresponding to each sample image, but also the target category of the target contained in that saliency target area.
  • the trained convolutional neural network can detect the saliency target area in the indication image, and can mark the target category corresponding to the saliency target area.
  • as shown by the three white boxes in Figure 2(a), three saliency target regions are obtained based on the convolutional neural network, and the target categories corresponding to the three regions can be marked, such as person/0.7, sheep/0.951, sheep/0.9; as shown by the four white boxes in Figure 2(b), four saliency target regions are obtained based on the convolutional neural network, and the target categories corresponding to the four regions can be marked, such as person/0.752, person/0.561, horse/0.959, horse/0.916; as shown by the two white boxes in Figure 2(c), two saliency target regions are obtained based on the convolutional neural network, and the target categories corresponding to the two regions can be marked, such as sheep/0.918, sheep/0.871.
  • step S13 based on the determined saliency target area, guidance information for guiding the user to compose a picture is output.
  • the guide information is information for guiding the user to compose a picture, that is, information for guiding the user to move the determined saliency target area to a certain position on the screen.
  • the guidance information can be displayed in a floating manner on the viewfinder screen.
  • the guidance information may include: a first mark marking the saliency target area to be moved, a second mark marking the position where it is to be placed, and a composition description instructing the user to move the first mark to coincide with the second mark.
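The guidance information above amounts to an offset between the two marks. A minimal sketch of how a terminal might derive it, assuming axis-aligned (x, y, w, h) boxes in screen coordinates and hypothetical helper names (the patent does not specify this computation):

```python
def guidance_offset(subject_box, target_box):
    """Given the subject area (first mark) and the target shooting area
    (second mark) as (x, y, w, h) boxes, return the offset between their
    centres and a textual hint for making the two marks coincide."""
    sx = subject_box[0] + subject_box[2] / 2.0  # subject centre x
    sy = subject_box[1] + subject_box[3] / 2.0  # subject centre y
    tx = target_box[0] + target_box[2] / 2.0    # target centre x
    ty = target_box[1] + target_box[3] / 2.0    # target centre y
    dx, dy = tx - sx, ty - sy
    hint = []
    if abs(dx) > 1:
        hint.append("move subject right" if dx > 0 else "move subject left")
    if abs(dy) > 1:
        hint.append("move subject down" if dy > 0 else "move subject up")
    return (dx, dy), " and ".join(hint) or "hold steady"
```

The sign convention (whether the user pans the phone or the subject moves in frame) is an assumption; the document only requires that the first mark be guided onto the second.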
  • the number of salient target regions may be one or more.
  • based on the determined saliency target area, outputting guidance information for guiding the user to compose the picture may specifically include: first selecting a subject area from the determined saliency target areas.
  • if there is only one saliency target area, that area can be directly selected as the subject area.
  • alternatively, the subject area can be selected according to the areas of the different saliency target areas; for example, the saliency target area with the largest area is selected as the subject area.
  • alternatively, the subject area can be selected according to the target categories corresponding to the different saliency target areas. For example, when shooting an image that includes a person, the person is generally expected to occupy a more important position in the image; in this case, the target category corresponding to each saliency target area can be judged, and the saliency target area whose target category is person is determined as the subject area, and so on.
  • for example, the saliency target area whose target category is person may be taken as subject area 1,
  • and the saliency target area whose target category is animal may be taken as subject area 2, and so on.
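The subject-area selection rules above (single region, largest area, or category priority) can be sketched as follows; the priority table and the dict layout are illustrative assumptions, not part of the patent:

```python
# Hypothetical priority: person first, then the animal categories, then others.
CATEGORY_PRIORITY = {"person": 0, "cat": 1, "dog": 1, "horse": 1,
                     "sheep": 1, "cow": 1, "bird": 1}

def select_subject_area(regions):
    """regions: list of dicts {'box': (x, y, w, h), 'category': str,
    'score': float}.  A single region is used directly; otherwise the
    highest-priority category wins and, within it, the largest area."""
    if len(regions) == 1:
        return regions[0]

    def key(r):
        x, y, w, h = r["box"]
        # Smaller priority number first; larger area first within a tie.
        return (CATEGORY_PRIORITY.get(r["category"], 99), -w * h)

    return min(regions, key=key)
```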
  • the preset composition method may include a nine-square-grid composition, a symmetric composition, a guide-line composition, a three-point composition, and the like.
  • for different preset composition methods, the target shooting area can be different.
  • for example, for the nine-square-grid composition, the target shooting area may be the middle square of the nine-square grid;
  • for a symmetric composition, the target shooting area may be the left half of the frame, and so on.
  • moving the subject area to the target shooting area may mean that: the subject area falls entirely within the target shooting area; or the subject area and the target shooting area completely overlap; or the subject area and the target shooting area overlap by at least a predetermined ratio, such as 80% or 90%.
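As a sketch of the overlap criteria above, assuming a nine-square-grid composition whose target shooting area is the middle cell (the coordinates and the 80% default are illustrative):

```python
def nine_grid_center(frame_w, frame_h):
    """Target shooting area for a nine-square-grid composition:
    the middle cell of the 3x3 grid, as (x, y, w, h)."""
    return (frame_w / 3.0, frame_h / 3.0, frame_w / 3.0, frame_h / 3.0)

def overlap_ratio(subject, target):
    """Fraction of the subject area that falls inside the target area."""
    ax, ay, aw, ah = subject
    bx, by, bw, bh = target
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))  # intersection width
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))  # intersection height
    return (ix * iy) / (aw * ah) if aw * ah else 0.0

def composed(subject, target, ratio=0.8):
    """True once the subject overlaps the target by the required ratio."""
    return overlap_ratio(subject, target) >= ratio
```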
  • the user can move the mobile terminal until the determined saliency target area or subject area is moved to the target shooting area.
  • auxiliary shooting is performed under the shooting interface shown in Fig. 3(a).
  • the shooting interface shown in Fig. 3(a) includes smart composition, flash, low light, and variable speed options, and also includes album, live streaming, and karaoke options alongside the shooting option. As shown in Fig. 3(b), Fig. 3(c), Fig. 3(d), and Fig. 3(e), during the process of moving the subject area to the target shooting area, the positional relationship between the open circle 301 marking the subject area and the solid area 302 marking the target shooting area changes.
  • the open circle 301 moves down as the marked subject area moves down, that is, the open circle 301 always marks the subject area; when the open circle 301 falls into the solid circle 302, the subject area marked by the open circle also falls into the target shooting area.
  • the target shooting area is determined by the preset composition method
  • the preset composition method is the composition method used when shooting the instruction image.
  • the technical solutions provided by the embodiments of the present application may include the following beneficial effects: through the convolutional neural network, the saliency target area in an indication image containing a target in the shooting scene is determined, and further, based on the determined saliency target area, guidance information for guiding the user to compose the picture is output.
  • since this scheme does not need to consider the grayscale differences between pixels, it can ensure the accuracy of determining the saliency target area under various grayscale conditions, and thereby ensure the accuracy of the shooting guidance. Because the scheme applies to various grayscale situations, it is more robust across different situations. In addition, there is no need to extract the saliency target area according to artificially defined features, which makes the applicable scenes broader.
  • the method provided in the embodiment of the present application may further include the following after outputting the guidance information that instructs the subject area to move to the target shooting area:
  • when it is detected that the subject area has moved to the target shooting area, the user is notified that the composition is successful. Specifically, as shown in Fig. 3(e), text information can be displayed in the shooting interface, such as: "the composition is successful, and the picture can be taken"; or a voice prompt can be issued, and so on.
  • in this way, the user can intuitively know that the composition is successful and take the picture at that moment, so that the target in the shooting scene is captured in the best shooting position. The user is thus guided to compose without needing professional photography knowledge, thereby improving the quality of the captured image.
  • in this way, the saliency target area in the framing picture is used to guide the user to compose; for example, the user moves the mobile phone so that the subject area selected from the saliency target areas is located in the target shooting area, instructing the user to take a higher-quality image.
  • the embodiments of the present application may further include: a step of training a convolutional neural network. Specifically, as shown in FIG. 4, it may include:
  • the electronic device acquires multiple sample images, such as 10,000, 20,000, etc.
  • the plurality of sample images include sample images of different categories, and the sample images of different categories include targets of different categories.
  • a data set may be constructed in advance, and then sample images are obtained from the data set.
  • images corresponding to natural scenes can be obtained; for example, in one realizable way, 7 target categories are selected to construct the data set: person, cat, dog, horse, sheep, cow, and bird.
  • part of the sample images can be used as the training set and part as the test set.
  • for example, the total number of training samples in the data set is 1328415, and 1904 samples are used as the test set to verify the effect of the convolutional neural network.
  • the embodiments of the present application propose the following way to determine the sampling weights of images of different categories:
  • weighting the sampling by category can ensure that the images of each category in a batch are balanced during network model training, and prevent bias in model training.
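One common way to realize such category-balanced sampling, offered here as an assumed sketch since the document omits the exact formula, is to weight each image inversely to its category's frequency:

```python
from collections import Counter

def sampling_weights(labels):
    """Per-image sampling weights inversely proportional to the size of
    each image's category, so every category is equally likely to
    contribute to a batch.  labels: one category name per sample image."""
    counts = Counter(labels)
    n_classes = len(counts)
    # Each category's weights sum to 1/n_classes, so all weights sum to 1.
    return [1.0 / (n_classes * counts[c]) for c in labels]
```

With a sampler such as PyTorch's `WeightedRandomSampler`, these weights make every batch approximately category-balanced.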
  • the saliency target area corresponding to each sample image can be marked by manual marking; or the saliency target area corresponding to each sample image can be marked through detection of target characteristics, and so on.
  • the preset convolutional neural network may include parameters to be determined. The sample images are input into the preset convolutional neural network, and the parameters are adjusted so that the output of the preset convolutional neural network approaches, as closely as possible, the labeled image corresponding to each sample image.
  • when the difference between the output of the preset convolutional neural network and the area marked in the labeled image corresponding to the sample image converges, the parameters are fixed, and the resulting preset convolutional neural network containing the determined parameters is the trained convolutional neural network.
  • the training settings may include: batch size, learning rate, and/or number of iterations, and so on.
  • the preset convolutional neural network may include: a basic module, a feature module, a positioning module, and a classification module.
  • the basic module can be composed of 5 convolutional layers, such as Conv2d(3->8), Conv2d(8->16), Conv2d(16->32), Conv2d(32->64), Conv2d(64->128), where Conv2d(3->8) is understood as converting a 3-channel RGB image into an 8-channel feature map, and the other convolutional layers are interpreted similarly;
  • the feature module can be composed of 3 convolutional layers, such as Conv2d(128->256), Conv2d(256->128), Conv2d(128->256);
  • the positioning module can be composed of one convolutional layer, such as Conv2d(*->8);
  • the classification module can be composed of one convolutional layer, such as Conv2d(*->4).
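The four modules can be sketched in PyTorch as below. Only the channel counts come from the text; the kernel sizes, strides, activations, and input resolution are assumptions, and the `*` in Conv2d(*->8) / Conv2d(*->4) is taken to be the 256 channels produced by the feature module:

```python
import torch
import torch.nn as nn

def conv(c_in, c_out):
    # Conv2d(c_in -> c_out); stride 2 halves the feature map each layer
    # (the stride/kernel choices are assumptions, the patent gives channels only).
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, stride=2, padding=1),
                         nn.ReLU(inplace=True))

base = nn.Sequential(conv(3, 8), conv(8, 16), conv(16, 32),
                     conv(32, 64), conv(64, 128))           # basic module
extra = nn.Sequential(conv(128, 256), conv(256, 128),
                      conv(128, 256))                        # feature module
loc = nn.Conv2d(256, 8, 3, padding=1)    # positioning module: Conv2d(*->8)
conf = nn.Conv2d(256, 4, 3, padding=1)   # classification module: Conv2d(*->4)

x = torch.randn(1, 3, 256, 256)          # one 3-channel RGB frame
feat = extra(base(x))
boxes, scores = loc(feat), conf(feat)
```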
  • the basic module can also be called Base Model
  • the feature module can also be called Extra Model
  • the positioning module can also be called Location Layer
  • the classification module can also be called Confidence Layer.
  • the Base Model is mainly used to extract features of the image from low level to high level, providing features for the Extra Model.
  • the Extra Model mainly extracts feature maps of different scales, designed for targets of different sizes in the image. Each pixel position of a feature map is mapped to a bounding box of corresponding size on the original image, which provides the Location Layer and the Confidence Layer with feature information for targets of different sizes.
  • the Location Layer is used to regress, for each pixel of the feature map, the bounding-box coordinates of the target on the original image.
  • the Confidence Layer is used to predict, for each pixel of the feature map, the target category of the bounding box on the original image.
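The mapping from feature-map pixels back to bounding boxes on the original image, which feeds the Location Layer and Confidence Layer, might look like the following sketch (one default box per cell is kept for brevity; detectors of this family typically use several scales and aspect ratios per cell):

```python
def default_boxes(feat_w, feat_h, img_w, img_h):
    """Map each pixel of a feat_w x feat_h feature map to a default
    bounding box (x, y, w, h) on the img_w x img_h original image."""
    step_x, step_y = img_w / feat_w, img_h / feat_h  # size of one cell
    boxes = []
    for j in range(feat_h):
        for i in range(feat_w):
            cx = (i + 0.5) * step_x  # centre of cell (i, j) on the image
            cy = (j + 0.5) * step_y
            boxes.append((cx - step_x / 2, cy - step_y / 2, step_x, step_y))
    return boxes
```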
  • the trained network may be optimized and evaluated.
  • after the preset convolutional neural network is constructed, it needs to be trained on the training data set. Training the preset convolutional neural network depends on the loss function and the optimizer, and finally, evaluating the preset convolutional neural network requires setting an appropriate evaluation index.
  • during optimization, the convolutional neural network uses a multi-task loss function: a cross-entropy loss function for the classification task and an L1 distance loss function for the bounding box.
  • for network optimization, the optimizer uses stochastic gradient descent to update the parameters of the convolutional neural network.
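A minimal, framework-free sketch of the multi-task loss described above, combining cross-entropy for the classification task with an L1 distance for the bounding box (the weighting factor `alpha` is an assumption; the document does not state how the two terms are combined):

```python
import math

def cross_entropy(logits, label):
    """Classification term: cross-entropy of the softmax of the logits."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[label]

def l1_box_loss(pred_box, true_box):
    """Localization term: L1 distance between box coordinates."""
    return sum(abs(p - t) for p, t in zip(pred_box, true_box))

def multitask_loss(logits, label, pred_box, true_box, alpha=1.0):
    """Weighted sum of the classification and localization terms."""
    return cross_entropy(logits, label) + alpha * l1_box_loss(pred_box, true_box)
```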
  • in the embodiments of the present application, metrics such as accuracy and mean average precision can be used to evaluate the detection results.
  • acquiring multiple sample images may include: acquiring multiple original images, and adding interference to the original images;
  • the original images after interference are determined as the sample images.
  • the interference may include horizontal flipping, vertical flipping, random cropping, pixel disturbance, lighting processing, occlusion, low-contrast processing, and so on.
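A few of the listed interference operations can be sketched on a grayscale image represented as a nested list of pixel values; the amplitudes and factors below are illustrative defaults, not values from the document:

```python
import random

def horizontal_flip(img):
    """Mirror each row of the image left-to-right."""
    return [list(reversed(row)) for row in img]

def pixel_disturbance(img, amplitude=5, seed=0):
    """Add small random noise to each pixel, clamped to [0, 255]."""
    rng = random.Random(seed)
    return [[min(255, max(0, p + rng.randint(-amplitude, amplitude)))
             for p in row] for row in img]

def low_contrast(img, factor=0.5, pivot=128):
    """Shrink the distance of every pixel from the pivot grey level."""
    return [[int(pivot + (p - pivot) * factor) for p in row] for row in img]
```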
  • Fig. 6 is a block diagram of a shooting guidance device according to an exemplary embodiment.
  • the apparatus may include a first acquisition module 601, a first determination module 602, and an instruction module 603;
  • the first obtaining module 601 is configured to obtain an indication image containing the target in the shooting scene
  • the first determination module 602 is configured to determine the saliency target area in the indication image based on the pre-trained convolutional neural network; wherein the convolutional neural network is trained based on multiple sample images and the labeled image corresponding to each sample image, and the labeled image corresponding to each sample image is marked with the saliency target area in that sample image;
  • the instruction module 603 is configured to output guide information for guiding the user to compose a composition based on the determined saliency target area.
  • the technical solutions provided by the embodiments of the present application may include the following beneficial effects: through the convolutional neural network, the saliency target area in an indication image containing a target in the shooting scene is determined, and further, based on the determined saliency target area, guidance information for guiding the user to compose the picture is output. Since this scheme does not need to consider the grayscale differences between pixels, it can ensure the accuracy of determining the saliency target area under various grayscale conditions, and thereby ensure the accuracy of the shooting guidance. Because the scheme applies to various grayscale situations, it is more robust across different situations. In addition, there is no need to extract the saliency target area according to artificially defined features, which makes the applicable scenes broader.
  • the instruction module 603 is specifically configured to: select a subject area from the determined saliency target areas; and output guidance information for guiding the user to move the subject area to the target shooting area; wherein the target shooting area is determined based on a preset composition method, and the preset composition method is the composition method used when shooting the indication image.
  • the device also includes:
  • the prompting module is configured to, after the instruction module outputs the guidance information for guiding the user to move the subject area to the target shooting area, prompt the user that the composition is successful when it is detected that the subject area has moved to the target shooting area.
  • the device also includes:
  • the second acquisition module is configured to acquire multiple sample images
  • the second determining module is configured to determine the mark image corresponding to each sample image respectively;
  • the training module is configured to take each sample image as input content and the labeled image corresponding to each sample image as supervising content to train the preset convolutional neural network to obtain a trained convolutional neural network.
  • the second acquisition module is specifically configured to: acquire multiple original images; add interference to the multiple original images respectively to obtain multiple original images after interference; combine the multiple original images after interference Both are determined as sample images.
  • the plurality of sample images include sample images of different categories, and the sample images of different categories include targets of different categories.
  • the preset convolutional neural network includes: a basic module, a feature module, a positioning module, and a classification module.
  • Fig. 7 is a block diagram of a mobile terminal 700 for shooting guidance according to an exemplary embodiment.
  • the mobile terminal 700 may be a mobile phone, a computer, a digital broadcasting terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, and the like.
  • the mobile terminal 700 may include one or more of the following components: a processing component 702, a memory 704, a power component 706, a multimedia component 708, an audio component 710, an input/output (I/O) interface 712, a sensor component 714, and a communication component 716.
  • the processing component 702 generally controls the overall operations of the mobile terminal 700, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations.
  • the processing component 702 may include one or more processors 720 to execute instructions to complete all or part of the steps of the above-mentioned shooting guidance method.
  • the processing component 702 may include one or more modules to facilitate interaction between the processing component 702 and other components.
  • the processing component 702 may include a multimedia module to facilitate interaction between the multimedia component 708 and the processing component 702.
  • the memory 704 is configured to store various types of data to support operations at the mobile terminal 700. Examples of these data include instructions for any application or method operating on the mobile terminal 700, contact data, phone book data, messages, pictures, videos, and so on.
  • the memory 704 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk.
  • the power supply component 706 provides power to various components of the mobile terminal 700.
  • the power supply component 706 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the mobile terminal 700.
  • the multimedia component 708 includes a screen between the mobile terminal 700 and the user that provides an output interface.
  • the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user.
  • the touch panel includes one or more touch sensors to sense touch, swipe, and gestures on the touch panel. The touch sensor may not only sense the boundary of the touch or sliding action, but also detect the duration and pressure related to the touch or sliding operation.
  • the multimedia component 708 includes a front camera and / or a rear camera. When the mobile terminal 700 is in an operation mode, such as a shooting mode or a video mode, the front camera and / or the rear camera may receive external multimedia data. Each front camera and rear camera can be a fixed optical lens system or have focal length and optical zoom capabilities.
  • the audio component 710 is configured to output and / or input audio signals.
  • the audio component 710 includes a microphone (MIC), and when the mobile terminal 700 is in an operation mode, such as a call mode, a recording mode, and a voice recognition mode, the microphone is configured to receive an external audio signal.
  • the received audio signal may be further stored in the memory 704 or sent via the communication component 716.
  • the audio component 710 further includes a speaker for outputting audio signals.
  • the I / O interface 712 provides an interface between the processing component 702 and a peripheral interface module.
  • the peripheral interface module may be a keyboard, a click wheel, or a button. These buttons may include, but are not limited to: home button, volume button, start button, and lock button.
  • the sensor component 714 includes one or more sensors for providing the mobile terminal 700 with status evaluation in various aspects.
  • the sensor component 714 can detect the on/off state of the mobile terminal 700 and the relative positioning of components, such as the display and keypad of the mobile terminal 700; the sensor component 714 can also detect a change in the position of the mobile terminal 700 or of a component of the mobile terminal 700, the presence or absence of user contact with the mobile terminal 700, the orientation or acceleration/deceleration of the mobile terminal 700, and the temperature change of the mobile terminal 700.
  • the sensor assembly 714 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact.
  • the sensor component 714 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications.
  • the sensor component 714 may further include an acceleration sensor, a gyro sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
  • the communication component 716 is configured to facilitate wired or wireless communication between the mobile terminal 700 and other devices.
  • the mobile terminal 700 may access a wireless network based on a communication standard, such as WiFi, an operator network (such as 2G, 3G, 4G, or 5G), or a combination thereof.
  • the communication component 716 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel.
  • the communication component 716 further includes a near field communication (NFC) module to facilitate short-range communication.
  • the NFC module can be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
  • the mobile terminal 700 may be implemented by one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, configured to perform the above method.
  • a non-transitory computer-readable storage medium including instructions is also provided, for example, a memory 704 including instructions, which can be executed by the processor 720 of the mobile terminal 700 to complete the above method.
  • the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
  • a computer program product is also provided which, when run on a computer, enables the computer to execute the above-mentioned shooting guidance method.
  • Fig. 8 is a block diagram of a device 800 for shooting guidance according to an exemplary embodiment.
  • the device 800 may be provided as a server.
  • the device 800 includes a processing component 822, which further includes one or more processors, and memory resources represented by the memory 832, for storing instructions executable by the processing component 822, such as application programs.
  • the application programs stored in the memory 832 may include one or more modules each corresponding to a set of instructions.
  • the processing component 822 is configured to execute instructions to perform the method steps of the above-mentioned shooting guidance method.
  • the device 800 may also include a power component 826 configured to perform power management of the device 800, a wired or wireless network interface 850 configured to connect the device 800 to the network, and an input output (I / O) interface 858.
  • the device 800 can operate based on an operating system stored in the memory 832, such as Windows ServerTM, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM or the like.

Abstract

A photography guiding method and apparatus, a mobile terminal and a storage medium. The method comprises: acquiring an indication image including an object in a photography scenario; determining, based on a pre-trained convolutional neural network, a salient object region in the indication image, wherein the convolutional neural network is obtained by training on multiple sample images and a marking image corresponding to each of the sample images, and the marking image corresponding to each sample image marks a salient object region in that sample image; and outputting, based on the determined salient object region, guidance information for guiding a user to compose a picture. With the photography guiding method and apparatus, the mobile terminal and the storage medium provided in the embodiments of the present application, the accuracy of photography guidance can be ensured under various grayscale conditions.

Description

Shooting guidance method, device, mobile terminal and storage medium

This application claims priority to the Chinese patent application No. 201811307419.6, filed with the Chinese Patent Office on November 5, 2018 and titled "Shooting guidance method, device, mobile terminal and storage medium", the entire contents of which are incorporated herein by reference.

Technical field

The present application relates to the technical field of image processing, and in particular to a shooting guidance method, device, mobile terminal, and storage medium.
Background

With the development of mobile terminals such as smartphones and tablet computers, the photographing function of mobile terminals is widely used. When taking photos or videos with a mobile terminal, the photographer needs certain shooting knowledge to obtain better images; however, many users lack professional shooting knowledge, and the images they capture are often unsatisfactory.

Composition is an important factor in the quality of captured images. To improve the quality of images captured by users without professional shooting knowledge, a salient target area can be extracted from the image, i.e., from the viewfinder picture, and composition can then be guided based on that salient target area; this is shooting guidance. A salient target area is an area containing a salient target, i.e., a target that draws human visual attention, such as a car driving on a road, a person on snow, or flowers among lush green leaves. In the related art, the salient target area of an image is determined based on the gray value of each pixel in the image, i.e., based on the grayscale features of the image. The inventor found that if the gray values of the pixels of an image differ little, the accuracy of the determined salient target area is low, which in turn makes shooting guidance based on that salient target area less accurate.
Summary of the invention

To overcome the problems in the related art, the present application provides a shooting guidance method, device, mobile terminal, and storage medium.

According to a first aspect of the embodiments of the present application, a shooting guidance method is provided, including:

acquiring an indication image containing a target in a shooting scene;

determining a salient target area in the indication image based on a pre-trained convolutional neural network, wherein the convolutional neural network is trained from multiple sample images and a labeled image corresponding to each sample image, and the labeled image corresponding to each sample image marks the salient target area in that sample image;

outputting, based on the determined salient target area, guidance information for guiding the user to compose the picture.
According to a second aspect of the embodiments of the present application, a shooting guidance device is provided, including:

a first acquisition module, configured to acquire an indication image containing a target in a shooting scene;

a first determination module, configured to determine a salient target area in the indication image based on a pre-trained convolutional neural network, wherein the convolutional neural network is trained from multiple sample images and a labeled image corresponding to each sample image, and the labeled image corresponding to each sample image marks the salient target area in that sample image;

an instruction module, configured to output, based on the determined salient target area, guidance information for guiding the user to compose the picture.
According to a third aspect of the embodiments of the present application, a mobile terminal is provided, including:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to:

acquire an indication image containing a target in a shooting scene;

determine a salient target area in the indication image based on a pre-trained convolutional neural network, wherein the convolutional neural network is trained from multiple sample images and a labeled image corresponding to each sample image, and the labeled image corresponding to each sample image marks the salient target area in that sample image;

output, based on the determined salient target area, guidance information for guiding the user to compose the picture.
According to a fourth aspect of the embodiments of the present application, a non-transitory computer-readable storage medium is provided; when the instructions in the storage medium are executed by a processor of a mobile terminal, the mobile terminal is enabled to perform a shooting guidance method.

According to a fifth aspect of the embodiments of the present application, a computer program product is provided which, when run on a computer, enables the computer to execute a shooting guidance method.
The technical solutions provided by the embodiments of the present application may include the following beneficial effects: a salient target area is determined, via a convolutional neural network, in an indication image containing a target in the shooting scene, and guidance information for guiding the user to compose the picture is then output based on the determined salient target area. Since this solution does not need to consider the grayscale differences between pixels, it can guarantee the accuracy of salient target area determination, and hence the accuracy of shooting guidance, under any grayscale conditions. Because the solution applies to various grayscale situations, it is robust across different cases. Moreover, the salient target area does not need to be extracted according to artificially defined fixed features, which makes the solution applicable to a wider range of scenes.

It should be understood that the above general description and the following detailed description are only exemplary and explanatory, and do not limit the present application.
Brief description of the drawings

In order to explain the embodiments of the present application and the technical solutions of the prior art more clearly, the drawings required by the embodiments and the prior art are briefly introduced below. Obviously, the drawings described below illustrate only some embodiments of the present application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a flowchart of a shooting guidance method according to an exemplary embodiment.

Fig. 2 (a) is a schematic diagram of a salient target area detection result according to an exemplary embodiment.

Fig. 2 (b) is another schematic diagram of a salient target area detection result according to an exemplary embodiment.

Fig. 2 (c) is another schematic diagram of a salient target area detection result according to an exemplary embodiment.

Fig. 3 (a) is a schematic diagram of a shooting interface according to an exemplary embodiment.

Fig. 3 (b) is a schematic diagram of guiding the subject area to move to the target shooting area according to an exemplary embodiment.

Fig. 3 (c) is another schematic diagram of guiding the subject area to move to the target shooting area according to an exemplary embodiment.

Fig. 3 (d) is another schematic diagram of guiding the subject area to move to the target shooting area according to an exemplary embodiment.

Fig. 3 (e) is another schematic diagram of guiding the subject area to move to the target shooting area according to an exemplary embodiment.

Fig. 4 is a flowchart of training a convolutional neural network according to an exemplary embodiment.

Fig. 5 is a schematic structural diagram of a preset convolutional neural network according to an exemplary embodiment.

Fig. 6 is a block diagram of a shooting guidance device according to an exemplary embodiment.

Fig. 7 is a block diagram of a mobile terminal according to an exemplary embodiment.

Fig. 8 is a block diagram of a device according to an exemplary embodiment.
Detailed description

In order to make the purpose, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the drawings and embodiments. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. Based on the embodiments of the present application, all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the present application. Fig. 1 is a flowchart of a shooting guidance method according to an exemplary embodiment. As shown in Fig. 1, the shooting guidance method may be used in a mobile terminal and may include the following steps.
In step S11, an indication image containing a target in the shooting scene is acquired.

When the user is about to take a picture with a mobile terminal, the mobile terminal acquires an indication image containing a target in the shooting scene. Specifically, the mobile terminal may acquire the indication image in real time. It should be emphasized that, since this solution guides the user's composition during shooting, the indication image is the viewfinder picture generated by the mobile terminal during the shooting process, i.e., the picture presented on the display of the mobile terminal after the user turns on the camera function.

The target in the shooting scene may be a person, an animal, a vehicle, a building, or another target that users usually pay attention to. The target in the shooting scene is the salient target, and the image area where that target is located in the indication image is the salient target area.

In addition, the mobile terminal may be a smartphone, a tablet computer, a camera, a video camera, or another device with a photographing function.
In step S12, a salient target area in the indication image is determined based on a pre-trained convolutional neural network.

The convolutional neural network is a network for identifying salient target areas in images. It is trained from multiple sample images and a labeled image corresponding to each sample image, where the labeled image corresponding to each sample image marks the salient target area of that sample image. In this solution, the convolutional neural network is trained in advance from the sample images and their labeled images. Thus, for an indication image, the salient target area of the indication image can be determined based on the trained convolutional neural network; one or more salient target areas may be determined. Specifically, the indication image is input into the trained convolutional neural network to obtain the salient target area in the indication image, which can be shown in the indication image as a bounding box.

Since a shooting scene may include targets of different categories, in an optional manner of the embodiments of the present application, the labeled image corresponding to each sample image marks not only the salient target area but also the target category of the target contained in the salient target area. In this way, by training on the sample images and their labeled images, the trained convolutional neural network can both detect the salient target areas in the indication image and annotate the target category corresponding to each salient target area.

The three white boxes in Fig. 2 (a) show three salient target areas obtained with the convolutional neural network, each annotated with its target category and score: person/0.7, sheep/0.951, and sheep/0.9. The four white boxes in Fig. 2 (b) show four salient target areas, annotated person/0.752, person/0.561, horse/0.959, and horse/0.916. The two white boxes in Fig. 2 (c) show two salient target areas, annotated sheep/0.918 and sheep/0.871.
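The annotations in Fig. 2 are bounding boxes labeled "category/score". A minimal sketch of the post-processing that turns raw network detections into such annotations might look as follows; the tuple layout, threshold value, and function name are illustrative assumptions, not part of the patent's specification.

```python
from typing import List, Tuple

# (box as x1, y1, x2, y2), target category, confidence score
Detection = Tuple[Tuple[int, int, int, int], str, float]

def salient_target_areas(detections: List[Detection],
                         score_threshold: float = 0.5) -> List[str]:
    """Keep detections above the confidence threshold and format each as
    'category/score', mirroring the annotations shown in Fig. 2."""
    kept = [d for d in detections if d[2] >= score_threshold]
    # Sort by confidence so the most salient target comes first.
    kept.sort(key=lambda d: d[2], reverse=True)
    return [f"{cls}/{score:g}" for _, cls, score in kept]

# Example: raw outputs a detection network might produce for Fig. 2 (a).
raw = [((10, 20, 80, 200), "person", 0.7),
       ((120, 60, 260, 180), "sheep", 0.951),
       ((270, 70, 380, 190), "sheep", 0.9),
       ((5, 5, 15, 15), "bird", 0.2)]   # low-confidence detection, discarded
print(salient_target_areas(raw))  # ['sheep/0.951', 'sheep/0.9', 'person/0.7']
```

In a real pipeline the `raw` list would come from the trained network's head (boxes, class labels, scores); only the filtering and formatting are shown here.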
In step S13, based on the determined salient target area, guidance information for guiding the user to compose the picture is output.

The guidance information is information for guiding the user's composition, i.e., information guiding the user to move the determined salient target area to a certain position on the screen, and it may be displayed floating over the viewfinder picture. For example, the guidance information may include: a first mark annotating the salient target area to be moved, a second mark annotating the position where it should be placed, and a composition instruction to move the first mark onto the second mark.

The number of salient target areas may be one or more.
In one implementation, outputting guidance information for guiding the user to compose the picture based on the determined salient target area may specifically include:

selecting a subject area from the determined salient target areas;

outputting guidance information for guiding the user to move the subject area to a target shooting area, wherein the target shooting area is determined by a preset composition mode, i.e., the composition mode used when shooting the indication image.
When only one salient target area is determined, that salient target area can be directly selected as the subject area.

When multiple salient target areas are determined, there are several ways to select the subject area from them.

In one implementation, the subject area can be selected according to the areas of the different salient target areas, e.g., the salient target area with the largest area is selected as the subject area.

In another implementation, the subject area can be selected according to the target categories of the salient target areas. For example, when shooting an image that includes a person, the person is generally expected to occupy an important position in the image; in this case, the target category of each salient target area can be examined, and the salient target area whose category is person is determined as the subject area, and so on.

In an optional implementation of the embodiments of the present application, there may be multiple subject areas. For example, the salient target area whose category is person is determined as subject area 1, the salient target area whose category is animal is determined as subject area 2, and so on.
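The subject-area selection strategies above (a single area is taken directly; otherwise prefer a target category such as person, or fall back to the largest box) can be sketched as follows. The dict layout, function name, and category priority are assumptions made for illustration.

```python
def select_subject_area(areas, preferred_category="person"):
    """Pick a subject area from the detected salient target areas.

    Each area is a dict like {"box": (x1, y1, x2, y2), "category": str}.
    Implements the two selection strategies described above: category
    preference first, then largest bounding-box area as a fallback.
    """
    if len(areas) == 1:
        return areas[0]
    # Strategy 1: prefer the category the user usually cares about most.
    for a in areas:
        if a["category"] == preferred_category:
            return a
    # Strategy 2: fall back to the salient target area with the largest area.
    def box_area(a):
        x1, y1, x2, y2 = a["box"]
        return (x2 - x1) * (y2 - y1)
    return max(areas, key=box_area)

areas = [{"box": (0, 0, 50, 50), "category": "sheep"},
         {"box": (60, 0, 200, 120), "category": "sheep"}]
print(select_subject_area(areas)["box"])  # (60, 0, 200, 120)
```

A multi-subject variant would simply return every area whose category matches one of several preferred categories instead of the first match.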
The preset composition mode may include nine-square-grid composition, symmetric composition, leading-line composition, rule-of-thirds composition, and so on. The target shooting area may differ between preset composition modes. For example, for nine-square-grid composition the target shooting area may be the middle cell of the grid, while for symmetric composition it may be the left half of the frame, and so on.

In addition, "the subject area has moved to the target shooting area" may mean that the subject area falls entirely within the target shooting area; or that the subject area completely coincides with the target shooting area; or that the subject area overlaps the target shooting area by a predetermined ratio, such as 80% or 90%.
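The "overlaps by a predetermined ratio" criterion can be checked with simple box arithmetic. The sketch below assumes both regions are axis-aligned rectangles and measures the fraction of the subject box lying inside the target box; this is one plausible reading of the criterion, not a formula prescribed by the patent.

```python
def overlap_ratio(subject, target):
    """Fraction of the subject box's area that lies inside the target box.
    Boxes are (x1, y1, x2, y2) tuples."""
    ix1, iy1 = max(subject[0], target[0]), max(subject[1], target[1])
    ix2, iy2 = min(subject[2], target[2]), min(subject[3], target[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    subj_area = (subject[2] - subject[0]) * (subject[3] - subject[1])
    return inter / subj_area if subj_area else 0.0

def composition_done(subject, target, ratio=0.8):
    """True once the subject area overlaps the target shooting area by at
    least the predetermined ratio (e.g. 80%)."""
    return overlap_ratio(subject, target) >= ratio

# A 100x100 subject box with a 90x100 slice inside the target -> ratio 0.9
print(composition_done((0, 0, 100, 100), (10, 0, 200, 100)))  # True
```

The "falls entirely within" variant is simply `overlap_ratio(...) == 1.0`, and exact coincidence additionally requires equal box coordinates.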
In addition, after the guidance information is output, the user can move the mobile terminal until the determined salient target area or subject area moves into the target shooting area, as shown in Fig. 3. Assisted shooting is performed in the shooting interface shown in Fig. 3 (a), which includes smart composition, flash, low-light, and speed options, as well as album, live-streaming, and karaoke options alongside the shooting option. Figs. 3 (b), 3 (c), 3 (d), and 3 (e) show how the positional relationship between the open circle 301 annotating the subject area and the solid area 302 annotating the target shooting area changes as the subject area is moved toward the target shooting area. It should be emphasized that, in actual use, the open circle 301 moves down as the annotated subject area moves down; that is, the open circle 301 always annotates the subject area, so that when the open circle 301 falls into the solid circle 302, the annotated subject area also falls into the target shooting area.

It should also be emphasized that there are many specific ways to output guidance information based on the determined salient target area, not limited to the implementation above. For example, in another implementation, guidance information may be output to guide the user to move the salient target area of interest to the target shooting area, where the target shooting area is determined by the preset composition mode, i.e., the composition mode used when shooting the indication image. The technical solutions provided by the embodiments of the present application may include the following beneficial effects: a salient target area is determined, via a convolutional neural network, in an indication image containing a target in the shooting scene, and guidance information for guiding the user to compose the picture is then output based on the determined salient target area. Since this solution does not need to consider the grayscale differences between pixels, it can guarantee the accuracy of salient target area determination, and hence of shooting guidance, under any grayscale conditions. Because the solution applies to various grayscale situations, it is robust across different cases. Moreover, the salient target area does not need to be extracted according to artificially defined fixed features, which makes the solution applicable to a wider range of scenes.
In an optional embodiment of the present application, after the guidance information for guiding the user to move the subject area to the target shooting area is output, the method provided by the embodiments of the present application may further include:

when it is detected that the subject area has moved to the target shooting area, prompting the user that the composition is successful. Specifically, as shown in Fig. 3 (e), text information can be displayed in the shooting interface, such as "Composition successful, you can take the picture", or a voice prompt can be issued, and so on.

In this way, the user can intuitively know that the composition is successful and take the picture at that moment, so that the shot is taken when the target in the shooting scene is at the best shooting position. The user is thus guided through composition without needing professional photography knowledge, which improves the quality of the captured images.

The salient target area in the viewfinder picture guides the user's composition: for example, the user moves the phone so that the subject area selected from the salient target areas lies in the target shooting area, which guides the user to capture a higher-quality image.
Based on the above embodiments, the embodiments of the present application may further include a step of training the convolutional neural network, which, as shown in Fig. 4, may specifically include:

S41: acquiring multiple sample images.

To improve training accuracy, the electronic device acquires a large number of sample images, e.g., 10,000 or 20,000.
In one implementation, the multiple sample images include sample images of different categories, and sample images of different categories contain targets of different categories.

Specifically, a data set can be constructed in advance, and sample images are then obtained from the data set. For example, images of natural scenes can be collected; in one implementation, 7 target categories are selected to construct the data set: person, cat, dog, horse, sheep, cow, and bird. To verify the trained convolutional neural network afterwards, part of the sample images can be used as the training set and part as the test set; for example, the data set contains 1,328,415 training images, and another 1,904 images serve as the test set for verifying the effect of the convolutional neural network.
其中,不同类别之间,数据的数量有一定的差别,如果按照传统方法随机从数据集中选取图像进行训练,会出现样本不均衡问题,导致模型训练不准确。因此需要针对数据不同类之间存在的不均衡问题,设计合适的采样权 重。鉴于此,本申请实施例提出如下方式来确定不同类别图像的采样权重,主要步骤如下:Among them, there is a certain difference in the amount of data between different categories. If the images are randomly selected from the data set for training according to the traditional method, sample imbalance will occur, resulting in inaccurate model training. Therefore, it is necessary to design appropriate sampling weights for the imbalance between different types of data. In view of this, the embodiments of the present application propose the following ways to determine the sampling weights of images of different categories, the main steps are as follows:
(1) Count the total number of images k_i in each category, and the total number K of images overall;

(2) Set the sampling weight of each category to K / k_i.
Categories with more images thus receive smaller weights, and categories with fewer images receive larger weights. Sample images are drawn from the data set according to the sampling weights determined for the different categories, and the convolutional neural network for detecting saliency target areas in images is trained on those samples. This keeps the image categories balanced within each training batch and prevents the model training from becoming biased.
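The two steps above can be sketched in pure Python; the function name and the flat list-of-labels layout are illustrative, not part of the embodiment:

```python
from collections import Counter

def sampling_weights(labels):
    """Compute per-category sampling weights K / k_i.

    labels: list of category names, one per image in the data set.
    Returns a dict mapping each category to its sampling weight, so that
    over-represented categories are sampled less often and a training
    batch stays balanced across categories.
    """
    counts = Counter(labels)          # k_i: number of images per category
    total = len(labels)               # K: total number of images
    return {cat: total / k for cat, k in counts.items()}

# A category with 4x fewer images gets a 4x larger sampling weight:
weights = sampling_weights(["person"] * 800 + ["bird"] * 200)
```

In a real training pipeline these weights would typically be handed to a weighted sampler that draws each batch, which is what keeps the batch composition balanced.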
S42,确定每个样本图像分别对应的标记图像;S42: Determine the label images corresponding to each sample image respectively;
S43,以每个样本图像作为输入内容,以每个样本图像对应的标记图像作为监督内容,对所述预设卷积神经网络进行训练,得到训练好的卷积神经网络。S43, taking each sample image as input content and using the labeled image corresponding to each sample image as supervising content, training the preset convolutional neural network to obtain a trained convolutional neural network.
具体地,可以通过人工标记的方式标记出各个样本图像对应的显著性目标区域;或者可以通过目标特征的检测标注出各个样本图像对应的显著性目标区域,等等。Specifically, the saliency target area corresponding to each sample image can be marked by manual marking; or the saliency target area corresponding to each sample image can be marked through detection of target characteristics, and so on.
Specifically, the preset convolutional neural network may include parameters to be determined. The sample images are input into the preset convolutional neural network and those parameters are adjusted so that the output of the network approaches, as closely as possible, the areas marked in the labeled images corresponding to the sample images. When the difference between the network output and the areas marked in the corresponding labeled images converges, the parameters are fixed, and the preset convolutional neural network with these fixed parameters is the trained convolutional neural network. The parameters to be determined may include the batch size, the learning rate, and/or the number of iterations, and so on.
本申请一种可选的实施例中,预设卷积神经网络可以包括:基础模块、特征模块、定位模块以及分类模块。In an optional embodiment of the present application, the preset convolutional neural network may include: a basic module, a feature module, a positioning module, and a classification module.
As shown in Fig. 5, the basic module may consist of 5 convolutional layers, such as Conv2d(3->8), Conv2d(8->16), Conv2d(16->32), Conv2d(32->64), and Conv2d(64->128), where Conv2d(3->8) is understood as converting a 3-channel RGB image into an 8-channel feature map, the other convolutional layers being interpreted similarly. The feature module may consist of 3 convolutional layers, such as Conv2d(128->256), Conv2d(256->128), and Conv2d(128->256); the positioning module may consist of one convolutional layer, such as Conv2d(*->8); and the classification module may consist of one convolutional layer, such as Conv2d(*->4).
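As a sketch of how the channel counts of these modules chain together (the `*` in Conv2d(*->8) and Conv2d(*->4) stands for whatever channel count the preceding feature layer emits; the helper below is purely illustrative and models each layer only as an (in_channels, out_channels) pair):

```python
def check_channel_chain(layers):
    """Verify that each Conv2d(in->out) layer's input channel count
    matches the previous layer's output channel count, and return the
    final output channel count."""
    for (_, prev_out), (cur_in, _) in zip(layers, layers[1:]):
        if prev_out != cur_in:
            raise ValueError(f"channel mismatch: {prev_out} -> {cur_in}")
    return layers[-1][1]

base_model  = [(3, 8), (8, 16), (16, 32), (32, 64), (64, 128)]   # 5 conv layers
extra_model = [(128, 256), (256, 128), (128, 256)]               # 3 conv layers

features = check_channel_chain(base_model + extra_model)  # channels fed onward
location_layer   = (features, 8)   # Conv2d(*->8): bounding-box coordinates
confidence_layer = (features, 4)   # Conv2d(*->4): target-category scores
```

The check mirrors the constraint any deep-learning framework enforces at model-construction time: the Base Model ends at 128 channels, which is exactly what the Extra Model's first layer consumes.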
其中,基础模块也可以称之为Base Model,特征模块也可以称之为Extra Model,定位模块也可以称之为Location layer,分类模块也可以称之为 Confidence layer。Among them, the basic module can also be called Base Model, the feature module can also be called Extra Model, the positioning module can also be called Location Layer, and the classification module can also be called Confidence Layer.
The Base Model mainly performs feature processing on the image from low level to high level and provides features for the Extra Model.

The Extra Model mainly extracts feature maps at different scales and is designed chiefly for targets of different sizes in the image. Each pixel position of a feature map is mapped to a bounding box of corresponding size on the original image, so that the Location layer and the Confidence layer are supplied with feature information for targets of different sizes.
The Location layer is used to regress, for each pixel of a feature map, the bounding box coordinates of the target on the original image.

The Confidence layer is used to compute, for each pixel of a feature map, the target category of that bounding box on the original image.
为了提高卷积神经网络训练的准确性,本申请实施例一种可选的实现方式中,可以对训练的网络进行优化和评估。In order to improve the accuracy of convolutional neural network training, in an optional implementation manner of the embodiments of the present application, the trained network may be optimized and evaluated.
预设卷积神经网络构建好后,需要在训练数据集上进行训练,预设卷积神经网络训练依赖损失函数及优化器,最后预设卷积神经网络的评价需要设置合适的评价指标。After the preset convolutional neural network is constructed, it needs to be trained on the training data set. The preset convolutional neural network training depends on the loss function and the optimizer. Finally, the evaluation of the preset convolutional neural network needs to set an appropriate evaluation index.
For the loss function, a multi-task loss is used in the optimization of the convolutional neural network: a cross-entropy loss for the classification task and an L1 distance loss for the bounding box regression.

For the optimizer, stochastic gradient descent is used to update the parameters of the convolutional neural network.

For the evaluation metric, the embodiments of the present application may use mean Average Precision (mAP) to evaluate the detection results.
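A minimal numeric sketch of the multi-task loss described above, combining cross-entropy for classification with an L1 distance for the bounding box (pure Python; the equal weighting of the two terms and the box format are assumptions, not stated in the embodiment):

```python
import math

def multitask_loss(class_probs, true_class, pred_box, true_box):
    """Cross-entropy on the predicted class distribution plus the
    L1 distance between predicted and ground-truth box coordinates."""
    ce = -math.log(class_probs[true_class])                   # classification term
    l1 = sum(abs(p - t) for p, t in zip(pred_box, true_box))  # localization term
    return ce + l1

loss = multitask_loss(
    class_probs=[0.1, 0.7, 0.1, 0.1],   # softmax output over 4 categories
    true_class=1,
    pred_box=[10.0, 20.0, 50.0, 80.0],  # x1, y1, x2, y2 (assumed format)
    true_box=[12.0, 18.0, 50.0, 79.0],
)
```

During training, stochastic gradient descent would minimize this combined loss over batches sampled with the category weights described earlier.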
可选地,本申请一种可选的实施例中,所述获取多个样本图像,包括:Optionally, in an optional embodiment of the present application, the acquiring multiple sample images includes:
获取多个原始图像;Acquire multiple original images;
对所述多个原始图像分别添加干扰,得到干扰后的原始图像;Adding interference to the plurality of original images respectively to obtain the original image after interference;
将干扰后的原始图像确定为样本图像。The original image after interference is determined as the sample image.
其中,干扰可以包括水平翻转、垂直翻转、随机裁剪、图像像素扰动、光照处理、遮挡、低对比度处理等等。Among them, the interference may include horizontal flip, vertical flip, random cropping, image pixel disturbance, lighting processing, occlusion, low contrast processing, and so on.
By adding different types of interference during network training, data augmentation is achieved, so that the trained convolutional neural network is less affected by external interference factors and is more robust to different scenes involving occlusion, deformation, lighting changes, and other external disturbances, which further improves the accuracy of the determined saliency target area.
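The horizontal and vertical flips listed above can be sketched on a tiny image represented as a nested list of pixel values; these helpers are illustrative only, and a real pipeline would also generate the cropping, pixel-perturbation, lighting, occlusion, and low-contrast variants mentioned in the text:

```python
def hflip(img):
    """Horizontal flip: reverse each row of pixels."""
    return [row[::-1] for row in img]

def vflip(img):
    """Vertical flip: reverse the order of the rows."""
    return img[::-1]

original = [[1, 2],
            [3, 4]]

# Each interfered copy of an original image becomes an extra sample image.
samples = [original, hflip(original), vflip(original)]
```

Because each interfered copy is itself determined as a sample image, one original image yields several training samples, which is exactly the data-enhancement effect described above.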
图6是根据一示例性实施例示出的一种拍摄引导装置框图。参照图6,该 装置可以包括第一获取模块601、第一确定模块602和指示模块603;Fig. 6 is a block diagram of a shooting guidance device according to an exemplary embodiment. Referring to FIG. 6, the apparatus may include a first acquisition module 601, a first determination module 602, and an instruction module 603;
该第一获取模块601,被配置为获取包含拍摄场景中目标的指示图像;The first obtaining module 601 is configured to obtain an indication image containing the target in the shooting scene;
该第一确定模块602,被配置为基于预先训练的卷积神经网络,确定指示图像中的显著性目标区域;其中,所述卷积神经网络是根据多个样本图像以及每个样本图像对应的标记图像训练得到的,每个样本图像对应的标记图像中标记有该样本图像中的显著性目标区域;The first determination module 602 is configured to determine a saliency target area in the indicated image based on the pre-trained convolutional neural network; wherein, the convolutional neural network is based on multiple sample images and each sample image corresponds to From the training of labeled images, the marked image corresponding to each sample image is marked with the saliency target area in the sample image;
该指示模块603,被配置为基于所确定的显著性目标区域,输出用于指引用户构图的引导信息。The instruction module 603 is configured to output guide information for guiding the user to compose a composition based on the determined saliency target area.
The technical solutions provided by the embodiments of the present application may have the following beneficial effects: a convolutional neural network determines the saliency target area of an indication image containing the target in the shooting scene, and guide information for directing the user's composition is then output based on the determined saliency target area. Because this solution does not need to consider the grayscale difference between individual pixels, it can guarantee the accuracy of saliency-target-area determination, and hence of the shooting guidance, under all kinds of grayscale conditions; since it applies to these various grayscale situations, it is highly robust across different cases. Moreover, the saliency target area need not be extracted according to artificially defined features, which makes the solution applicable to a wider range of scenes.
Optionally, the instruction module 603 is specifically configured to: select a subject area from the determined saliency target areas, and output guide information for guiding the user to move the subject area to a target shooting area, where the target shooting area is determined based on a preset composition method, the preset composition method being the composition method used when the indication image is captured.
可选的,该装置还包括:Optionally, the device also includes:
a prompting module, configured to, after the instruction module outputs the guide information for guiding the user to move the subject area to the target shooting area, prompt the user that the composition is successful when it is detected that the subject area has moved to the target shooting area.
可选的,该装置还包括:Optionally, the device also includes:
第二获取模块,被配置为获取多个样本图像;The second acquisition module is configured to acquire multiple sample images;
第二确定模块,被配置为确定每个样本图像分别对应的标记图像;The second determining module is configured to determine the mark image corresponding to each sample image respectively;
训练模块,被配置为以每个样本图像作为输入内容,以每个样本图像对应的标记图像作为监督内容,对所述预设卷积神经网络进行训练,得到训练好的卷积神经网络。The training module is configured to take each sample image as input content and the labeled image corresponding to each sample image as supervising content to train the preset convolutional neural network to obtain a trained convolutional neural network.
Optionally, the second acquisition module is specifically configured to: acquire multiple original images; add interference to each of the multiple original images to obtain multiple interfered original images; and determine each of the interfered original images as a sample image.
可选的,多个样本图像包括不同类别的样本图像,不同类别的样本图像包括不同类别目标。Optionally, the plurality of sample images include sample images of different categories, and the sample images of different categories include targets of different categories.
可选的,预设卷积神经网络,包括:基础模块、特征模块、定位模块以及分类模块。Optionally, the preset convolutional neural network includes: a basic module, a feature module, a positioning module, and a classification module.
关于上述实施例中的装置,其中各个模块执行操作的具体方式已经在有关该方法的实施例中进行了详细描述,此处将不做详细阐述说明。Regarding the device in the above embodiment, the specific manner in which each module performs operations has been described in detail in the embodiment related to the method, and will not be elaborated here.
图7是根据一示例性实施例示出的一种用于拍摄引导的移动终端700的框图。例如,移动终端700可以是移动电话,计算机,数字广播终端,消息收发设备,游戏控制台,平板设备,医疗设备,健身设备,个人数字助理等。Fig. 7 is a block diagram of a mobile terminal 700 for shooting guidance according to an exemplary embodiment. For example, the mobile terminal 700 may be a mobile phone, a computer, a digital broadcasting terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, and the like.
Referring to Fig. 7, the mobile terminal 700 may include one or more of the following components: a processing component 702, a memory 704, a power component 706, a multimedia component 708, an audio component 710, an input/output (I/O) interface 712, a sensor component 714, and a communication component 716.
处理组件702通常控制移动终端700的整体操作,诸如与显示,电话呼叫,数据通信,相机操作和记录操作相关联的操作。处理组件702可以包括一个或多个处理器720来执行指令,以完成上述的拍摄引导方法的全部或部分步骤。此外,处理组件702可以包括一个或多个模块,便于处理组件702和其他组件之间的交互。例如,处理组件702可以包括多媒体模块,以方便多媒体组件708和处理组件702之间的交互。The processing component 702 generally controls the overall operations of the mobile terminal 700, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 702 may include one or more processors 720 to execute instructions to complete all or part of the steps of the above-mentioned shooting guidance method. In addition, the processing component 702 may include one or more modules to facilitate interaction between the processing component 702 and other components. For example, the processing component 702 may include a multimedia module to facilitate interaction between the multimedia component 708 and the processing component 702.
存储器704被配置为存储各种类型的数据以支持在移动终端700的操作。这些数据的示例包括用于在移动终端700上操作的任何应用程序或方法的指令,联系人数据,电话簿数据,消息,图片,视频等。存储器704可以由任何类型的易失性或非易失性存储设备或者它们的组合实现,如静态随机存取存储器(SRAM),电可擦除可编程只读存储器(EEPROM),可擦除可编程只读存储器(EPROM),可编程只读存储器(PROM),只读存储器(ROM),磁存储器,快闪存储器,磁盘或光盘。The memory 704 is configured to store various types of data to support operations at the mobile terminal 700. Examples of these data include instructions for any application or method operating on the mobile terminal 700, contact data, phone book data, messages, pictures, videos, and so on. The memory 704 may be implemented by any type of volatile or nonvolatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read only memory (EEPROM), erasable and removable Programmable read only memory (EPROM), programmable read only memory (PROM), read only memory (ROM), magnetic memory, flash memory, magnetic disk or optical disk.
电源组件706为移动终端700的各种组件提供电力。电源组件706可以包括电源管理系统,一个或多个电源,及其他与为移动终端700生成、管理和分配电力相关联的组件。The power supply component 706 provides power to various components of the mobile terminal 700. The power supply component 706 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the mobile terminal 700.
The multimedia component 708 includes a screen providing an output interface between the mobile terminal 700 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or swipe action, but also detect the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 708 includes a front camera and/or a rear camera. When the mobile terminal 700 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each of the front and rear cameras may be a fixed optical lens system or have focusing and optical zoom capability.
音频组件710被配置为输出和/或输入音频信号。例如,音频组件710包括一个麦克风(MIC),当移动终端700处于操作模式,如呼叫模式、记录模式和语音识别模式时,麦克风被配置为接收外部音频信号。所接收的音频信号可以被进一步存储在存储器704或经由通信组件716发送。在一些实施例中,音频组件710还包括一个扬声器,用于输出音频信号。The audio component 710 is configured to output and / or input audio signals. For example, the audio component 710 includes a microphone (MIC), and when the mobile terminal 700 is in an operation mode, such as a call mode, a recording mode, and a voice recognition mode, the microphone is configured to receive an external audio signal. The received audio signal may be further stored in the memory 704 or sent via the communication component 716. In some embodiments, the audio component 710 further includes a speaker for outputting audio signals.
I/O接口712为处理组件702和外围接口模块之间提供接口,上述外围接口模块可以是键盘,点击轮,按钮等。这些按钮可包括但不限于:主页按钮、音量按钮、启动按钮和锁定按钮。The I / O interface 712 provides an interface between the processing component 702 and a peripheral interface module. The peripheral interface module may be a keyboard, a click wheel, or a button. These buttons may include, but are not limited to: home button, volume button, start button, and lock button.
The sensor component 714 includes one or more sensors for providing the mobile terminal 700 with status assessments in various aspects. For example, the sensor component 714 can detect the open/closed state of the mobile terminal 700 and the relative positioning of components, for example of the display and keypad of the mobile terminal 700; the sensor component 714 can also detect a change in position of the mobile terminal 700 or of a component of the mobile terminal 700, the presence or absence of user contact with the mobile terminal 700, the orientation or acceleration/deceleration of the mobile terminal 700, and a change in temperature of the mobile terminal 700. The sensor component 714 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 714 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 714 may further include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
通信组件716被配置为便于移动终端700和其他设备之间有线或无线方式的通信。移动终端700可以接入基于通信标准的无线网络,如WiFi,运营商网络(如2G、3G、4G或5G),或它们的组合。在一个示例性实施例中,通信组件716经由广播信道接收来自外部广播管理系统的广播信号或广播相关信息。在一个示例性实施例中,所述通信组件716还包括近场通信(NFC)模块,以促进短程通信。例如,在NFC模块可基于射频识别(RFID)技术,红外数据协会(IrDA)技术,超宽带(UWB)技术,蓝牙(BT)技术和其他 技术来实现。The communication component 716 is configured to facilitate wired or wireless communication between the mobile terminal 700 and other devices. The mobile terminal 700 may access a wireless network based on a communication standard, such as WiFi, an operator network (such as 2G, 3G, 4G, or 5G), or a combination thereof. In an exemplary embodiment, the communication component 716 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 716 further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module can be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the mobile terminal 700 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above method.
在示例性实施例中,还提供了一种包括指令的非临时性计算机可读存储介质,例如包括指令的存储器704,上述指令可由移动终端700的处理器720执行以完成上述方法。例如,所述非临时性计算机可读存储介质可以是ROM、随机存取存储器(RAM)、CD-ROM、磁带、软盘和光数据存储设备等。In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, for example, a memory 704 including instructions, which can be executed by the processor 720 of the mobile terminal 700 to complete the above method. For example, the non-transitory computer-readable storage medium may be ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, or the like.
根据本申请实施例又一方面,提供一种计算机程序产品,当其在计算机上运行时,使得计算机能够执行上述拍摄引导方法。According to still another aspect of the embodiments of the present application, there is provided a computer program product which, when run on a computer, enables the computer to execute the above-mentioned shooting guidance method.
Fig. 8 is a block diagram of a device 800 for shooting guidance according to an exemplary embodiment. For example, the device 800 may be provided as a server. Referring to Fig. 8, the device 800 includes a processing component 822, which further includes one or more processors, and memory resources represented by a memory 832 for storing instructions executable by the processing component 822, such as application programs. The application programs stored in the memory 832 may include one or more modules, each corresponding to a set of instructions. In addition, the processing component 822 is configured to execute the instructions to perform the method steps of the above shooting guidance method.
The device 800 may also include a power component 826 configured to perform power management of the device 800, a wired or wireless network interface 850 configured to connect the device 800 to a network, and an input/output (I/O) interface 858. The device 800 may operate based on an operating system stored in the memory 832, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
以上所述仅为本申请的较佳实施例而已,并不用以限制本申请,凡在本申请的精神和原则之内,所做的任何修改、等同替换、改进等,均应包含在本申请保护的范围之内。The above are only the preferred embodiments of this application and are not intended to limit this application. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of this application should be included in this application Within the scope of protection.

Claims (22)

  1. 一种拍摄引导方法,包括:A shooting guide method, including:
    获取包含拍摄场景中目标的指示图像;Obtain an indication image containing the target in the shooting scene;
    基于预先训练的卷积神经网络,确定所述指示图像中的显著性目标区域;其中,所述卷积神经网络是根据多个样本图像以及每个样本图像对应的标记图像训练得到的,每个样本图像对应的标记图像中标记有该样本图像中的显著性目标区域;Based on the pre-trained convolutional neural network, determine the saliency target area in the indicator image; wherein, the convolutional neural network is trained based on multiple sample images and the labeled image corresponding to each sample image, each The marked image corresponding to the sample image is marked with the saliency target area in the sample image;
    基于所确定的显著性目标区域,输出用于指引用户构图的引导信息。Based on the determined saliency target area, guide information for guiding the user to compose a picture is output.
  2. 根据权利要求1所述的方法,所述基于所确定的显著性目标区域,输出用于指引用户构图的引导信息,包括:The method according to claim 1, the outputting guide information for guiding the user to compose a composition based on the determined saliency target area includes:
    从所确定的显著性目标区域中,选择主体区域;From the determined significant target area, select the subject area;
    output guide information for guiding the user to move the subject area to a target shooting area; wherein the target shooting area is determined based on a preset composition method, the preset composition method being the composition method used when the indication image is captured.
  3. 根据权利要求2所述的方法,在输出用于指引用户将所述主体区域移动至目标拍摄区域的引导信息之后,所述方法还包括:The method according to claim 2, after outputting guide information for guiding the user to move the subject area to the target shooting area, the method further comprises:
    when it is detected that the subject area has moved to the target shooting area, prompting the user that the composition is successful.
  4. 根据权利要求1-3任一项所述的方法,训练所述卷积神经网络的步骤,包括:The method according to any one of claims 1-3, the step of training the convolutional neural network includes:
    获取多个样本图像;Acquire multiple sample images;
    确定每个样本图像分别对应的标记图像;Determine the corresponding labeled images for each sample image;
    以每个样本图像作为输入内容,以每个样本图像对应的标记图像作为监督内容,对所述预设卷积神经网络进行训练,得到训练好的卷积神经网络。Taking each sample image as input content and using the labeled image corresponding to each sample image as supervising content, the preset convolutional neural network is trained to obtain a trained convolutional neural network.
  5. 根据权利要求4所述的方法,所述获取多个样本图像,包括:The method according to claim 4, said acquiring a plurality of sample images, comprising:
    获取多个原始图像;Acquire multiple original images;
    对所述多个原始图像分别添加干扰,得到多个干扰后的原始图像;Adding interference to the multiple original images respectively to obtain multiple original images after interference;
    将多个干扰后的原始图像均确定为样本图像。The original images after multiple interferences are determined as sample images.
  6. The method according to claim 4, wherein the plurality of sample images include sample images of different categories, and the sample images of different categories include targets of different categories.
  7. 根据权利要求4所述的方法,所述预设卷积神经网络,包括:基础模块、特征模块、定位模块以及分类模块。According to the method of claim 4, the preset convolutional neural network includes: a basic module, a feature module, a positioning module, and a classification module.
  8. 一种拍摄引导装置,包括:A shooting guide device, including:
    第一获取模块,被配置为获取包含拍摄场景中目标的指示图像;The first acquisition module is configured to acquire the indication image containing the target in the shooting scene;
    第一确定模块,被配置为基于预先训练的卷积神经网络,确定所述指示图像中的显著性目标区域;其中,所述卷积神经网络是根据多个样本图像以及每个样本图像对应的标记图像训练得到的,每个样本图像对应的标记图像中标记有该样本图像中的显著性目标区域;The first determining module is configured to determine the saliency target area in the indication image based on the pre-trained convolutional neural network; wherein, the convolutional neural network is based on multiple sample images and each sample image corresponds to From the training of labeled images, the marked image corresponding to each sample image is marked with the saliency target area in the sample image;
    指示模块,被配置为基于所确定的显著性目标区域,输出用于指引用户构图的引导信息。The instruction module is configured to output guide information for guiding the user to compose a composition based on the determined saliency target area.
  9. 根据权利要求8所述的装置,所述指示模块,具体被配置为:The apparatus according to claim 8, the indication module is specifically configured to:
    从所确定的显著性目标区域中,选择主体区域;From the determined significant target area, select the subject area;
    output guide information for guiding the user to move the subject area to a target shooting area; wherein the target shooting area is determined based on a preset composition method, the preset composition method being the composition method used when the indication image is captured.
  10. 根据权利要求9所述的装置,所述装置还包括:The device according to claim 9, further comprising:
    a prompting module, configured to, after the instruction module outputs the guide information for guiding the user to move the subject area to the target shooting area, prompt the user that the composition is successful when it is detected that the subject area has moved to the target shooting area.
  11. 根据权利要求8-10任一项所述的装置,所述装置还包括:第二获取模块,被配置为获取多个样本图像;The apparatus according to any one of claims 8-10, the apparatus further comprising: a second acquisition module configured to acquire a plurality of sample images;
    第二确定模块,被配置为确定每个样本图像分别对应的标记图像;The second determining module is configured to determine the mark image corresponding to each sample image respectively;
    训练模块,被配置为以每个样本图像作为输入内容,以每个样本图像对应的标记图像作为监督内容,对所述预设卷积神经网络进行训练,得到训练好的卷积神经网络。The training module is configured to take each sample image as input content and the labeled image corresponding to each sample image as supervising content to train the preset convolutional neural network to obtain a trained convolutional neural network.
  12. 根据权利要求11所述的装置,所述第二获取模块,具体被配置为:According to the apparatus of claim 11, the second acquisition module is specifically configured to:
    获取多个原始图像;Acquire multiple original images;
    对所述多个原始图像分别添加干扰,得到多个干扰后的原始图像;Adding interference to the multiple original images respectively to obtain multiple original images after interference;
    将多个干扰后的原始图像均确定为样本图像。The original images after multiple interferences are determined as sample images.
  13. 根据权利要求11所述的装置,多个样本图像包括不同类别的样本图像,不同类别的样本图像包括不同类别目标。According to the apparatus of claim 11, the plurality of sample images include different types of sample images, and the different types of sample images include different types of targets.
  14. 根据权利要求11所述的装置,所述预设卷积神经网络,包括:基础模块、特征模块、定位模块以及分类模块。The apparatus according to claim 11, the preset convolutional neural network includes: a basic module, a feature module, a positioning module, and a classification module.
  15. 一种移动终端,包括:A mobile terminal, including:
    处理器;processor;
    用于存储处理器可执行指令的存储器;Memory for storing processor executable instructions;
    其中,所述处理器被配置为:Wherein, the processor is configured to:
    获取包含拍摄场景中目标的指示图像;Obtain an indication image containing the target in the shooting scene;
    基于预先训练的卷积神经网络,确定所述指示图像中的显著性目标区域;其中,所述卷积神经网络是根据多个样本图像以及每个样本图像对应的标记图像训练得到的,每个样本图像对应的标记图像中标记有该样本图像中的显著性目标区域;Based on the pre-trained convolutional neural network, determine the saliency target area in the indicator image; wherein, the convolutional neural network is trained based on multiple sample images and the labeled image corresponding to each sample image, each The marked image corresponding to the sample image is marked with the saliency target area in the sample image;
    基于所确定的显著性目标区域,输出用于指引用户构图的引导信息。Based on the determined saliency target area, guide information for guiding the user to compose a picture is output.
  16. The mobile terminal according to claim 15, wherein outputting, based on the determined salient target region, guide information for guiding the user in composing the shot comprises:
    selecting a subject region from the determined salient target region; and
    outputting guide information for guiding the user to move the subject region to a target shooting region, wherein the target shooting region is determined based on a preset composition mode, the preset composition mode being the composition mode used when the indication image is captured.
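Claim 16 leaves the preset composition mode open; the rule of thirds is one common choice. The sketch below assumes that choice: a hypothetical `guidance` helper returns, as the guide information, the offset from the subject region's centre to the nearest thirds intersection, and returns `None` once the centre lies within a tolerance of the target (the situation claim 17 reports as a successful composition). All names and the tolerance value are assumptions for illustration.

```python
def thirds_points(width, height):
    """The four rule-of-thirds intersections of a width x height frame."""
    xs = (width / 3.0, 2.0 * width / 3.0)
    ys = (height / 3.0, 2.0 * height / 3.0)
    return [(x, y) for x in xs for y in ys]

def guidance(subject_box, width, height, tol=0.05):
    """Offset from the subject box's centre to the nearest thirds
    intersection; None once the centre is within `tol` of the target,
    i.e. the 'composition successful' case of claim 17."""
    x0, y0, x1, y1 = subject_box
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    # Pick the closest of the four intersections as the target region.
    tx, ty = min(thirds_points(width, height),
                 key=lambda p: (p[0] - cx) ** 2 + (p[1] - cy) ** 2)
    dx, dy = tx - cx, ty - cy
    if abs(dx) <= tol * width and abs(dy) <= tol * height:
        return None
    return (dx, dy)
```

For a 960x540 preview, a subject box centred at (320, 210) would yield a guide offset of (0, -30), i.e. "move the subject up", while a box already centred on an intersection yields `None`.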
  17. The mobile terminal according to claim 16, wherein the processor is further configured to: after outputting the guide information for guiding the user to move the subject region to the target shooting region, prompt the user that the composition is successful when it is detected that the subject region has moved to the target shooting region.
  18. The mobile terminal according to any one of claims 15 to 17, wherein the processor is further configured to:
    acquire a plurality of sample images;
    determine the labeled image corresponding to each sample image; and
    train the preset convolutional neural network, with each sample image as input content and the labeled image corresponding to each sample image as supervision content, to obtain the trained convolutional neural network.
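The supervised training recited in claim 18 (each sample image as input, its labeled image as supervision) can be illustrated with a toy stand-in for the network: a per-pixel logistic model fitted to a binary saliency mask by gradient descent on the cross-entropy loss. The model form, names, and hyper-parameters are assumptions for illustration; the disclosure itself trains the preset convolutional neural network of claim 21.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_saliency(samples, masks, lr=0.5, epochs=200):
    """Fit a per-pixel logistic model sigmoid(w*x + b) to binary saliency
    masks: each sample is the input content, its labelled mask is the
    supervision content."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(samples, masks):
            p = sigmoid(w * x + b)        # forward pass
            grad = p - y                  # dL/dz for cross-entropy loss
            w -= lr * np.mean(grad * x)   # supervised gradient updates
            b -= lr * np.mean(grad)
    return w, b

# One tiny "image" of pixel intensities and its labelled saliency mask.
x = np.array([0.1, 0.9])   # dark background pixel, bright target pixel
y = np.array([0.0, 1.0])   # label: only the bright pixel is salient
w, b = train_saliency([x], [y])
```

After training, the model assigns the bright pixel a saliency probability above 0.5 and the dark pixel one below 0.5, mirroring how the trained network separates salient target regions from background.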
  19. The mobile terminal according to claim 18, wherein acquiring the plurality of sample images comprises:
    acquiring a plurality of original images;
    adding interference to each of the plurality of original images to obtain a plurality of interfered original images; and
    determining each of the interfered original images as a sample image.
  20. The mobile terminal according to claim 18, wherein the plurality of sample images comprise sample images of different categories, and the sample images of different categories contain targets of different categories.
  21. The mobile terminal according to claim 18, wherein the preset convolutional neural network comprises a basic module, a feature module, a localization module, and a classification module.
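Claim 21 names four modules without fixing their internals. The stubs below merely illustrate one way such a pipeline could be wired; every function body is a placeholder assumption, not the patented network: a basic module extracts low-level features, a feature module refines them, a localization module emits a bounding box for the salient target, and a classification module emits a score.

```python
import numpy as np

def basic_module(image):
    """Low-level feature extraction (stub: per-pixel channel mean)."""
    return image.mean(axis=-1, keepdims=True)

def feature_module(feats):
    """Higher-level feature refinement (stub: normalise to [0, 1])."""
    return feats / (feats.max() + 1e-6)

def localization_module(feats):
    """Predict a bounding box (x0, y0, x1, y1) for the salient target
    (stub: extent of strongly-activated pixels)."""
    ys, xs = np.nonzero(feats[..., 0] > 0.5)
    if xs.size == 0:
        return None
    return (xs.min(), ys.min(), xs.max(), ys.max())

def classification_module(feats):
    """Predict a class score for the target (stub: mean activation)."""
    return float(feats.mean())

def forward(image):
    """Wire the four modules together as claim 21 lists them."""
    f = feature_module(basic_module(image))
    return localization_module(f), classification_module(f)
```

Feeding an 8x8 frame with a bright 3x3 patch through `forward` returns a box tightly enclosing the patch together with a scalar score, which is the shape of output the guidance logic of claim 16 consumes.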
  22. A non-transitory computer-readable storage medium, wherein when instructions in the storage medium are executed by a processor of a mobile terminal, the mobile terminal is enabled to perform the photography guiding method according to any one of claims 1 to 7.
PCT/CN2019/112566 2018-11-05 2019-10-22 Photography guiding method and apparatus, mobile terminal and storage medium WO2020093866A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811307419.6A CN109040605A (en) 2018-11-05 2018-11-05 Photography guiding method and apparatus, mobile terminal and storage medium
CN201811307419.6 2018-11-05

Publications (1)

Publication Number Publication Date
WO2020093866A1 true WO2020093866A1 (en) 2020-05-14

Family

ID=64614404

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/112566 WO2020093866A1 (en) 2018-11-05 2019-10-22 Photography guiding method and apparatus, mobile terminal and storage medium

Country Status (2)

Country Link
CN (1) CN109040605A (en)
WO (1) WO2020093866A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111860242A * 2020-07-07 2020-10-30 北京海益同展信息科技有限公司 Robot inspection method and device and computer readable medium
CN112766285A * 2021-01-26 2021-05-07 北京有竹居网络技术有限公司 Image sample generation method and device and electronic equipment
CN112766285B * 2021-01-26 2024-03-19 北京有竹居网络技术有限公司 Image sample generation method and device and electronic equipment

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109040605A (en) 2018-11-05 2018-12-18 北京达佳互联信息技术有限公司 Photography guiding method and apparatus, mobile terminal and storage medium
CN111246098B (en) * 2020-01-19 2022-02-22 深圳市人工智能与机器人研究院 Robot photographing method and device, computer equipment and storage medium
CN111327833B (en) * 2020-03-31 2021-06-01 厦门美图之家科技有限公司 Auxiliary shooting method and device, electronic equipment and readable storage medium
CN111464743A (en) * 2020-04-09 2020-07-28 上海城诗信息科技有限公司 Photographic composition matching method and system
CN111461248A (en) * 2020-04-09 2020-07-28 上海城诗信息科技有限公司 Photographic composition line matching method, device, equipment and storage medium
CN112581446A (en) * 2020-12-15 2021-03-30 影石创新科技股份有限公司 Method, device and equipment for detecting salient object of image and storage medium
WO2022183443A1 (en) * 2021-03-04 2022-09-09 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Method of suggesting shooting position for electronic device and electronic device
CN113329175A (en) * 2021-05-21 2021-08-31 浙江大华技术股份有限公司 Snapshot method, device, electronic device and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104717413A (en) * 2013-12-12 2015-06-17 北京三星通信技术研究有限公司 Shooting assistance method and equipment
CN107247930A (en) * 2017-05-26 2017-10-13 西安电子科技大学 SAR image object detection method based on CNN and Selective Attention Mechanism
CN107423747A (en) * 2017-04-13 2017-12-01 中国人民解放军国防科学技术大学 Salient object detection method based on deep convolutional network
CN107909109A (en) * 2017-11-17 2018-04-13 西安电子科技大学 SAR image classification method based on saliency and multi-scale deep network model
US20180314943A1 (en) * 2017-04-27 2018-11-01 Jianming Liang Systems, methods, and/or media, for selecting candidates for annotation for use in training a classifier
CN109040605A (en) * 2018-11-05 2018-12-18 北京达佳互联信息技术有限公司 Photography guiding method and apparatus, mobile terminal and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105931255A (en) * 2016-05-18 2016-09-07 天津工业大学 Method for locating target in image based on saliency and deep convolutional neural network
US10579860B2 (en) * 2016-06-06 2020-03-03 Samsung Electronics Co., Ltd. Learning model for salient facial region detection



Also Published As

Publication number Publication date
CN109040605A (en) 2018-12-18

Similar Documents

Publication Publication Date Title
WO2020093866A1 (en) Photography guiding method and apparatus, mobile terminal and storage medium
US10534972B2 (en) Image processing method, device and medium
KR101649596B1 (en) Method, apparatus, program, and recording medium for skin color adjustment
RU2659746C2 (en) Method and device for image processing
CN104270565B (en) Image capturing method, device and equipment
RU2577188C1 (en) Method, apparatus and device for image segmentation
US20170064181A1 (en) Method and apparatus for controlling photography of unmanned aerial vehicle
EP3226204A1 (en) Method and apparatus for intelligently capturing image
US10110800B2 (en) Method and apparatus for setting image capturing parameters
CN104918107B (en) The identification processing method and device of video file
US10248855B2 (en) Method and apparatus for identifying gesture
CN105959587B (en) Shutter speed acquisition methods and device
CN109889730B (en) Prompting method and device for adjusting shooting angle and electronic equipment
CN105758319B (en) The method and apparatus for measuring target object height by mobile terminal
CN106408603A (en) Camera method and device
CN111182212B (en) Image processing method, image processing apparatus, storage medium, and electronic device
CN110751659B (en) Image segmentation method and device, terminal and storage medium
CN106664366A (en) Image processing device, image capturing apparatus, image processing method, and program
CN109670458A License plate recognition method and device
US20180150722A1 (en) Photo synthesizing method, device, and medium
CN103731599A (en) Photographing method and camera
WO2020119254A1 (en) Method and device for filter recommendation, electronic equipment, and storage medium
US11574415B2 (en) Method and apparatus for determining an icon position
CN108717542B (en) Method and device for recognizing character area and computer readable storage medium
WO2018036010A1 (en) Method and apparatus for controlling balancing vehicle

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 19882219
    Country of ref document: EP
    Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
122 Ep: pct application non-entry in european phase
    Ref document number: 19882219
    Country of ref document: EP
    Kind code of ref document: A1