CN109919116B - Scene recognition method and device, electronic equipment and storage medium - Google Patents

Scene recognition method and device, electronic equipment and storage medium

Info

Publication number
CN109919116B
Authority
CN
China
Prior art keywords
scene
preview picture
determining
current preview
area
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910194130.6A
Other languages
Chinese (zh)
Other versions
CN109919116A (en)
Inventor
王宇鹭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN201910194130.6A priority Critical patent/CN109919116B/en
Publication of CN109919116A publication Critical patent/CN109919116A/en
Application granted granted Critical
Publication of CN109919116B publication Critical patent/CN109919116B/en
Landscapes

  • Studio Devices (AREA)

Abstract

The application provides a scene recognition method and device, an electronic device, and a storage medium, belonging to the technical field of imaging. The method comprises the following steps: dividing a current preview picture into N areas according to a preset rule, where N is a positive integer greater than 1; performing scene recognition on each of the N areas with a preset scene recognition model to determine a scene label corresponding to each area; determining a scene label corresponding to the current preview picture according to the scene labels corresponding to the areas; and determining a target shooting mode according to the scene label corresponding to the current preview picture. The scene recognition method thereby reduces the mutual interference of the various picture contents in the preview picture, improves the accuracy of scene recognition, and improves the user experience.

Description

Scene recognition method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of imaging technologies, and in particular, to a scene recognition method and apparatus, an electronic device, and a storage medium.
Background
With the development of technology, mobile terminals are becoming more and more popular. Most mobile terminals have a built-in camera, and with the growth of mobile-terminal processing power and advances in camera technology, built-in cameras have become increasingly capable and the quality of the images they capture increasingly high. Mobile terminals are simple to operate and convenient to carry, and taking pictures with them in daily life has become the norm.
In the related art, the mobile terminal may determine the current shooting scene from the entire content of the preview picture and then select a corresponding shooting mode. However, with this scene recognition method, when the shot scene is complicated, for example when the preview picture contains multiple kinds of content such as a portrait, a building, and a night scene, the different contents in the picture interfere with one another, so the error rate of scene recognition is high and the user experience suffers.
Disclosure of Invention
The scene recognition method, scene recognition device, electronic device, and storage medium of the present application address the problem in the related art that, when a shot scene is complex, the multiple contents in the picture interfere with one another, so that the error rate of scene recognition is high and the user experience is affected.
An embodiment of an aspect of the present application provides a scene recognition method, including: dividing a current preview picture into N areas according to a preset rule, wherein N is a positive integer greater than 1; respectively carrying out scene recognition on the N areas by using a preset scene recognition model so as to determine a scene label corresponding to each area; determining a scene label corresponding to the current preview picture according to the scene label corresponding to each area; and determining a target shooting mode according to the scene label corresponding to the current preview picture.
Another aspect of the present application provides a scene recognition apparatus, including: a dividing module, configured to divide a current preview picture into N areas according to a preset rule, where N is a positive integer greater than 1; a recognition module, configured to perform scene recognition on each of the N areas with a preset scene recognition model, so as to determine a scene label corresponding to each area; a first determining module, configured to determine, according to the scene label corresponding to each area, a scene label corresponding to the current preview picture; and a second determining module, configured to determine a target shooting mode according to the scene label corresponding to the current preview picture.
An embodiment of another aspect of the present application provides an electronic device, including a camera module, a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the scene recognition method described above when executing the program.
In another aspect, the present application provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the scene recognition method described above.
In another aspect of the present application, a computer program is provided which, when executed by a processor, implements the scene recognition method according to the embodiments of the present application.
The scene recognition method, device, electronic device, computer-readable storage medium, and computer program provided in the embodiments of the present application divide a current preview picture into a plurality of regions according to a preset rule, perform scene recognition on each region with a preset scene recognition model to determine the scene tag corresponding to each region, then determine the scene tag corresponding to the current preview picture according to the scene tags of the regions, and finally determine a target shooting mode according to the scene tag of the current preview picture. Because the current preview picture is divided into a plurality of areas and each area undergoes scene recognition separately before the picture-level scene label is determined, the mutual interference of the various picture contents in the preview picture is reduced, the accuracy of scene recognition is improved, and the user experience is improved.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1 is a schematic flowchart of a scene recognition method according to an embodiment of the present application;
fig. 2 is a schematic flowchart of another scene recognition method according to an embodiment of the present application;
fig. 3 is a schematic flowchart of another scene recognition method according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a scene recognition apparatus according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of another electronic device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the like elements throughout. The embodiments described below with reference to the accompanying drawings are illustrative and intended to explain the present application and should not be construed as limiting the present application.
To address the problems in the related art that, when a shot scene is complex, the various contents in the picture interfere with one another, so that the error rate of scene recognition is high and the user experience is affected, the embodiments of the present application provide a scene recognition method.
The scene recognition method provided by the embodiments of the present application divides the current preview picture into a plurality of areas according to a preset rule, performs scene recognition on each area with a preset scene recognition model to determine the scene label corresponding to each area, then determines the scene label corresponding to the current preview picture according to the scene labels of the areas, and further determines the target shooting mode according to the scene label of the current preview picture. Because the current preview picture is divided into a plurality of areas that are recognized separately before the picture-level label is determined, the mutual interference of the various picture contents is reduced, the accuracy of scene recognition is improved, and the user experience is improved.
The scene recognition method, apparatus, electronic device, storage medium, and computer program provided by the present application are described in detail below with reference to the accompanying drawings.
Fig. 1 is a schematic flowchart of a scene identification method according to an embodiment of the present application.
As shown in fig. 1, the scene recognition method includes the following steps:
step 101, dividing a current preview picture into N regions according to a preset rule, where N is a positive integer greater than 1.
It should be noted that when the current shooting scene is complicated, the current preview picture may contain multiple kinds of image content such as a portrait, a building, and a night scene, and the different kinds of content interfere with one another when the preview picture is recognized as a whole, so the accuracy of scene recognition is low. For example, when a portrait appears in the preview picture, the current scene is usually recognized as portrait mode; but if the portrait occupies only a small part of the picture and the user actually wants to shoot the scenery, classifying the scene as portrait mode makes the recognition inaccurate, and the scenery is then shot with unsuitable parameters. Therefore, in a possible implementation form of the embodiment of the present application, the current preview picture may be divided into a plurality of regions and scene recognition performed on each divided region, so as to reduce the mutual interference among the picture contents and improve the accuracy of scene recognition.
The preset rule may include the specific value of N and the rule followed when dividing the preview picture, for example dividing the preview picture evenly into 9 regions, 16 regions, and so on.
It should be noted that the larger the value of N is, the higher the accuracy of scene recognition is, and accordingly, the complexity of scene recognition also increases. In actual use, the rule for dividing the preview picture can be preset according to actual needs so as to balance the accuracy and complexity of scene recognition. In addition, when the preview screen is divided into a plurality of regions, the preview screen may be divided into a plurality of regions having the same area on average, or the preview screen may be divided into a plurality of regions having different areas, which is not limited in the embodiment of the present application.
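As an illustrative sketch of this step (the 3x3 grid size, and the assumption that the preview frame is available as a NumPy array of shape (H, W, C), are ours rather than the patent's):

```python
import numpy as np

def divide_preview(frame: np.ndarray, rows: int = 3, cols: int = 3):
    """Divide a preview frame into rows * cols regions of (nearly) equal area."""
    h, w = frame.shape[:2]
    regions = []
    for r in range(rows):
        for c in range(cols):
            # Integer boundaries; the last row/column absorbs any remainder.
            y0, y1 = r * h // rows, (r + 1) * h // rows
            x0, x1 = c * w // cols, (c + 1) * w // cols
            regions.append(frame[y0:y1, x0:x1])
    return regions  # N = rows * cols
```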
And 102, respectively carrying out scene recognition on the N areas by using a preset scene recognition model so as to determine a scene label corresponding to each area.
In the embodiment of the present application, after a current preview picture is divided into N areas, a preset scene recognition model may be used to respectively perform scene recognition on each divided area, so as to determine a scene tag corresponding to each area.
It should be noted that the preset scene recognition model is obtained by training according to a large amount of image data, and may be integrated in the electronic device. After the image data is input into the preset scene recognition model, the preset scene recognition model can directly output the scene label corresponding to the image data. Therefore, in a possible implementation form of the application embodiment, the image data corresponding to each divided region may be input into a preset scene recognition model to determine a scene label corresponding to each region.
For example, if the image data corresponding to region A contains a portrait, then when that image data is input into the preset scene recognition model, the scene label of region A may be determined as "portrait"; if the brightness information in the image corresponding to region B is below a threshold, then when that image data is input into the preset scene recognition model, the scene label of region B may be determined as "night scene".
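A minimal sketch of this step, assuming a model object whose predict method returns a (label, confidence) pair; the interface is hypothetical, since the patent only requires that the model map a region's image data to a scene label (optionally with a confidence):

```python
def recognize_regions(regions, model):
    """Return one (scene_label, confidence) pair per divided region.

    `model.predict` is an assumed interface, e.g. returning
    ("night scene", 0.8) for a dark region.
    """
    return [model.predict(region) for region in regions]
```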
And 103, determining a scene label corresponding to the current preview picture according to the scene label corresponding to each area.
In the embodiment of the application, after the scene tag corresponding to each region is determined, the scene tag corresponding to the current preview picture can be determined according to the scene tag corresponding to each region. Specifically, the scene tag corresponding to the current preview picture can be determined according to the number of each scene tag in the preview picture.
Furthermore, statistics may be performed on the scene tags corresponding to each region, so as to determine the scene tag corresponding to the current preview picture according to the number of the scene tags. That is, in a possible implementation form of the embodiment of the present application, the step 103 may include:
counting the scene labels corresponding to each region, and determining the number of each scene label corresponding to the current preview picture;
and determining the scene label with the largest number as the scene label corresponding to the current preview picture.
The number of a scene tag is the number of regions bearing that tag: the more regions a tag covers, the larger the share of the preview picture it represents. The most numerous scene tag therefore corresponds to the largest, or nearly the largest, portion of the preview picture and reflects the scene over most of its area, so it can be determined as the scene tag corresponding to the current preview picture.
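A sketch of this majority count; collections.Counter is an implementation convenience, not something the patent prescribes:

```python
from collections import Counter

def label_by_majority(region_labels):
    """Determine the preview picture's label as the most frequent region label."""
    counts = Counter(region_labels)        # e.g. {"portrait": 6, "building": 3}
    label, _count = counts.most_common(1)[0]
    return label
```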
In a possible implementation form of the embodiment of the application, a first threshold of the proportion of the scene tags may also be preset, and the scene tag corresponding to the current preview picture is determined according to a relationship between a ratio of the number of each scene tag to the number of all scene tags and the preset first threshold. Specifically, the scene label with the ratio of the number of the scene labels to the number of all the scene labels larger than the first threshold may be determined as the scene label of the current preview picture.
For example, if the preset first threshold is 40%, it is determined that the current preview picture includes two scene tags, namely a "portrait" scene tag and a "building" scene tag, where the ratio of the "portrait" scene tag to all the scene tags is 70%, and the ratio of the "building" scene tag to all the scene tags is 30%, it may be determined that the scene tag corresponding to the current preview picture is the "portrait".
In another possible implementation form of the embodiment of the present application, several proportion thresholds may be preset for the scene tags, and a first scene tag, a second scene tag, a third scene tag, and so on may be determined according to the relationship between each scene tag's proportion in the current preview picture and those thresholds. The first scene tag may then be determined as the scene tag of the preview picture, or, according to actual needs, the second or third scene tag may be used instead.
Specifically, suppose the preset proportion thresholds are a first threshold and a second threshold, where the first threshold is greater than the second. When the ratio of a scene tag's count to the count of all scene tags is greater than the first threshold, that tag may be determined as a first scene tag; when the ratio is smaller than the first threshold but greater than the second, as a second scene tag; and when the ratio is smaller than the second threshold, as a third scene tag.
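A sketch of this two-threshold ranking (the threshold values are illustrative placeholders):

```python
from collections import Counter

def rank_scene_tags(region_labels, first_threshold=0.4, second_threshold=0.2):
    """Sort scene tags into first/second/third groups by their share of regions."""
    counts = Counter(region_labels)
    total = sum(counts.values())
    first, second, third = [], [], []
    for tag, n in counts.items():
        ratio = n / total
        if ratio > first_threshold:
            first.append(tag)
        elif ratio > second_threshold:
            second.append(tag)
        else:
            third.append(tag)
    return first, second, third
```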
And 104, determining a target shooting mode according to the scene label corresponding to the current preview picture.
In the embodiment of the application, after the scene tag corresponding to the current preview picture is determined, the target shooting mode can be determined according to the scene tag corresponding to the current preview picture, and image acquisition is performed according to the determined target shooting mode.
For example, if the scene tag corresponding to the current preview picture is a "portrait", it may be determined that the target shooting mode is a "portrait mode", and image acquisition is performed according to various shooting parameters in the "portrait mode"; if the scene label corresponding to the current preview picture is the night scene, the target shooting mode can be determined to be the night scene mode, and image acquisition is carried out according to various shooting parameters in the night scene mode.
The scene recognition method provided by the embodiment of the present application divides the current preview picture into a plurality of areas according to a preset rule, performs scene recognition on each area with a preset scene recognition model to determine the scene label corresponding to each area, then determines the scene label corresponding to the current preview picture according to the scene labels of the areas, and further determines the target shooting mode according to that picture-level label. Because each area is recognized separately before the picture-level label is determined, the mutual interference of the various picture contents is reduced, the accuracy of scene recognition is improved, and the user experience is improved.
In a possible implementation form of the embodiment of the present application, the current preview picture may instead be divided into a plurality of regions with different areas; for example, the region of interest in the preview picture may be divided into several regions of smaller area, so as to further improve the accuracy of scene recognition for that part of the picture.
Another scene recognition method provided in the embodiment of the present application is further described below with reference to fig. 2.
Fig. 2 is a schematic flowchart of another scene recognition method according to an embodiment of the present application.
As shown in fig. 2, the scene recognition method includes the following steps:
step 201, dividing the current preview picture into N regions with different areas according to a preset rule, wherein N is a positive integer greater than 1.
The preview picture may include a region the user is interested in and a region the user is not. Generally, when shooting, a user aims the lens at the subject so that the subject is located in the middle of the preview picture. Therefore, in a possible implementation form of the embodiment of the present application, a central portion of the preview picture may be determined as the region of interest according to a preset ratio. For example, if the preset ratio is 60%, the region centered on the preview picture's center point and occupying 60% of its total area may be determined as the region of interest, and the rest of the picture as the non-interest region.
In a possible implementation form of the embodiment of the present application, after the region of interest and the non-interest region in the preview picture are determined, the two may be divided according to different division rules. For example, the region of interest may be divided into many regions of smaller area, to improve the accuracy of scene recognition there, while the non-interest region is divided into a few regions of larger area, to reduce the total number of regions and hence the complexity of scene recognition.
For example, the region of interest in the preview picture may be divided evenly into 9 or 16 regions of the same area, while the non-interest region is kept as a single region or divided into two regions of the same area, as sketched after the note below.
It should be noted that the above examples are only illustrative and should not be construed as limiting the present application. In actual use, the dividing rule of the region of interest in the preview picture and the dividing rule of the non-region of interest may be preset according to actual needs, which is not limited in the embodiment of the present application.
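With that caveat, a sketch of one such unequal division, assuming a centered region of interest (the 60% share and 3x3 inner grid are the text's examples; splitting the non-interest border into four strips is our choice):

```python
import numpy as np

def divide_with_roi(frame: np.ndarray, roi_ratio: float = 0.6, grid: int = 3):
    """Divide a centered region of interest finely; split the border coarsely."""
    h, w = frame.shape[:2]
    s = roi_ratio ** 0.5  # side scale so the centered ROI covers ~roi_ratio of the area
    y0, y1 = int(h * (1 - s) / 2), int(h * (1 + s) / 2)
    x0, x1 = int(w * (1 - s) / 2), int(w * (1 + s) / 2)
    regions = []
    # Region of interest: an even grid x grid split into smaller areas.
    for r in range(grid):
        for c in range(grid):
            regions.append(frame[y0 + r * (y1 - y0) // grid:
                                 y0 + (r + 1) * (y1 - y0) // grid,
                                 x0 + c * (x1 - x0) // grid:
                                 x0 + (c + 1) * (x1 - x0) // grid])
    # Non-interest region: a few large strips (top, bottom, left, right).
    regions += [frame[:y0, :], frame[y1:, :], frame[y0:y1, :x0], frame[y0:y1, x1:]]
    return regions
```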
Step 202, respectively performing scene recognition on the N regions by using a preset scene recognition model to determine a scene tag corresponding to each region.
Step 203, counting the scene tags corresponding to each area, and determining the number of each scene tag corresponding to the current preview picture.
The detailed implementation process and principle of steps 202 to 203 may refer to the detailed description of the above embodiments, and will not be repeated here.
And 204, determining the scene label corresponding to the current preview picture according to the number of the scene labels corresponding to the current preview picture and the total area of the area corresponding to the scene labels.
It should be noted that, in the embodiment of the present application, the scene tag corresponding to the current preview picture may be determined jointly from the counts of the scene tags and the total area of the regions bearing them. In a possible implementation form, the scene tag of the current preview picture is determined from the count of each scene tag, that is, the most numerous tag is chosen. Since the regions of the preview picture now differ in area, if several scene tags are tied for the largest count, the scene tag of the current preview picture can be further determined from the total area of the regions corresponding to each tied tag: among the most numerous tags, the one whose regions cover the largest total area is chosen.
For example, if it is determined that the scene tags with the largest number in the current preview screen are "portrait" and "building", the number of the scene tags is 4, the area corresponding to the "portrait" scene tag accounts for 30% of the total area of the preview screen, and the area corresponding to the "building" scene tag accounts for 40% of the total area of the preview screen, it may be determined that the scene tag corresponding to the current preview screen is "building".
Further, the preset scene recognition model may also output, together with the scene label of each region, the confidence that the region corresponds to that label. Therefore, if the current preview picture contains several scene tags tied for the largest count, the scene tag of the current preview picture can be further determined from the confidences of those tied tags. That is, in a possible implementation form of the embodiment of the present application, the step 204 may include:
determining the confidence of the scene label corresponding to each region;
determining a first total confidence degree of the current preview picture corresponding to a first scene label and a second total confidence degree of the current preview picture corresponding to a second scene label;
and determining the scene label corresponding to the larger value of the first total confidence coefficient and the second total confidence coefficient as the scene label corresponding to the current preview picture.
The confidence of a region's scene label is the degree of certainty that the region corresponds to that label, and it can be output directly by the preset scene recognition model. The first scene tag and the second scene tag are the scene tags tied for the largest count in the current preview picture.
It should be noted that, after the image data of each region in the current preview image is input into the preset scene recognition model, the preset scene recognition model may output the scene tag corresponding to each region and the confidence of the scene tag at the same time, and therefore, in a possible implementation form of the embodiment of the present application, the confidence of the scene tag corresponding to each region may be determined according to the output result of the preset scene recognition model.
For example, if the image data of the area a is input into a preset scene recognition model and the output of the preset scene recognition model is "80% night scene", the confidence level of the scene label "night scene" corresponding to the area a may be determined to be 80%.
In a possible implementation form of the embodiment of the application, after the confidence level of the scene tag corresponding to each region in the current preview picture is determined, a first total confidence level of the first scene tag and a second total confidence level of the second scene tag are determined according to the confidence level of the scene tag corresponding to each region, and then the scene tag corresponding to a larger value of the first total confidence level and the second total confidence level is determined as the scene tag corresponding to the current preview picture.
Specifically, the first total confidence of the first scene label may be a sum of confidence of the scene labels corresponding to the regions corresponding to the first scene label, or may be an average of the confidence of the scene labels corresponding to the regions corresponding to the first scene label, and the second total confidence of the second scene label is determined in the same manner.
For example, assume the total confidence of a scene tag is taken as the mean of the confidences of the regions bearing it. Suppose the most numerous scene tags in the current preview picture are "portrait" and "building"; the regions corresponding to the "portrait" tag are A and B, and those corresponding to the "building" tag are C and D. Region A's confidence in "portrait" is 80% and region B's is 90%, while region C's confidence in "building" is 75% and region D's is 70%. The total confidence of the "portrait" tag is then 85% and that of the "building" tag is 72.5%, so "portrait" is determined as the scene tag corresponding to the current preview picture.
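A sketch of this tie-break, using the mean confidence per tag (one of the two aggregation options the text names; the sum would work the same way):

```python
def break_tie(region_results, tied_tags):
    """Pick the winner among tags tied for the largest region count.

    region_results: list of (label, confidence) pairs, one per region,
    as output by the scene recognition model.
    """
    def mean_confidence(tag):
        confs = [c for label, c in region_results if label == tag]
        return sum(confs) / len(confs)
    return max(tied_tags, key=mean_confidence)
```

Applied to the example above, "portrait" averages 85% against 72.5% for "building", so "portrait" is returned.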
Step 205, determining a target shooting mode according to the scene label corresponding to the current preview picture.
The detailed implementation process and principle of step 205 may refer to the detailed description of the above embodiments, and are not described herein again.
According to the scene recognition method provided by this embodiment, the current preview picture is divided into a plurality of regions with different areas according to a preset rule, scene recognition is performed on each region to determine its scene label, the scene label of the current preview picture is then determined from the counts of the scene labels and the total area of the regions bearing them, and the target shooting mode is determined from that picture-level label. Dividing the preview picture into regions of different areas and weighing both tag counts and region areas reduces the mutual interference of the picture contents, further improves the accuracy of scene recognition, and further improves the user experience.
In a possible implementation form of the present application, the target shooting mode determined from the scene tag of the current preview picture may involve capturing multiple frames and synthesizing them. In that case, determining the target shooting mode also involves determining parameters such as the number of images to acquire, the exposure duration and sensitivity of each frame, and the manner in which the acquired frames are synthesized.
Another scene recognition method provided in the embodiment of the present application is further described below with reference to fig. 3.
Fig. 3 is a schematic flowchart of another scene recognition method according to an embodiment of the present application.
As shown in fig. 3, the scene recognition method includes the following steps:
step 301, dividing a current preview picture into N regions according to a preset rule, and performing scene recognition on the N regions respectively by using a preset scene recognition model to determine a scene tag corresponding to each region.
Step 302, determining a scene label corresponding to the current preview picture according to the scene label corresponding to each area.
The detailed implementation process and principle of the steps 301-302 can refer to the detailed description of the above embodiments, and are not described herein again.
And 303, determining the number of the current images to be acquired and the target exposure amount corresponding to each frame of image to be acquired according to the scene label corresponding to the current preview picture.
The exposure amount refers to the amount of light passing through the lens during the exposure time.
In the embodiment of the present application, after the scene label of the current preview picture is determined, the number of images to be acquired and the preset exposure compensation mode may be determined from the preset mappings between scene labels and the number of images to acquire and between scene labels and exposure compensation modes. A reference exposure amount is then determined from the illuminance of the current shooting scene, and the target exposure amount of each frame to be acquired is determined from the reference exposure amount and the preset exposure compensation mode.
It should be noted that, in actual use, the mapping relationship between the scene tag corresponding to the preview image and the number of images to be acquired, and the mapping relationship between the scene tag corresponding to the preview image and the preset exposure compensation mode may be preset according to actual needs, which is not limited in this application.
In a possible implementation form of the embodiment of the present application, a photometric module in the camera module may be used to obtain the illuminance of the current shooting scene, and an automatic exposure control (AEC) algorithm may be used to determine the reference exposure corresponding to the current illuminance. In a shooting mode that captures multiple frames, the exposure of each frame can differ so as to obtain images with different dynamic ranges; the synthesized image then has a higher dynamic range, improving its overall brightness and quality. A different exposure compensation may be applied to each frame, and the target exposure of each frame is determined from its compensation and the reference exposure determined by the current illuminance.
In the embodiment of the present application, the preset exposure compensation mode is a combination of exposure values (EVs), one preset per frame. In its original definition, exposure did not denote an exact number but "the combination of all camera apertures and exposure durations that give the same exposure". Sensitivity, aperture, and exposure time together determine the exposure, and different parameter combinations can produce equal exposures, that is, the same EV value: at the same sensitivity, an exposure time of 1/125 second with an F/11 aperture gives the same exposure as 1/250 second with F/8.0. EV 0 corresponds to the exposure obtained at sensitivity 100, aperture F/1, and an exposure time of 1 second; increasing the exposure by one stop, by doubling the exposure time, doubling the sensitivity, or opening the aperture by one stop, increases the EV value by 1, so the exposure at 1 EV is twice that at 0 EV. Table 1 shows the correspondence between the EV value and the exposure time, aperture, and sensitivity when each is varied individually.
Table 1 (reproduced as an image in the original publication): correspondence between exposure time, aperture, sensitivity, and the EV value.
After photography entered the digital era, in-camera metering became very powerful; EV is often used to denote one step on the exposure scale, and many cameras allow exposure compensation to be set, usually expressed in EV. In that usage, EV refers to the difference between the actual exposure and the exposure corresponding to the camera's metering: exposure compensation of +1 EV, for example, means one stop more than the metered exposure, that is, an actual exposure twice the metered one.
In the embodiment of the present application, when the exposure compensation mode is preset, the EV value of the determined reference exposure may be set to 0; +1 EV then means one stop more exposure, that is, twice the reference exposure, +2 EV means two stops more, that is, four times the reference exposure, and -1 EV means one stop less, that is, half the reference exposure.
For example, if the number of images to be captured is 7, the EV values of the preset exposure compensation mode may be [+1, +1, +1, +1, 0, -3, -6]. The +1 EV frames address noise: temporal noise reduction over these brighter frames suppresses noise while recovering dark-area detail. The -6 EV frame addresses highlight overexposure and preserves detail in highlight areas. The 0 EV and -3 EV frames maintain a good transition between the highlight and dark areas.
It should be noted that each EV value corresponding to the preset exposure compensation mode may be specifically set according to actual needs, or may be obtained according to a set EV value range and a principle that differences between the EV values are equal, which is not limited in this embodiment of the present application.
In a possible implementation form of the embodiment of the present application, after the reference exposure amount is determined by the AEC algorithm according to the illuminance of the current shooting scene, the target exposure amount of each frame can be determined from the reference exposure amount and the preset exposure compensation mode associated with the scene label of the current preview picture.
For example, suppose the number of images to be captured, determined from the scene tag of the current preview picture, is 7, the EV values of the preset exposure compensation mode are [+1, +1, +1, +1, 0, -3, -6], and the reference exposure determined from the illuminance of the current shooting environment is X. The target exposure of each frame then follows from the reference exposure and the compensation mode: if the EV value of the i-th frame is EV_i, its target exposure is 2^(EV_i) · X. Thus the frame captured at EV 0 has target exposure X, each frame at +1 EV has target exposure 2·X, and the frame at -3 EV has target exposure 2^(-3)·X.
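In code form, the same computation (X is the metered reference exposure; the bracket is the example above):

```python
def target_exposures(reference_exposure: float, ev_bracket):
    """Per-frame target exposure: 2**EV times the metered reference exposure."""
    return [reference_exposure * 2 ** ev for ev in ev_bracket]

# With the 7-frame example bracket:
#   target_exposures(X, [+1, +1, +1, +1, 0, -3, -6])
#   -> [2X, 2X, 2X, 2X, X, X/8, X/64]
```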
And step 304, determining target sensitivity according to the current shaking degree of the camera module.
The sensitivity, also called the ISO value, is an index measuring a film's sensitivity to light. A film of lower sensitivity requires a longer exposure time to achieve the same image as a film of higher sensitivity. A digital camera's sensitivity is an analogous index, and its ISO can be adjusted by changing the sensitivity of the sensor or by combining photosites, that is, ISO can be raised by increasing the sensor's light sensitivity or by binning several adjacent photosites. It should be noted that, in both digital and film photography, using a relatively high sensitivity to shorten the exposure time generally introduces more noise and thus reduces image quality.
In the embodiment of the present application, the target sensitivity is a minimum sensitivity that is determined according to a current shake degree of the image capturing module and is suitable for the current shake degree.
It should be noted that, in the embodiment of the present application, acquiring multiple frames at low sensitivity and synthesizing them into a target image not only improves the dynamic range and overall brightness of the captured image, but also, by controlling the sensitivity value, effectively suppresses noise in the image and improves the quality of the captured image.
In the embodiment of the application, the current shaking degree of the mobile phone, that is, the current shaking degree of the camera module, can be determined by acquiring the current gyroscope (Gyro-sensor) information of the electronic device.
A gyroscope, also called an angular velocity sensor, measures the angular velocity of rotation during deflection and tilt. In an electronic device, the gyroscope measures rotation and deflection well, so the user's actual motion can be accurately analyzed and judged. The gyroscope information (gyro information) of the electronic device may include the motion of the device in the three dimensions of three-dimensional space, expressed along the X, Y, and Z axes, which are mutually perpendicular.
It should be noted that, in a possible implementation form of the embodiment of the present application, the current shake degree of the camera module may be determined according to the current gyro information of the electronic device. The larger the absolute value of gyro motion of the electronic apparatus in three directions is, the larger the degree of shake of the camera module is. Specifically, absolute value thresholds of gyro motion in three directions may be preset, and the current shake degree of the camera module may be determined according to a relationship between the sum of the acquired absolute values of gyro motion in the three directions and the preset threshold.
For example, it is assumed that the preset thresholds are a third threshold a, a fourth threshold B, and a fifth threshold C, where a < B < C, and the sum of absolute values of gyro motion in three directions currently acquired is S. If S is less than A, determining that the current shaking degree of the camera module is 'no shaking'; if A < S < B, the current shaking degree of the camera module can be determined to be 'slight shaking'; if B < S < C, the current shaking degree of the camera module can be determined to be 'small shaking'; if S > C, the current shaking degree of the camera module can be determined to be large shaking.
It should be noted that the above examples are only illustrative and should not be construed as limiting the present application. During actual use, the number of the threshold values and the specific numerical values of the threshold values can be preset according to actual needs, and the mapping relation between gyro information and the jitter degree of the camera module can be preset according to the relation between the gyro information and the threshold values.
In a possible implementation form of the embodiment of the application, the target sensitivity of each frame of image can be determined according to the current shake degree of the camera module, so that the shooting time duration is controlled within a proper range. Specifically, if the current shake degree of the camera module is small, the target sensitivity can be properly compressed into a small value, so that the noise of each frame of image is effectively inhibited, and the quality of the shot image is improved; if the current shake degree of the camera module is larger, the target light sensitivity can be properly improved to a larger value so as to shorten the shooting time and avoid ghost image caused by aggravation of the shake degree.
For example, if the current shake degree of the camera module is determined as "no shake", the sensitivity may be determined to be a smaller value to obtain a higher quality image as much as possible, for example, the sensitivity is determined to be 100; if the current shaking degree of the camera module is determined to be 'slight shaking', the sensitivity can be determined to be a larger value so as to reduce the shooting time, for example, the sensitivity is determined to be 200; if the current shake degree of the camera module is determined to be small shake, the sensitivity can be further increased to reduce the shooting time duration, for example, the sensitivity is determined to be 220; if the current shake degree of the camera module is determined to be "large shake", it may be determined that the current shake degree is too large, and at this time, the sensitivity may be further increased to reduce the shooting time duration, for example, the sensitivity is determined to be 250.
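A sketch of this mapping, with the ISO values taken from the examples above; the gyro thresholds A, B, and C are device-specific constants left as parameters:

```python
def target_iso(gyro_abs_sum: float, a: float, b: float, c: float) -> int:
    """Map the summed absolute gyro readings (three axes) to a target sensitivity."""
    if gyro_abs_sum < a:
        return 100  # "no shake": lowest ISO for the cleanest frames
    if gyro_abs_sum < b:
        return 200  # "slight shake": shorten the exposure somewhat
    if gyro_abs_sum < c:
        return 220  # "small shake"
    return 250      # "large shake": shortest exposures to limit ghosting
```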
And 305, determining the exposure time corresponding to each frame of image to be acquired according to the target exposure and the target sensitivity.
The exposure duration refers to the length of time during which light passes through the lens.
The exposure amount is related to the aperture, the exposure time, and the sensitivity. The aperture, i.e., the clear aperture, determines the amount of light passing per unit time. When each frame of image to be collected corresponds to the same sensitivity and the same aperture size, the larger the exposure amount corresponding to the image to be collected is, the larger the exposure time corresponding to the image to be collected is.
In the embodiment of the application, the size of the aperture may be unchanged, so that after the target exposure and the target sensitivity of each frame of image to be acquired are determined, the exposure time corresponding to each frame of image to be acquired can be determined according to the target sensitivity and the target exposure, and the exposure time corresponding to the image to be acquired is in a direct proportion relation with the target exposure.
In one possible implementation form of the embodiment of the present application, a reference exposure duration may be determined from the target sensitivity and the reference exposure amount, and the exposure duration of each frame to be acquired may then be determined from the reference exposure duration and the preset exposure compensation mode. Specifically, if the reference exposure duration is T and the EV value of the i-th frame to be acquired is EV_i, the exposure duration of the i-th frame is 2^(EV_i) · T.
Further, in another possible implementation form of the embodiment of the present application, to improve the quality of night-scene images, several night scene modes suited to night shooting may be preset directly, and when the scene tag of the current preview picture is determined to be "night scene", the specific night scene mode is further determined from the other scene tags of the regions in the picture. A night scene mode includes shooting parameters for that mode, such as the number of images to acquire, the target sensitivity, and the preset exposure compensation mode; the preset modes may include a tripod night scene mode, a handheld night scene mode, a portrait night scene mode, and the like.
Specifically, when the scene tag of the current preview picture is determined to be "night scene", it may be further determined whether the scene tags of the regions include a "portrait" tag, and brightness recognition may be performed on the preview content to determine the current reference exposure. The current night scene mode is then determined from whether the regions' tags include "portrait" and from the current shake degree of the camera module. For example, if the shake degree is "no shake" and no region of the current preview picture carries a "portrait" tag, the current night scene mode may be determined as the "tripod night scene mode"; if the camera module is shaking and no region carries a "portrait" tag, the "handheld night scene mode"; and if any region carries a "portrait" tag, the "portrait night scene mode".
It can be understood that, after the current night scene mode and the current reference exposure are determined from the current shake degree of the camera module, the scene label of the current preview picture, and the other scene labels of the regions, the exposure duration of each frame to be acquired can be determined from the number of images to acquire, the target sensitivity, the preset exposure compensation mode, and the reference exposure included in that night scene mode.
Further, in a possible implementation form of the embodiment of the present application, a permitted range may also be set for the per-frame exposure duration, and any frame whose exposure duration falls outside that range is adjusted to lie within it, further improving the quality of the captured image.
Specifically, if the exposure duration of at least one frame to be acquired exceeds the set upper limit, the exposure duration of each such frame is updated to the upper limit, whose value ranges from 4.5 s to 5.5 s; if the exposure duration of at least one frame is below the set lower limit, the exposure duration of each such frame is updated to the lower limit, which is at least 10 ms.
For example, suppose the lower limit is 10 ms and the upper limit is 4.5 s, the number of images to be acquired is determined to be 7, and the exposure durations determined for the frames include 220 ms, 100 ms, 12.5 ms, and 6.25 ms. The 6.25 ms exposure duration of the 7th frame is below the lower limit, so it is updated to 10 ms.
Further, after a frame's exposure duration is clamped to the lower or upper limit, its exposure amount changes: its duration, and hence its exposure, may become equal or close to that of frames that were not updated, which alters the intended exposure compensation pattern, so the final target image may not be as expected. Therefore, after a frame's exposure duration is updated, the exposure durations or sensitivities of the other frames can be modified according to the ratio of the duration after the update to the duration before it.
In a possible implementation form of the embodiment of the present application, the ratio between the updated and the original exposure duration of the clamped frame is determined, and for each remaining frame whose exposure duration lies within the limits, the target sensitivity or the exposure duration is updated by that ratio: either the ratio is multiplied by the frame's target sensitivity before the update and the product used as its new target sensitivity, or the product of the ratio and the frame's exposure duration before the update is used as its new exposure duration.
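A sketch of this clamp-and-rescale step for the single-clamped-frame case the text describes (durations in seconds; bounds from the example above; the text equally allows applying the ratio to the other frames' sensitivities instead of their durations):

```python
def clamp_and_rescale(durations, lower=0.010, upper=4.5):
    """Clamp out-of-range exposure durations, then rescale the in-range ones.

    Multiplying the untouched frames by the clamped frame's after/before
    ratio preserves the bracket's relative exposure pattern.
    """
    clamped = [min(max(t, lower), upper) for t in durations]
    ratio = 1.0
    for before, after in zip(durations, clamped):
        if after != before:
            ratio = after / before  # e.g. 10 ms / 6.25 ms = 1.6
    return [after if after != before else before * ratio
            for before, after in zip(durations, clamped)]
```

With the example durations [0.22, 0.1, 0.0125, 0.00625] and a 10 ms lower limit, the last frame is clamped to 0.01 s and the others are scaled by 1.6, so every pairwise exposure ratio in the bracket is unchanged.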
And step 306, sequentially collecting multiple frames of images according to the target sensitivity and the exposure time corresponding to each frame of image to be collected.
And 307, synthesizing the multi-frame images to generate a target image.
In the embodiment of the application, after the exposure time corresponding to each frame of image to be acquired is determined, multiple frames of images can be sequentially acquired according to the target sensitivity and the exposure time, and the acquired multiple frames of images are synthesized to generate the target image, so that the quality of the shot image is improved.
Furthermore, when the collected multi-frame images are synthesized, different weight values can be adopted for different frames, so that the generated target image is of higher quality. That is, in a possible implementation form of the embodiment of the present application, the step 307 may include:
and synthesizing the multiple frames of images according to a preset weight value corresponding to each frame of image to generate a target image.
It should be noted that, in a possible implementation form of the embodiment of the present application, the acquired complete frames may be superimposed in sequence according to the preset weight value corresponding to each frame to generate a composite image. The weight values of different frames can differ, which improves the overall brightness and the dark-area detail of the image while preventing overexposure in highlight areas, improving the overall quality of the shot image.
In a possible implementation form of the embodiment of the application, the weight value of each frame may be preset according to that frame's exposure compensation mode (that is, its EV value): a mapping between EV values and weight values is preset, the weight of each frame is looked up from its EV value through this mapping, and the acquired frames are then synthesized accordingly to generate the composite image.
In practical use, the weight value corresponding to each frame of image can be preset according to actual needs, and the embodiment of the application does not limit the weight value.
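A minimal sketch of this weighted synthesis follows, assuming a hypothetical EV-to-weight mapping; the patent only states that such a mapping is preset, not what its values are.

```python
import numpy as np

# Hypothetical EV-to-weight mapping (illustrative values only).
EV_TO_WEIGHT = {1: 0.2, 0: 0.5, -2: 0.1, -4: 0.2}

def weighted_merge(frames, ev_values):
    """Blend frames (H x W x 3 arrays) using per-frame preset weights
    looked up from each frame's EV value."""
    weights = np.array([EV_TO_WEIGHT[ev] for ev in ev_values],
                       dtype=np.float64)
    weights /= weights.sum()  # normalise so brightness stays in range
    merged = np.zeros_like(frames[0], dtype=np.float64)
    for frame, w in zip(frames, weights):
        merged += w * np.asarray(frame, dtype=np.float64)
    return merged
```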
Furthermore, when the collected multi-frame images are synthesized, different regions of the images can be synthesized with different synthesis modes, further improving the quality of the synthesized image. That is, in a possible implementation form of the embodiment of the present application, the step 307 may include:
determining a synthesis mode corresponding to each region according to the scene label corresponding to each region;
and sequentially synthesizing each area in the multi-frame images according to the synthesis mode corresponding to each area to generate a target image.
In a possible implementation form of the embodiment of the present application, when the acquired images are synthesized, regions with different scene labels may be synthesized with different synthesis modes. For example, the regions whose scene label matches the scene label of the current preview picture may use a synthesis mode different from that of the other regions; a better synthesis mode can be preset for those regions, further improving their image quality.
For example, if the current preview picture is divided into 3 regions, the scene label corresponding to the first region is "building", and the scene labels corresponding to the second and third regions are "portrait", then the scene label corresponding to the current preview picture is determined to be "portrait". When the acquired multi-frame images are synthesized, the second and third regions use the same synthesis mode in every frame, while the first region uses a different one, for example with different weight values.
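One way to realise region-wise synthesis is to blend each region with its own per-frame weights and paste the result back through the region's mask. The `weight_table`, keyed by scene label, is a hypothetical stand-in for whatever preset per-label synthesis modes the device uses.

```python
import numpy as np

def merge_by_region(frames, region_masks, region_labels, weight_table):
    """Merge multi-frame images region by region.

    frames        : list of H x W x 3 arrays, one per captured frame
    region_masks  : list of boolean H x W masks, one per region
    region_labels : scene label of each region, e.g. ["building", "portrait"]
    weight_table  : hypothetical per-label frame weights, e.g.
                    {"portrait": [0.5, 0.3, 0.2], "building": [0.2, 0.4, 0.4]}
    """
    out = np.zeros_like(frames[0], dtype=np.float64)
    for mask, label in zip(region_masks, region_labels):
        w = np.asarray(weight_table[label], dtype=np.float64)
        w /= w.sum()
        # Blend all frames with this region's weights, then copy only the
        # pixels belonging to this region into the output.
        region = sum(wi * np.asarray(f, dtype=np.float64)
                     for wi, f in zip(w, frames))
        out[mask] = region[mask]
    return out
```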
The scene recognition method provided by the embodiment of the application divides the current preview picture into a plurality of regions according to a preset rule and performs scene recognition on each region to determine the scene label corresponding to the current preview picture. It then determines the number of images to be acquired and the target exposure corresponding to each frame according to that scene label, determines the target sensitivity according to the current jitter degree of the camera module, derives the exposure duration of each frame from the target exposure and the target sensitivity, and sequentially acquires and synthesizes the frames according to the per-frame target sensitivity and exposure duration. By collecting and synthesizing multiple frames under the scene label of the current preview picture, the accuracy of scene recognition is improved, and the frames can be acquired and synthesized in the corresponding target shooting mode, further improving the quality of the shot image and the user experience.
In order to implement the above embodiments, the present application further provides a scene recognition apparatus.
Fig. 4 is a schematic structural diagram of a scene recognition device according to an embodiment of the present application.
As shown in fig. 4, the scene recognition apparatus 40 includes:
a dividing module 41, configured to divide a current preview picture into N regions according to a preset rule, where N is a positive integer greater than 1;
the identification module 42 is configured to perform scene identification on the N regions respectively by using a preset scene identification model to determine a scene tag corresponding to each region;
a first determining module 43, configured to determine, according to the scene tag corresponding to each region, a scene tag corresponding to the current preview picture;
and a second determining module 44, configured to determine a target shooting mode according to the scene tag corresponding to the current preview picture.
In practical use, the scene recognition apparatus provided in the embodiment of the present application may be configured in any electronic device to execute the foregoing scene recognition method.
The scene recognition device provided by the embodiment of the application can divide a current preview picture into a plurality of areas according to a preset rule, respectively perform scene recognition on the plurality of areas by using a preset scene recognition model to determine a scene label corresponding to each area, then determine the scene label corresponding to the current preview picture according to the scene label corresponding to each area, and further determine a target shooting mode according to the scene label corresponding to the current preview picture. Therefore, the current preview picture is divided into a plurality of areas, and each area is subjected to scene recognition, and then the scene label corresponding to the current preview picture can be determined according to the scene label corresponding to each area, so that the mutual interference of various picture contents in the preview picture is reduced, the scene recognition accuracy is improved, and the user experience is improved.
In a possible implementation form of the present application, the first determining module 43 is specifically configured to:
counting the scene labels corresponding to each region, and determining the number of each scene label corresponding to the current preview picture;
and determining the scene label with the largest number as the scene label corresponding to the current preview picture.
Further, in another possible implementation form of the present application, if a first scene tag and a second scene tag are tied for the largest number, the scene recognition apparatus 40 further includes:
the third determining module is used for determining the confidence of the scene label corresponding to each region;
accordingly, the first determining module 43 is further configured to:
determining a first total confidence degree of the current preview picture corresponding to a first scene label and a second total confidence degree of the current preview picture corresponding to a second scene label;
and determining the scene label corresponding to the larger value of the first total confidence coefficient and the second total confidence coefficient as the scene label corresponding to the current preview picture.
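Taken together, the first and third determining modules amount to a majority vote with a confidence-sum tie-break. A sketch of that logic could look as follows; the function and argument names are illustrative, not the patent's own:

```python
from collections import Counter

def label_for_preview(region_labels, region_confidences):
    """Majority vote over per-region labels; if several labels are tied
    for the largest count, the label with the larger summed confidence
    wins."""
    counts = Counter(region_labels)
    top_count = counts.most_common(1)[0][1]
    tied = [label for label, n in counts.items() if n == top_count]
    if len(tied) == 1:
        return tied[0]
    totals = {label: 0.0 for label in tied}
    for label, conf in zip(region_labels, region_confidences):
        if label in totals:
            totals[label] += conf
    return max(totals, key=totals.get)

# e.g. label_for_preview(["portrait", "portrait", "building"],
#                        [0.9, 0.8, 0.95]) -> "portrait" (no tie here)
```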
In a possible implementation form of the present application, the dividing module 41 is specifically configured to:
dividing the current preview picture into N regions of different areas;
accordingly, the first determining module 43 is further configured to:
counting the scene labels corresponding to each region, and determining the number of each scene label corresponding to the current preview picture;
and determining the scene label corresponding to the current preview picture according to the number of each scene label corresponding to the current preview picture and the total area of the regions corresponding to that scene label.
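For the area-weighted variant, the count decides first and the total region area breaks ties, which could be sketched as below (names are illustrative):

```python
from collections import Counter, defaultdict

def label_by_count_and_area(region_labels, region_areas):
    """Pick the label carried by the most regions; if several labels are
    tied for the largest count, take the one whose regions cover the
    largest total area."""
    counts = Counter(region_labels)
    top_count = counts.most_common(1)[0][1]
    tied = [label for label, n in counts.items() if n == top_count]
    total_area = defaultdict(float)
    for label, area in zip(region_labels, region_areas):
        total_area[label] += area
    return max(tied, key=lambda label: total_area[label])
```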
In a possible implementation form of the present application, the second determining module 44 is specifically configured to:
determining the number of images to be acquired currently and the target exposure corresponding to each frame of image to be acquired;
determining target light sensitivity according to the current jitter degree of the camera module;
and determining the exposure duration corresponding to each frame of image to be acquired according to the target exposure and the target sensitivity.
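Since exposure amount at a fixed aperture is roughly proportional to sensitivity times duration, the last step reduces to a division per frame. A minimal sketch, with illustrative values only:

```python
def exposure_durations(target_exposures, target_iso):
    """At a fixed aperture, exposure amount is roughly proportional to
    ISO x duration, so each frame's duration follows as exposure / ISO.
    Only the ratio matters; units follow whatever the exposure table uses."""
    return [e / target_iso for e in target_exposures]

# Illustrative values only: a higher jitter degree would push target_iso
# up and all durations correspondingly down.
durations = exposure_durations([2000.0, 1000.0, 250.0], target_iso=400)
```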
Further, in another possible implementation form of the present application, the scene recognition apparatus 40 further includes:
the acquisition module is used for sequentially acquiring multiple frames of images according to the target light sensitivity and the exposure time corresponding to each frame of image to be acquired;
and the synthesis module is used for carrying out synthesis processing on the multi-frame images to generate a target image.
Further, in another possible implementation form of the present application, the synthesis module is specifically configured to:
determining a synthesis mode corresponding to each region according to the scene label corresponding to each region;
and sequentially synthesizing each area in the multi-frame images according to the synthesis mode corresponding to each area to generate a target image.
Further, in another possible implementation form of the present application, the combining module is further configured to:
and synthesizing the multiple frames of images according to a preset weight value corresponding to each frame of image to generate a target image.
It should be noted that the foregoing explanation on the embodiments of the scene recognition method shown in fig. 1, fig. 2, and fig. 3 also applies to the scene recognition device 40 of the embodiment, and details are not repeated here.
The scene recognition device provided by the embodiment of the application can divide a current preview picture into a plurality of regions of different areas according to a preset rule, perform scene recognition on each region to determine its scene label, determine the scene label corresponding to the current preview picture according to the number of each scene label and the total area of the regions carrying it, and then determine the target shooting mode from that label. Because the preview picture is divided into regions of different areas and the final label weighs both the label counts and the region areas, mutual interference between different picture contents is reduced, further improving the accuracy of scene recognition and the user experience.

In order to implement the above embodiments, the present application further provides an electronic device.
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
As shown in fig. 5, the electronic device 200 includes: a camera module 201, a memory 210, a processor 220, and a computer program stored in the memory and runnable on the processor, where the processor implements the scene recognition method of the embodiments of the present application when executing the program.
As shown in fig. 6, the electronic device 200 provided in the embodiment of the present application may further include:
a memory 210 and a processor 220, and a bus 230 connecting different components (including the memory 210 and the processor 220), wherein the memory 210 stores a computer program, and when the processor 220 executes the computer program, the scene recognition method according to the embodiment of the present application is implemented.
Bus 230 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
Electronic device 200 typically includes a variety of electronic device readable media. Such media may be any available media that is accessible by electronic device 200 and includes both volatile and nonvolatile media, removable and non-removable media.
Memory 210 may also include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 240 and/or cache memory 250. The electronic device 200 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 260 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 6, commonly referred to as a "hard drive"). Although not shown in FIG. 6, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 230 by one or more data media interfaces. Memory 210 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the application.
A program/utility 280 having a set (at least one) of program modules 270 may be stored in, for example, the memory 210. Such program modules 270 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may include an implementation of a network environment. The program modules 270 generally carry out the functions and/or methods of the embodiments described herein.
Electronic device 200 may also communicate with one or more external devices 290 (e.g., keyboard, pointing device, display 291, etc.), with one or more devices that enable a user to interact with electronic device 200, and/or with any devices (e.g., network card, modem, etc.) that enable electronic device 200 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interfaces 292. Also, the electronic device 200 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via the network adapter 293. As shown, the network adapter 293 communicates with the other modules of the electronic device 200 via the bus 230. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 200, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
The processor 220 executes various functional applications and data processing by executing programs stored in the memory 210.
It should be noted that, for the implementation process and the technical principle of the electronic device of the embodiment, reference is made to the foregoing explanation of the scene identification method in the embodiment of the present application, and details are not described here again.
The electronic device provided by the embodiment of the application can execute the scene recognition method, divide the current preview picture into a plurality of areas according to a preset rule, respectively perform scene recognition on the plurality of areas by using a preset scene recognition model to determine a scene tag corresponding to each area, then determine the scene tag corresponding to the current preview picture according to the scene tag corresponding to each area, and further determine the target shooting mode according to the scene tag corresponding to the current preview picture. Therefore, the current preview picture is divided into a plurality of areas, and each area is subjected to scene recognition, and then the scene label corresponding to the current preview picture can be determined according to the scene label corresponding to each area, so that the mutual interference of various picture contents in the preview picture is reduced, the accuracy of scene recognition is improved, and the user experience is improved.
In order to implement the above embodiments, the present application also proposes a computer-readable storage medium.
The computer readable storage medium stores thereon a computer program, and the computer program is executed by a processor to implement the scene recognition method according to the embodiment of the present application.
In order to implement the foregoing embodiments, a further embodiment of the present application provides a computer program, which when executed by a processor, implements the scene recognition method according to the embodiments of the present application.
In an alternative implementation, the embodiments may be implemented in any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the consumer electronic device, partly on the consumer electronic device, as a stand-alone software package, partly on the consumer electronic device and partly on a remote electronic device, or entirely on the remote electronic device or server. In the case of remote electronic devices, the remote electronic devices may be connected to the consumer electronic device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to external electronic devices (e.g., through the internet using an internet service provider).
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (10)

1. A method for scene recognition, comprising:
dividing a current preview picture into N areas according to a preset rule, wherein N is a positive integer greater than 1;
respectively carrying out scene recognition on the N areas by using a preset scene recognition model so as to determine a scene label corresponding to each area;
determining a scene label corresponding to the current preview picture according to the scene label corresponding to each area;
determining a target shooting mode according to the scene label corresponding to the current preview picture;
the dividing of the current preview screen into N regions includes:
dividing the current preview picture into N regions of different areas, wherein the preview picture comprises an attention region and a non-attention region, and the regions into which the attention region is divided have smaller areas than the regions into which the non-attention region is divided;
the determining the scene label corresponding to the current preview picture includes:
counting the scene labels corresponding to each region, and determining the number of each scene label corresponding to the current preview picture;
and determining the scene label corresponding to the current preview picture according to the number of each scene label corresponding to the current preview picture and the total area of the regions corresponding to that scene label, wherein if multiple scene labels are tied for the largest number among the scene labels corresponding to the current preview picture, the scene label corresponding to the current preview picture is determined according to the total area of the regions corresponding to each of the tied scene labels.
2. The method of claim 1, wherein determining the scene label corresponding to the current preview picture according to the scene label corresponding to each of the regions comprises:
counting the scene labels corresponding to each area, and determining the number of the scene labels corresponding to the current preview picture;
and determining the scene label with the largest number as the scene label corresponding to the current preview picture.
3. The method of claim 2, wherein if a first scene tag and a second scene tag are tied for the largest number, after determining the scene tag corresponding to each region, the method further comprises:
determining the confidence of the scene label corresponding to each region;
the determining the scene label corresponding to the current preview picture includes:
determining a first total confidence degree of the current preview picture corresponding to a first scene label and a second total confidence degree of the current preview picture corresponding to a second scene label;
and determining the scene label corresponding to the larger value of the first total confidence coefficient and the second total confidence coefficient as the scene label corresponding to the current preview picture.
4. The method of any of claims 1-3, wherein the determining the target shooting mode comprises:
determining the number of images to be acquired currently and the target exposure corresponding to each frame of image to be acquired;
determining target light sensitivity according to the current jitter degree of the camera module;
and determining the exposure duration corresponding to each frame of image to be acquired according to the target exposure and the target sensitivity.
5. The method of claim 4, wherein after determining the target capture mode, further comprising:
sequentially collecting multiple frames of images according to the target light sensitivity and the exposure time corresponding to each frame of image to be collected;
and synthesizing the multi-frame images to generate a target image.
6. The method of claim 5, wherein the synthesizing the plurality of frames of images to generate the target image comprises:
determining a synthesis mode corresponding to each region according to the scene label corresponding to each region;
and sequentially synthesizing each area in the multi-frame images according to the synthesis mode corresponding to each area to generate a target image.
7. The method of claim 5, wherein the synthesizing the plurality of frames of images to generate the target image comprises:
and synthesizing the multiple frames of images according to a preset weight value corresponding to each frame of image to generate a target image.
8. A scene recognition apparatus, comprising:
the device comprises a dividing module, a display module and a display module, wherein the dividing module is used for dividing a current preview picture into N areas according to a preset rule, and N is a positive integer greater than 1;
the recognition module is used for respectively carrying out scene recognition on the N areas by using a preset scene recognition model so as to determine a scene label corresponding to each area;
a first determining module, configured to determine, according to the scene tag corresponding to each region, a scene tag corresponding to the current preview picture;
the second determining module is used for determining a target shooting mode according to the scene label corresponding to the current preview picture;
the dividing module is specifically configured to:
dividing the current preview picture into N regions of different areas, wherein the preview picture comprises an attention region and a non-attention region, and the regions into which the attention region is divided have smaller areas than the regions into which the non-attention region is divided;
the first determining module is further configured to:
counting the scene labels corresponding to each region, and determining the number of each scene label corresponding to the current preview picture;
and determining the scene label corresponding to the current preview picture according to the number of each scene label corresponding to the current preview picture and the total area of the regions corresponding to that scene label, wherein if multiple scene labels are tied for the largest number among the scene labels corresponding to the current preview picture, the scene label corresponding to the current preview picture is determined according to the total area of the regions corresponding to each of the tied scene labels.
9. An electronic device, comprising: the camera module, the memory, the processor and the computer program stored on the memory and capable of running on the processor, wherein the processor executes the computer program to realize the scene recognition method according to any one of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the scene recognition method according to any one of claims 1 to 7.
GR01 Patent grant