CN114339054B - Method and device for generating photographing mode and computer readable storage medium - Google Patents


Info

Publication number
CN114339054B
Authority
CN
China
Prior art keywords
function
algorithm
scene
algorithmic
photographing mode
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210012847.6A
Other languages
Chinese (zh)
Other versions
CN114339054A (en)
Inventor
李洪敏
孙伟
杜明亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Huawei Digital Technologies Co Ltd
Original Assignee
Beijing Huawei Digital Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Huawei Digital Technologies Co Ltd filed Critical Beijing Huawei Digital Technologies Co Ltd
Priority to CN202210012847.6A
Publication of CN114339054A
Application granted
Publication of CN114339054B
Legal status: Active
Anticipated expiration


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00: Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60: Control of cameras or camera modules
    • H04N23/62: Control of parameters via user interfaces
    • H04N23/63: Control of cameras or camera modules by using electronic viewfinders
    • H04N23/631: Graphical user interfaces [GUI] specially adapted for controlling image capture or setting capture parameters
    • H04N23/632: Graphical user interfaces [GUI] specially adapted for displaying or modifying preview images prior to image capturing, e.g. variety of image resolutions or capturing parameters

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a photographing mode generation method, a photographing mode generation device and a computer-readable storage medium, and relates to the field of artificial intelligence. The method comprises the following steps: acquiring a preview stream of a scene; predicting, from a plurality of algorithm functions and according to the preview stream of the scene, at least one algorithm function and the functional parameters of the at least one algorithm function to be applied to the scene, wherein the algorithm functions are used for performing image or video algorithm processing on the preview stream; and assembling the predicted at least one algorithm function and the functional parameters of the at least one algorithm function to generate a photographing mode for the scene. According to this technical scheme, the algorithm functions and their functional parameters are predicted and assembled from the preview stream of the scene, so that scenes do not need to be divided in advance and photographing modes do not need to be predefined, and photographing mode generation is made intelligent and automatic while the photographing mode is optimized.

Description

Method and device for generating photographing mode and computer readable storage medium
Technical Field
The present application relates to the field of image processing, and more particularly, to a method, apparatus, and computer-readable storage medium for generating a photographing mode.
Background
As the performance and computing power of the camera module of a terminal device improve, more and more algorithm functions and functional parameters are built into the terminal device to improve the final imaging quality of camera photographing. Generally, a single algorithm function or functional parameter improves only one aspect of the photographing effect, and a better photographing effect in a given scene is obtained by combining multiple algorithm functions and functional parameters into a photographing mode.
Current camera applications form multiple photographing modes by manually assembling one or several algorithm functions for a scene; for example, many camera applications provide a portrait mode, a super night scene mode, a large aperture mode, a high dynamic range imaging (high dynamic range, HDR) mode, a macro mode, and so on. However, scenes are complex and diverse, and manually defined modes cannot cover photographing in all scenes, which leads to more and more photographing modes and negatively affects mode display and user selection. In addition, camera applications provide a professional mode that exposes a collection of algorithm functions, mainly for users with professional photographing knowledge, who need to know the influence of each algorithm function and functional parameter on the photographing effect.
Disclosure of Invention
The application provides a photographing mode generation method, a photographing mode generation device and a computer-readable storage medium, which make photographing mode generation intelligent and automatic while optimizing the photographing mode.
In a first aspect, a method for generating a photographing mode is provided. The method includes: acquiring a preview stream of a scene; predicting, from a plurality of algorithm functions and according to the preview stream of the scene, at least one algorithm function and the functional parameters of the at least one algorithm function to be applied to the scene, wherein the algorithm functions are used to perform image or video algorithm processing on the preview stream; and assembling the predicted at least one algorithm function and the functional parameters of the at least one algorithm function to generate a photographing mode for the scene.
The algorithm functions that can be enabled and their functional parameters are automatically predicted according to the preview stream of the scene, and the predicted algorithm functions and functional parameters are automatically assembled into a photographing mode. Scenes therefore do not need to be divided in advance and photographing modes do not need to be predefined, and photographing mode generation becomes intelligent and automatic while the photographing mode is optimized.
With reference to the first aspect, in certain implementations of the first aspect, predicting, from a preview stream of the scene, at least one algorithm function and a functional parameter of the at least one algorithm function applied to the scene from a plurality of algorithm functions includes: determining a confidence level of each of the plurality of algorithm functions and the corresponding function parameter; the at least one algorithm function and the functional parameters of the at least one algorithm function applied to the scene are predicted based on the confidence level.
With reference to the first aspect, in certain implementations of the first aspect, determining a confidence level for each of the plurality of algorithm functions and the corresponding function parameter includes: determining a first confidence level for each algorithm function and a second confidence level for each function parameter for each algorithm function; alternatively, a confidence level of the combination of the algorithm function and the functional parameter is determined.
With reference to the first aspect, in certain implementations of the first aspect, the assembling the predicted at least one algorithm function and the functional parameters of the at least one algorithm function to generate a photographing mode for the scene includes: assembling the at least one algorithm function and the functional parameters of the at least one algorithm function according to mutual exclusion information of the algorithm functions and the functional parameters.
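For illustration only, the confidence-based selection and the mutual-exclusion-based assembly described in the preceding implementations can be sketched as follows; the threshold value, the mutual exclusion table and the function names are assumptions made for the example and are not defined by the application:

```python
# Hypothetical sketch: select functions/parameters whose confidence exceeds a
# threshold, then drop any function that conflicts with an already-kept one.
CONFIDENCE_THRESHOLD = 0.5                      # assumed cut-off, not from the application
MUTEX_PAIRS = {("night_scene", "hdr")}          # assumed mutual-exclusion table

def assemble_mode(predictions):
    """predictions: list of (function_name, parameter_value, confidence)."""
    # Keep only predictions that are confident enough, best first.
    candidates = sorted(
        (p for p in predictions if p[2] >= CONFIDENCE_THRESHOLD),
        key=lambda p: p[2], reverse=True,
    )
    mode = {}
    for name, param, _conf in candidates:
        # Skip a function that is mutually exclusive with one already selected.
        conflict = any((name, kept) in MUTEX_PAIRS or (kept, name) in MUTEX_PAIRS
                       for kept in mode)
        if not conflict and name not in mode:
            mode[name] = param
    return mode   # e.g. {"portrait_blur": 0.8, "beauty": 3}
```

In this sketch, when two predicted functions exclude each other, the one with the higher confidence is kept, which is one simple way of applying the mutual exclusion information.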
With reference to the first aspect, in certain implementations of the first aspect, the method further includes: a predictive model for making the prediction is trained.
Alternatively, the prediction model may be a single-task prediction model that predicts a single algorithm function or function parameter, or may be a multi-task prediction model that predicts multiple algorithm functions and function parameters simultaneously.
Scenes do not need to be divided in advance and photographing modes do not need to be predefined; whether to enable each algorithm function and functional parameter is predicted by the prediction model in an artificial-intelligence manner, so the user does not need to consider the influence of each algorithm function and functional parameter on the photographing effect. This simplifies user operation and makes photographing mode generation more intelligent and automatic.
With reference to the first aspect, in certain implementation manners of the first aspect, the prediction model is trained in advance according to training data, where the training data includes training pictures or training video data, a training algorithm function, and a functional parameter of the training algorithm function.
With reference to the first aspect, in certain implementation manners of the first aspect, the training picture or training video data is preview picture or video data obtained in a shooting scene, and the algorithm functions and the functional parameters of the algorithm functions obtained by labeling the preview picture or video data (for example, labeling by a neural-network-based intelligent labeling system, or labeling by a user with a professional background or good photographing skills) are taken as the training algorithm functions and the functional parameters of the training algorithm functions.
With reference to the first aspect, in some implementations of the first aspect, the training picture is a captured picture, the captured picture is scored, and an algorithm function of the picture with a score higher than a first threshold and a functional parameter of the algorithm function are used as the training algorithm function and the functional parameter of the training algorithm function.
Using the algorithm functions and functional parameters of preview picture or video data labeled by users with a professional background or good photographing skills, or of pictures whose scores are higher than a certain threshold, as the training algorithm functions and the functional parameters of the training algorithm functions improves the quality of photographing mode generation, so that the photographing mode is optimized.
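As a hedged illustration of this sampling rule (the record layout, the score field and the concrete value of the first threshold are assumptions made only for the example):

```python
# Hypothetical sketch: keep only captured pictures whose score exceeds the first
# threshold and use their algorithm functions/parameters as training labels.
FIRST_THRESHOLD = 4.0   # assumed value; the application only requires "higher than a first threshold"

def build_training_set(captured_records):
    """captured_records: iterable of dicts like
    {"picture": ..., "score": 4.5, "functions": {"hdr": True, "aperture": 2.8}}"""
    training_set = []
    for rec in captured_records:
        if rec["score"] > FIRST_THRESHOLD:
            # The picture is the training input; its functions/parameters are the labels.
            training_set.append((rec["picture"], rec["functions"]))
    return training_set
```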
With reference to the first aspect, in certain implementations of the first aspect, the predictive model is a classifier or neural network model trained for each algorithm function and the functional parameters of the algorithm function.
With reference to the first aspect, in certain implementations of the first aspect, the predictive model is a neural network model trained for a plurality of algorithmic functions and functional parameters of the algorithmic functions.
With reference to the first aspect, in certain implementation manners of the first aspect, after the acquiring a preview stream of a scene, the method further includes: obtaining features of the scene from the preview stream of the scene.
The number of times of prediction of the algorithm function and the function parameter can be reduced through the feature matching of the scene, and the efficiency of generating the photographing mode is improved.
With reference to the first aspect, in certain implementation manners of the first aspect, before the predicting, from a plurality of algorithm functions and according to the preview stream of the scene, at least one algorithm function and the functional parameters of the at least one algorithm function applied to the scene, the method further includes:
matching the scene with pictures stored in the cloud according to the features of the scene, so as to search the pictures stored in the cloud for a picture matching the scene; and
taking the algorithm function corresponding to the picture matching the scene and the functional parameters of that algorithm function, which are stored in the cloud, as the predicted at least one algorithm function and the functional parameters of the at least one algorithm function.
With reference to the first aspect, in certain implementations of the first aspect, the pictures saved in the cloud are pictures whose aesthetic scores are higher than a second threshold.
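A minimal sketch of the matching step in the two preceding implementations, assuming the cloud provides pre-computed feature vectors for pictures whose aesthetic scores exceed the second threshold; cosine similarity and the matching threshold are illustrative choices rather than requirements of the application:

```python
import numpy as np

MATCH_THRESHOLD = 0.9   # assumed similarity cut-off

def match_cloud_pictures(scene_feature, cloud_entries):
    """cloud_entries: list of (feature_vector, functions_and_parameters) pairs
    for cloud pictures with aesthetic scores above the second threshold."""
    scene = np.asarray(scene_feature, dtype=np.float32)
    scene /= np.linalg.norm(scene) + 1e-12
    best, best_sim = None, -1.0
    for feature, functions in cloud_entries:
        vec = np.asarray(feature, dtype=np.float32)
        sim = float(scene @ vec / (np.linalg.norm(vec) + 1e-12))   # cosine similarity
        if sim > best_sim:
            best, best_sim = functions, sim
    # If a stored picture matches the scene, reuse its functions/parameters
    # directly instead of running the prediction model.
    return best if best_sim >= MATCH_THRESHOLD else None
```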
In a second aspect, there is provided a photographing mode generating apparatus, the apparatus comprising: the acquisition module is used for acquiring a preview stream of a scene; a prediction module for predicting, from a plurality of algorithm functions, at least one algorithm function and at least one function parameter of the algorithm function applied to the scene according to a preview stream of the scene, wherein the algorithm function is used for performing an algorithm process of an image or a video on the preview stream; an assembling module for assembling the predicted at least one algorithm function and the functional parameters of the at least one algorithm function to generate a photographing mode for the scene.
The algorithm functions that can be enabled and their functional parameters are automatically predicted according to the preview stream of the scene, and the predicted algorithm functions and functional parameters are automatically assembled into a photographing mode, so that scenes do not need to be divided in advance and photographing modes do not need to be predefined, and photographing mode generation becomes intelligent and automatic while the photographing mode is optimized.
With reference to the second aspect, in certain implementations of the second aspect, the prediction module is further configured to determine a confidence level for each of the plurality of algorithm functions and the corresponding function parameter; the at least one algorithm function and the functional parameters of the at least one algorithm function applied to the scene are predicted based on the confidence level.
With reference to the second aspect, in certain implementations of the second aspect, the prediction module is specifically configured to determine a first confidence level for each algorithm function and a second confidence level for each function parameter for each algorithm function; alternatively, the prediction module is specifically configured to determine a confidence level of a combination of the algorithm function and the functional parameter.
With reference to the second aspect, in certain implementation manners of the second aspect, the assembling module is specifically configured to assemble at least one algorithm function and a function parameter of the at least one algorithm function according to mutual exclusion information of the algorithm function and the function parameter.
With reference to the second aspect, in certain implementations of the second aspect, the apparatus further includes a training module that trains a prediction model for making the prediction.
Alternatively, the prediction model may be a single-task prediction model that predicts a single algorithm function or function parameter, or may be a multi-task prediction model that predicts multiple algorithm functions and function parameters simultaneously.
Scenes do not need to be divided in advance and photographing modes do not need to be predefined; whether to enable each algorithm function and functional parameter is predicted by the prediction model in an artificial-intelligence manner, so the user does not need to consider the influence of each algorithm function and functional parameter on the photographing effect. This simplifies user operation and makes photographing mode generation more intelligent and automatic.
With reference to the second aspect, in certain implementations of the second aspect, the prediction model is pre-trained according to training data, the training data including training pictures or training video data, training algorithm functions, and functional parameters of the training algorithm functions.
With reference to the second aspect, in some implementations of the second aspect, the training picture or training video data is preview picture or video data obtained in a shooting scene, and the algorithm functions and the functional parameters of the algorithm functions obtained by labeling the preview picture or video data (for example, labeling by a neural-network-based intelligent labeling system, or labeling by a user with a professional background or good photographing skills) are taken as the training algorithm functions and the functional parameters of the training algorithm functions.
With reference to the second aspect, in some implementations of the second aspect, the training picture is a captured picture, the captured picture is scored, and an algorithm function of the picture with a score higher than the first threshold and a function parameter of the algorithm function are used as the training algorithm function and the function parameter of the training algorithm function.
Using the algorithm functions and functional parameters of preview picture or video data labeled by users with a professional background or good photographing skills, or of pictures whose scores are higher than a certain threshold, as the training algorithm functions and the functional parameters of the training algorithm functions improves the quality of photographing mode generation, so that the photographing mode is optimized.
With reference to the second aspect, in certain implementations of the second aspect, the predictive model is a classifier or neural network model trained for each algorithm function and the functional parameters of the algorithm function.
With reference to the second aspect, in certain implementations of the second aspect, the predictive model is a neural network model trained for a plurality of algorithmic functions and functional parameters of the algorithmic functions.
With reference to the second aspect, in some implementations of the second aspect, the acquiring module is further configured to acquire a feature of the scene from a preview stream of the scene.
The number of times of prediction of the algorithm function and the function parameter can be reduced through the feature matching of the scene, and the efficiency of generating the photographing mode is improved.
With reference to the second aspect, in certain implementations of the second aspect, the apparatus further includes:
The matching module is used for matching the scene with the pictures stored in the cloud according to the characteristics of the scene so as to search the pictures matched with the scene from the pictures stored in the cloud; the acquisition module is also used for acquiring the algorithm function corresponding to the picture matched with the scene and the function parameter of the algorithm function, which are stored in the cloud; the prediction module is further configured to use the algorithm function corresponding to the picture matching the scene and the function parameter of the algorithm function as the predicted at least one algorithm function and the function parameter of the at least one algorithm function.
With reference to the second aspect, in some implementations of the second aspect, the pictures stored in the cloud end are pictures with aesthetic scores higher than a second threshold.
In a third aspect, there is provided a computer readable storage medium comprising a computer program which, when run on a computer device, causes a processing unit in the computer device to perform the method according to the first aspect.
In a fourth aspect, there is provided a computer program product comprising a computer program which, when run on a computer device, causes a processing unit in the computer device to perform the method according to the first aspect.
Drawings
FIG. 1 is an exemplary diagram of a photographing mode based on an artificial definition;
FIG. 2 is an exemplary diagram of an operational flow of an artificially defined photographing mode;
FIG. 3 is an example diagram of a system architecture of an embodiment of the present application;
FIG. 4 is an exemplary diagram of generating a photographing mode based on a CNN model according to an embodiment of the present application;
FIG. 5 is an exemplary diagram of a hardware structure of a chip according to an embodiment of the present application;
FIG. 6 is an exemplary diagram of a method for generating a photographing mode according to an embodiment of the present application;
FIG. 7 is an exemplary diagram of another method for generating a photographing mode according to an embodiment of the present application;
FIG. 8 is an exemplary diagram of a user interface for photography mode generation provided by embodiments of the present application;
FIG. 9 is a diagram illustrating another example user interface for photography mode generation provided by embodiments of the present application;
FIG. 10 is an exemplary diagram of a training method for a predictive model provided in an embodiment of the present application;
FIG. 11 is an exemplary diagram of a generation apparatus of a photographing mode according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. Evidently, the described embodiments are some, rather than all, of the embodiments of the present application.
For easy understanding, first, an application scenario of the embodiment of the present application will be briefly described.
In order to improve the final imaging quality of the terminal camera, a plurality of algorithm functions and functional parameters are built into the terminal device. The algorithm functions, such as image blurring, beautifying, high dynamic range imaging (high dynamic range, HDR), a night scene function, a streaming shutter function and a large aperture function, perform video and/or image algorithm processing on the imaged preview stream to obtain the final video and/or image. A functional parameter is a parameter configuration of a certain algorithm function, such as the aperture value in the large aperture function or the beauty level in the beautifying function. Generally, a certain algorithm function and its functional parameters improve only a single aspect of the photographing effect; for example, the beautifying function only beautifies a human image, and the night scene function only optimizes a night scene. To obtain a better photographing experience in a certain scenario, it may be necessary to combine one or more of the algorithm functions (and, if finer control is needed, to also consider the different functional parameters of each algorithm function in the combination, for example the effect that different functional parameters of the same algorithm function have on the processing result). Such a combination of one or more algorithm functions (together with their functional parameters, where finer-grained functional parameters are required) may be referred to as a photographing mode.
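For readability only, a photographing mode as described above can be pictured as the following data structure; the field names and the example values of a portrait-like mode are purely illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class PhotographingMode:
    # A photographing mode is a combination of one or more algorithm functions,
    # each optionally carrying its own functional parameters.
    name: str
    functions: dict = field(default_factory=dict)

# Example values chosen for illustration; actual functions and parameters depend on the device.
portrait_like = PhotographingMode(
    name="portrait-like",
    functions={
        "portrait_blur": {"enabled": True},
        "beauty": {"level": 3},               # functional parameter: beauty level
        "large_aperture": {"f_number": 2.0},  # functional parameter: aperture value
    },
)
```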
For a certain scenario, multiple photographing modes may be formed by manually assembling one or more algorithmic functions. For example, the camera application may provide an artificially assembled portrait mode, a super night view mode, a large aperture mode, an HDR mode, a macro mode, and the like. Fig. 1 is an exemplary diagram of a photographing mode based on an artificial definition, as shown in fig. 1, in which one photographing mode is formed by artificially assembling one or several algorithm functions, for example, a portrait mode may include portrait blurring, beauty, HDR, and other algorithm functions. After the photographing mode is set, at the time of actual photographing, a mode suitable for the current scene may be selected from among a plurality of photographing modes that have been set through scene recognition.
Fig. 2 is an exemplary diagram of an operation flow of an artificially defined photographing mode. As shown in fig. 2, when a face in a scene is detected, a portrait mode may be recommended to a user, so that the user may take a photograph using a combination of an algorithm function and its functional parameters in the portrait mode.
However, the photographing mode based on human definition has the following problems. First, the photographing mode is defined manually in advance, and the algorithm functions and functional parameters needed in a scene are determined mainly from personal experience, so the resulting mode is not necessarily appropriate. For example, although a face exists in some scenes, it may only be part of the background, in which case selecting the portrait mode is not appropriate, or the algorithm functions/functional parameters of the portrait mode do not suit the current scene.
On the other hand, when a recognized scene has no corresponding photographing mode, no photographing mode can be recommended to the user. Manually defined modes cannot cover photographing in all scenes, and if too many photographing modes are manually defined to adapt to complex and diverse scenes, the burden on the people defining the modes increases, the number of built-in photographing modes grows, and mode display and user selection are negatively affected.
In addition, a professional mode may be provided. The professional mode presents options for various algorithm functions/functional parameters to the user and does not recommend a combined photographing mode; instead, the user enables a certain algorithm function or adjusts a certain functional parameter according to his or her own professional knowledge or experience. However, this approach requires the user to have a deep professional photographing background and to know the influence of each algorithm function and functional parameter on the photographing effect. Moreover, on different terminal devices with different underlying camera specifications, the algorithm functions and functional parameters in the professional mode have different effects, which further increases the difficulty of user selection.
In the embodiments of the present application, a suitable photographing mode is not simply selected from given photographing modes, and algorithm functions or functional parameters are not simply presented for the user to choose. Instead, suitable algorithm functions and their functional parameters are automatically predicted from a plurality of algorithm functions according to the preview stream of the scene and then assembled, so that an optimized photographing mode is generated and the generation of the camera's photographing mode becomes intelligent and automatic.
The generation of the photographing mode according to the embodiment of the present application may be performed by a neural network (model), and in order to better understand the method for generating the photographing mode according to the embodiment of the present application, related terms and concepts of the neural network are described below.
(1) Neural network
The neural network may be composed of neural units. A neural unit may be an arithmetic unit that takes x_s and an intercept of 1 as inputs, and the output of the arithmetic unit may be as shown in formula (1):

h_{W,b}(x) = f(W^T x) = f\left(\sum_{s=1}^{n} W_s x_s + b\right)    (1)

where s = 1, 2, ..., n, n is a natural number greater than 1, W_s is the weight of x_s, and b is the bias of the neural unit. f is the activation function of the neural unit, used to non-linearly transform features in the neural network so as to convert the input signal of the neural unit into an output signal. The output signal of the activation function may serve as the input of the next convolutional layer, and the activation function may be a sigmoid function. A neural network is a network formed by joining many such single neural units together, i.e., the output of one neural unit may be the input of another neural unit. The input of each neural unit may be connected to the local receptive field of the previous layer to extract features of the local receptive field, and the local receptive field may be an area composed of several neural units.
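A short numerical sketch of formula (1), with a sigmoid chosen as an example activation function f and arbitrary example weights:

```python
import numpy as np

def neural_unit(x, w, b):
    # f(sum_s W_s * x_s + b) with a sigmoid activation as an example of f.
    z = np.dot(w, x) + b
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.2, 0.5, 0.1])     # inputs x_s
w = np.array([0.4, -0.3, 0.8])    # weights W_s
print(neural_unit(x, w, b=0.1))   # output signal of the neural unit
```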
(2) Deep neural network
Deep neural networks (deep neural network, DNN), also known as multi-layer neural networks, can be understood as neural networks with multiple hidden layers. The DNNs are divided according to the positions of different layers, and the neural networks inside the DNNs can be divided into three types: input layer, hidden layer, output layer. Typically the first layer is the input layer, the last layer is the output layer, and the intermediate layers are all hidden layers. The layers are fully connected, that is, any neuron in the i-th layer must be connected to any neuron in the i+1-th layer.
Although DNN appears to be complex, the work of each layer is not complex; it is simply the following linear relational expression: \vec{y} = \alpha(W \vec{x} + \vec{b}), where \vec{x} is the input vector, \vec{y} is the output vector, \vec{b} is the offset vector, W is the weight matrix (also called the coefficients), and \alpha() is the activation function. Each layer merely performs this simple operation on the input vector \vec{x} to obtain the output vector \vec{y}. Since a DNN has many layers, the number of coefficients W and offset vectors \vec{b} is also relatively large. These parameters are defined in a DNN as follows, taking the coefficient W as an example: in a three-layer DNN, the linear coefficient from the 4th neuron of the second layer to the 2nd neuron of the third layer is defined as W^{3}_{24}. The superscript 3 represents the layer in which the coefficient W is located, and the subscripts correspond to the output third-layer index 2 and the input second-layer index 4.
In summary, the coefficient from the kth neuron of layer L-1 to the jth neuron of layer L is defined as W^{L}_{jk}.
It should be noted that the input layer is devoid of W parameters. In deep neural networks, more hidden layers make the network more capable of characterizing complex situations in the real world. Theoretically, the more parameters the higher the model complexity, the greater the "capacity", meaning that it can accomplish more complex learning tasks. The process of training the deep neural network, i.e. learning the weight matrix, has the final objective of obtaining a weight matrix (a weight matrix formed by a number of layers of vectors W) for all layers of the trained deep neural network.
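As an illustration of the per-layer expression y = α(Wx + b) and of the coefficient indexing W^L_jk, a forward pass through a small fully connected network could be written as follows (the layer sizes and the ReLU activation are arbitrary examples):

```python
import numpy as np

def forward(x, layers):
    """layers: list of (W, b) pairs; W[j, k] is the coefficient from
    neuron k of the previous layer to neuron j of the current layer."""
    a = x
    for W, b in layers:
        a = np.maximum(0.0, W @ a + b)   # y = alpha(W x + b), here alpha = ReLU
    return a

rng = np.random.default_rng(0)
layers = [(rng.standard_normal((8, 4)), np.zeros(8)),    # hidden layer
          (rng.standard_normal((3, 8)), np.zeros(3))]    # output layer
print(forward(rng.standard_normal(4), layers))
```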
(3) Convolutional neural network
The convolutional neural network (convolutional neuron network, CNN) is a deep neural network with a convolutional structure. The convolutional neural network comprises a feature extractor consisting of a convolutional layer and a sub-sampling layer, which can be regarded as a filter. The convolution layer refers to a neuron layer in the convolution neural network, which performs convolution processing on an input signal. In the convolutional layer of the convolutional neural network, one neuron may be connected with only a part of adjacent layer neurons. A convolutional layer typically contains a number of feature planes, each of which may be composed of a number of neural elements arranged in a rectangular pattern. Neural elements of the same feature plane share weights, where the shared weights are convolution kernels. Sharing weights can be understood as the way image information is extracted is independent of location. The convolution kernel can be initialized in the form of a matrix with random size, and reasonable weight can be obtained through learning in the training process of the convolution neural network. In addition, the direct benefit of sharing weights is to reduce the connections between layers of the convolutional neural network, while reducing the risk of overfitting.
(4) Residual error network
The residual network is a deep convolutional network proposed in 2015. Compared with the conventional convolutional neural network, it is easier to optimize and can improve accuracy by adding considerable depth. The core of the residual network is to solve the side effect (the degradation problem) caused by increasing depth, so that network performance can be improved simply by increasing network depth. A residual network usually contains many sub-modules with an identical structure, and a residual network (ResNet) is usually named with a number indicating how many times the sub-module is repeated; for example, ResNet50 indicates that there are 50 sub-modules in the residual network.
(6) Classifier
Many neural network architectures eventually have a classifier for classifying objects in the image. The classifier is generally composed of a fully connected layer (fully connected layer) and a softmax function (which may be referred to as a normalized exponential function) that is capable of outputting probabilities of different classes based on inputs.
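A minimal sketch of such a classifier head, i.e., a fully connected layer followed by a softmax; the feature and class dimensions are arbitrary:

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)                 # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def classifier(features, W, b):
    # Fully connected layer producing one logit per class, then softmax
    # turns the logits into class probabilities.
    return softmax(W @ features + b)

W = np.random.default_rng(1).standard_normal((5, 16))   # 5 classes, 16-dim features
print(classifier(np.ones(16), W, np.zeros(5)))          # probabilities summing to 1
```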
(7) Loss function
In training a deep neural network, because the output of the deep neural network is expected to be as close as possible to the value that is actually desired, the predicted value of the current network can be compared with the actually desired target value, and the weight vector of each layer of the neural network can then be updated according to the difference between the two (of course, there is usually an initialization process before the first update, that is, parameters are pre-configured for each layer of the deep neural network). For example, if the predicted value of the network is too high, the weight vectors are adjusted to make the prediction lower, and the adjustment continues until the deep neural network can predict the actually desired target value or a value very close to it. Therefore, "how to compare the difference between the predicted value and the target value" needs to be defined in advance. This is the loss function (loss function) or objective function (objective function), which are important equations for measuring the difference between the predicted value and the target value. Taking the loss function as an example, a higher output value (loss) of the loss function indicates a larger difference, so training the deep neural network becomes a process of reducing this loss as much as possible.
(8) Back propagation algorithm
The neural network can use the back propagation (back propagation, BP) algorithm during training to correct the values of the parameters in the initial neural network model, so that the reconstruction error loss of the neural network model becomes smaller and smaller. Specifically, the input signal is passed forward until an error loss is produced at the output, and the parameters in the initial neural network model are updated by back-propagating the error loss information, so that the error loss converges. The back propagation algorithm is a back-propagation movement dominated by the error loss, aimed at obtaining the parameters of the optimal neural network model, such as the weight matrix.
Some of the basic contents of neural networks are briefly described above, and some of the specific neural networks that may be used in image processing are described below.
The system architecture of the embodiments of the present application is described in detail below with reference to fig. 3.
Fig. 3 is a schematic diagram of a system architecture of an embodiment of the present application. As shown in fig. 3, the system architecture 100 includes an execution device 110, a training device 120, a database 130, a client device 140, a data storage system 150, and a data acquisition system 160.
In addition, the execution device 110 includes a calculation module 111, an I/O interface 112, a preprocessing module 113, and a preprocessing module 114. Among other things, the calculation module 111 may include the target model/rule 101, with the preprocessing module 113 and the preprocessing module 114 being optional.
The data acquisition device 160 is used to acquire training data. For the method for generating a photographing mode according to the embodiment of the present application, the training data may include training pictures or training video data, training algorithm functions, and functional parameters of the training algorithm functions. After the training data is collected, the data collection device 160 stores the training data in the database 130 and the training device 120 trains the target model/rule 101 based on the training data maintained in the database 130.
The following describes how the training device 120 obtains the target model/rule 101 based on the training data. The training device 120 processes the input training picture or training video data and compares the output algorithm functions and functional parameters with the training algorithm functions and the functional parameters of the training algorithm functions, until the difference between the algorithm functions and functional parameters output by the training device 120 and the training algorithm functions and their functional parameters is smaller than a certain threshold, thereby completing the training of the target model/rule 101.
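The training procedure described above can be sketched, under assumptions, as a loop that stops once the average loss falls below a threshold; the multi-label formulation, the loss function and the threshold value are example choices and not prescribed by the application:

```python
import torch
import torch.nn as nn

def train_until_threshold(model, loader, threshold=0.05, max_epochs=50):
    """Train the prediction model until the gap between its outputs and the
    labelled training functions/parameters falls below a threshold."""
    loss_fn = nn.BCEWithLogitsLoss()           # multi-label: one output per function/parameter
    optim = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(max_epochs):
        total, count = 0.0, 0
        for pictures, labels in loader:        # labels: 0/1 vector of enabled functions
            optim.zero_grad()
            loss = loss_fn(model(pictures), labels)
            loss.backward()
            optim.step()
            total, count = total + loss.item(), count + 1
        if total / max(count, 1) < threshold:  # stop once the difference is small enough
            break
    return model
```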
The above-mentioned target model/rule 101 can be used to implement the method for generating a photographing mode according to the embodiment of the present application, that is, a photographing mode of a preview scene can be generated by inputting a preview stream of the scene (after related preprocessing) into the target model/rule 101. The target model/rule 101 in the embodiment of the present application may be specifically a neural network. It should be noted that, in practical applications, the training data maintained in the database 130 is not necessarily all acquired by the data acquisition device 160, but may be received from other devices. It should be noted that the training device 120 is not necessarily completely based on the training data maintained by the database 130 to perform training of the target model/rule 101, and it is also possible to obtain the training data from the cloud or other places to perform model training, which should not be taken as a limitation of the embodiments of the present application.
The target model/rule 101 obtained through training by the training device 120 may be applied to different systems or devices, such as the execution device 110 shown in fig. 3. The execution device 110 may be a terminal, such as a mobile phone terminal, a tablet computer, a notebook computer, an augmented reality (augmented reality, AR)/virtual reality (virtual reality, VR) device or a vehicle-mounted terminal, or may be a server or the cloud. In fig. 3, the execution device 110 is configured with an input/output (I/O) interface 112 for data interaction with an external device, and a user may input data to the I/O interface 112 through the client device 140. In this embodiment of the present application, the input data may include: a preview stream of a scene input by the client device. The client device 140 here may specifically be a terminal device.
The preprocessing module 113 and the preprocessing module 114 are used to preprocess the input data (the scene preview stream) received by the I/O interface 112. In this embodiment of the present application, the preprocessing module 113 and the preprocessing module 114 may be absent, or there may be only one preprocessing module. When the preprocessing module 113 and the preprocessing module 114 are not present, the computing module 111 may be used directly to process the input data.
In preprocessing input data by the execution device 110, or in performing processing related to computation or the like by the computation module 111 of the execution device 110, the execution device 110 may call data, codes or the like in the data storage system 150 for corresponding processing, or may store data, instructions or the like obtained by corresponding processing in the data storage system 150.
Finally, the I/O interface 112 presents the processing result, such as the photographing mode computed by the target model/rule 101, to the client device 140, thereby providing the user with the processing result.
Specifically, the photographing mode obtained through processing by the target model/rule 101 in the computing module 111 may be processed by the preprocessing module 113 (and may be further processed by the preprocessing module 114), after which the processing result is sent to the I/O interface, which then sends it to the client device 140 for display.
It should be understood that when the preprocessing module 113 and the preprocessing module 114 are not present in the system architecture 100, the computing module 111 may also transmit the photographing mode obtained by processing to the I/O interface, and then send the photographing mode to the client device 140 for display by the I/O interface.
It should be noted that the training device 120 may generate, based on different training data, a corresponding target model/rule 101 for different targets or different tasks, where the corresponding target model/rule 101 may be used to achieve the targets or complete the tasks, thereby providing the user with the desired result.
In the case shown in FIG. 3, the user may manually give input data, which may be manipulated through an interface provided by the I/O interface 112. In another case, the client device 140 may automatically send the input data to the I/O interface 112, and if the client device 140 is required to automatically send the input data requiring the user's authorization, the user may set the corresponding permissions in the client device 140. The user may view the results output by the execution device 110 at the client device 140, and the specific presentation may be in the form of a display, a sound, an action, or the like. The client device 140 may also be used as a data collection terminal to collect input data of the input I/O interface 112 and output results of the output I/O interface 112 as new sample data as shown in the figure, and store the new sample data in the database 130. Of course, instead of being collected by the client device 140, the I/O interface 112 may directly store the input data input to the I/O interface 112 and the output result output from the I/O interface 112 as new sample data into the database 130.
It should be noted that fig. 3 is only a schematic diagram of a system architecture provided in the embodiment of the present application, and the positional relationship among devices, apparatuses, modules, etc. shown in the drawing is not limited in any way, for example, in fig. 3, the data storage system 150 is an external memory with respect to the execution device 110, and in other cases, the data storage system 150 may be disposed in the execution device 110.
As shown in fig. 3, the target model/rule 101 may be a prediction model in the embodiment of the present application, and specifically, the prediction model provided in the embodiment of the present application may be a neural network, for example, CNN and deep convolutional neural network (deep convolutional neural networks, DCNN), and so on.
Since CNN is a very common neural network, the structure of CNN will be described in detail with reference to fig. 4. As described in the basic concept introduction above, the convolutional neural network is a deep neural network with a convolutional structure, and is a deep learning architecture, in which multiple levels of learning are performed at different abstraction levels through machine learning algorithms. As a deep learning architecture, CNN is a feed-forward artificial neural network in which individual neurons can respond to a preview scene stream input thereto.
As shown in fig. 4, convolutional Neural Network (CNN) 200 may include an input layer 210, a convolutional layer/pooling layer 220 (where the pooling layer is optional), and a fully-connected layer (fully connected layer) 230. The relevant contents of these layers are described in detail below.
Convolution layer/pooling layer 220:
convolution layer:
the convolutional layer/pooling layer 220 shown in fig. 4 may include, as examples, layers 221-226. For example, in one implementation, layer 221 is a convolutional layer, layer 222 is a pooling layer, layer 223 is a convolutional layer, layer 224 is a pooling layer, layer 225 is a convolutional layer, and layer 226 is a pooling layer; in another implementation, layers 221 and 222 are convolutional layers, layer 223 is a pooling layer, layers 224 and 225 are convolutional layers, and layer 226 is a pooling layer. That is, the output of a convolutional layer may be used as the input of a subsequent pooling layer, or as the input of another convolutional layer to continue the convolution operation.
The internal principle of operation of one convolution layer will be described below using the convolution layer 221 as an example.
The convolution layer 221 may include a plurality of convolution operators, also known as kernels, which act in image processing as a filter that extracts specific information from the input image matrix. A convolution operator may essentially be a weight matrix, which is usually predefined. During a convolution operation on an image, the weight matrix is usually processed across the input image in the horizontal direction pixel by pixel (or two pixels by two pixels, and so on, depending on the value of the stride) to extract specific features from the image. The size of the weight matrix should be related to the size of the image. Note that the depth dimension (depth dimension) of the weight matrix is the same as the depth dimension of the input image, and during the convolution operation the weight matrix extends over the entire depth of the input image. Therefore, convolving with a single weight matrix produces a convolved output with a single depth dimension, but in most cases a single weight matrix is not used; instead, multiple weight matrices of the same size (rows × columns), i.e., multiple matrices of the same shape, are applied. The outputs of the weight matrices are stacked to form the depth dimension of the convolved image, where the dimension can be understood as being determined by the "multiple" mentioned above. Different weight matrices may be used to extract different features from the image; for example, one weight matrix extracts image edge information, another extracts a particular color of the image, and yet another blurs unwanted noise in the image. The multiple weight matrices have the same size (rows × columns), so the feature maps they extract also have the same size, and the extracted feature maps of the same size are combined to form the output of the convolution operation.
The weight values in the weight matrices are required to be obtained through a large amount of training in practical application, and each weight matrix formed by the weight values obtained through training can be used for extracting information from an input image, so that the convolutional neural network 200 can perform correct prediction.
When convolutional neural network 200 has multiple convolutional layers, the initial convolutional layer (e.g., 221) tends to extract more general features, which may also be referred to as low-level features; as the depth of the convolutional neural network 200 increases, features extracted by the later convolutional layers (e.g., 226) become more complex, such as features of high level semantics, which are more suitable for the problem to be solved.
Pooling layer:
since it is often desirable to reduce the number of training parameters, a pooling layer often needs to be introduced periodically after the convolution layers. In the layers 221-226 illustrated as 220 in fig. 4, one convolution layer may be followed by one pooling layer, or multiple convolution layers may be followed by one or more pooling layers. During image processing, the only purpose of the pooling layer is to reduce the spatial size of the image. The pooling layer may include an average pooling operator and/or a max pooling operator for sampling the input image to obtain a smaller image. The average pooling operator may compute the pixel values of the image within a particular range to produce an average value as the result of average pooling. The max pooling operator may take the pixel with the largest value within a particular range as the result of max pooling. In addition, just as the size of the weight matrix in the convolution layer should be related to the image size, the operators in the pooling layer should also be related to the image size. The size of the image output after processing by the pooling layer may be smaller than the size of the image input to the pooling layer, and each pixel in the image output by the pooling layer represents the average value or the maximum value of the corresponding sub-region of the input image.
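For concreteness, a convolution layer followed by a pooling layer of the kind described above can be expressed with PyTorch; the channel counts and kernel sizes are arbitrary examples and do not describe the actual structure of the convolutional neural network 200:

```python
import torch
import torch.nn as nn

conv_pool = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),   # halves the spatial size, as described for pooling
)

frame = torch.randn(1, 3, 224, 224)   # one RGB preview frame
print(conv_pool(frame).shape)         # torch.Size([1, 16, 112, 112])
```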
Full connection layer 230:
after processing by the convolutional layer/pooling layer 220, the convolutional neural network 200 is not yet able to output the required output information, because, as described above, the convolutional layer/pooling layer 220 only extracts features and reduces the parameters brought by the input image. However, in order to generate the final output information (the required class information or other relevant information), the convolutional neural network 200 needs to use the fully connected layer 230 to generate an output with the required number of classes or groups. Therefore, the fully connected layer 230 may include multiple hidden layers (231, 232 through 23n as shown in fig. 4) and an output layer 240, and the parameters contained in the multiple hidden layers may be pre-trained based on relevant training data of a specific task type, for example image recognition, image classification, or image super-resolution reconstruction.
After the hidden layers in the fully connected layer 230, the final layer of the entire convolutional neural network 200 is the output layer 240. The output layer 240 has a loss function similar to categorical cross entropy, which is specifically used to calculate the prediction error. Once the forward propagation of the entire convolutional neural network 200 is completed (for example, propagation from 210 to 240 in fig. 4 is forward propagation), back propagation (for example, propagation from 240 to 210 in fig. 4 is back propagation) begins to update the weight values and biases of the aforementioned layers, so as to reduce the loss of the convolutional neural network 200 and the error between the result output by the convolutional neural network 200 through the output layer and the ideal result.
It should be noted that the convolutional neural network 200 shown in fig. 4 is only an example of a convolutional neural network, and the convolutional neural network may also exist in the form of other network models in a specific application.
Fig. 5 is a chip hardware structure provided in an embodiment of the present application, where the chip includes a neural network processor 50. The chip may be provided in an execution device 110 as shown in fig. 3 for performing the calculation of the calculation module 111. The chip may also be provided in the training device 120 as shown in fig. 3 for completing the training work of the training device 120 and outputting the target model/rule 101. The algorithm of each layer in the convolutional neural network as shown in fig. 4 can be implemented in a chip as shown in fig. 5.
The neural network processor (NPU) 50 is mounted, as a coprocessor, on a main central processing unit (central processing unit, CPU) (host CPU), and the host CPU assigns tasks. The core part of the NPU is the arithmetic circuit 503, and the controller 504 controls the arithmetic circuit 503 to extract data from the memory (the weight memory or the input memory) and perform operations.
In some implementations, the arithmetic circuitry 503 internally includes a plurality of processing units (PEs). In some implementations, the operational circuitry 503 is a two-dimensional systolic array. The arithmetic circuit 503 may also be a one-dimensional systolic array or other electronic circuitry capable of performing mathematical operations such as multiplication and addition. In some implementations, the operation circuit 503 is a general-purpose matrix processor.
For example, assume that there is an input matrix a, a weight matrix B, and an output matrix C. The arithmetic circuit 503 takes the data corresponding to the matrix B from the weight memory 502 and buffers the data on each PE in the arithmetic circuit 503. The arithmetic circuit 503 performs matrix operation on the matrix a data and the matrix B data from the input memory 501, and the partial result or the final result of the matrix obtained is stored in an accumulator (accumulator) 508.
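The matrix operation described for the arithmetic circuit 503 can be mimicked in plain Python to show the role of the accumulator; this is only a software analogy of the hardware behaviour, not a description of the NPU itself:

```python
import numpy as np

def matmul_with_accumulator(A, B):
    """C = A @ B computed with an explicit accumulator, mirroring how partial
    and final results of the matrix operation are collected in accumulator 508."""
    m, k = A.shape
    k2, n = B.shape
    assert k == k2
    C = np.zeros((m, n))            # accumulator for partial/final results
    for i in range(m):
        for j in range(n):
            for t in range(k):
                C[i, j] += A[i, t] * B[t, j]
    return C

A, B = np.arange(6).reshape(2, 3), np.arange(12).reshape(3, 4)
assert np.allclose(matmul_with_accumulator(A, B), A @ B)
```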
The vector calculation unit 507 may further process the output of the operation circuit 503, such as vector multiplication, vector addition, exponential operation, logarithmic operation, magnitude comparison, and the like. For example, the vector calculation unit 507 may be used for network calculations of non-convolutional/non-FC layers in a neural network, such as pooling, batch normalization (batch normalization), local response normalization (local response normalization), and the like.
In some implementations, the vector computation unit 507 can store the vector of processed outputs to the unified buffer 506. For example, the vector calculation unit 507 may apply a nonlinear function to an output of the operation circuit 503, such as a vector of accumulated values, to generate an activation value. In some implementations, the vector calculation unit 507 generates a normalized value, a combined value, or both. In some implementations, the vector of processed outputs can be used as an activation input to the operational circuitry 503, for example for use in subsequent layers in a neural network.
The unified memory 506 is used for storing input data and output data.
The storage unit access controller 505 (direct memory access controller, DMAC) transfers the input data in the external memory to the input memory 501 and/or the unified memory 506, stores the weight data in the external memory into the weight memory 502, and stores the data in the unified memory 506 into the external memory.
A bus interface unit (bus interface unit, BIU) 510 is used for interaction among the main CPU, the DMAC and the instruction fetch memory 509 through a bus.
An instruction fetch memory (instruction fetch buffer) 509 connected to the controller 504 for storing instructions used by the controller 504;
and a controller 504 for calling the instruction cached in the instruction memory 509 to control the operation of the operation accelerator.
Typically, the unified memory 506, the input memory 501, the weight memory 502 and the instruction fetch memory 509 are all on-chip (on-chip) memories, and the external memory is a memory outside the NPU. The external memory may be a double data rate synchronous dynamic random access memory (double data rate synchronous dynamic random access memory, DDR SDRAM for short), a high bandwidth memory (high bandwidth memory, HBM), or another readable and writable memory.
In addition, in this application, the operations of the respective layers in the convolutional neural network shown in fig. 4 may be performed by the operation circuit 503 or the vector calculation unit 507.
The following describes a method for generating a photographing mode according to an embodiment of the present application in detail with reference to the accompanying drawings. The photographing mode generation method of the embodiment of the present application may be executed by the execution device 110 in fig. 3.
Fig. 6 is an exemplary diagram of a method for generating a photographing mode according to an embodiment of the present application. The method of fig. 6 may be performed by the devices of fig. 3 to 5, some or all of which may be integrated in the terminal device. As shown in fig. 6, the method 600 includes steps S610, S620, and S630, which are described in detail below.
S610, obtaining a preview stream of the scene.
It will be appreciated that the preview stream of the scene may be captured by a camera of the terminal device. For example, when a scene is previewed through the camera of a mobile phone or another intelligent terminal, a preview stream of the scene is captured automatically. Optionally, the camera may be a front camera or a rear camera. Optionally, the preview stream may be acquired by the camera from the actual scene, or obtained by previewing an image stored in memory.
It should be appreciated that the preview stream of a scene should contain information about the previewed scene, for example the people, objects, scenery, lighting, environmental parameters, geographical location information, time parameters, and so on of the previewed scene, without limitation.
Optionally, features of the scene may also be extracted after the preview stream of the scene is acquired. The extracted features may be objects in the previewed scene (people, things, scenery, and so on); the time of the preview, for example day or night, spring and summer or autumn and winter, early morning or evening; the environment of the previewed scene, for example windy or rainy conditions; or natural features that can be perceived directly, such as color features, texture features, shape features, and spatial relationship features. One or more of the above features may be extracted, and the features are not limited to the examples listed. It should be understood that different features have different extraction methods, which are not further limited in this application.
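As a hedged illustration of such feature extraction, the sketch below computes a normalized color histogram and a mean-brightness cue from one preview frame. The specific features and names are assumptions for illustration, not the extraction method mandated by this application.

```python
# A minimal sketch, assuming the preview stream is available as RGB frames
# (H x W x 3, uint8 numpy arrays).
import numpy as np

def extract_frame_features(frame: np.ndarray) -> np.ndarray:
    hists = [np.histogram(frame[..., c], bins=16, range=(0, 255))[0] for c in range(3)]
    color_hist = np.concatenate(hists).astype(np.float32)
    color_hist /= color_hist.sum() + 1e-8                  # normalized per-channel color histogram
    brightness = np.array([frame.mean() / 255.0], dtype=np.float32)   # crude day/night cue
    return np.concatenate([color_hist, brightness])        # 3 * 16 + 1 = 49-dimensional feature
```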
S620, predicting at least one algorithm function and functional parameters of the at least one algorithm function applied to the scene from a plurality of algorithm functions according to the preview stream of the scene, wherein the algorithm functions are used for performing algorithm processing of images or videos on the preview stream.
It should be appreciated that the prediction process may be based on a prediction model. The prediction model may be trained in advance, for example by the system architecture of fig. 3, and may be updated according to the results of actual photographing. The trained prediction model may be built into the terminal device executing the method of fig. 6, or stored outside the terminal device, for example in the cloud. When the terminal device needs to predict the photographing mode, it exchanges the data required for prediction, such as scene features, model parameters, and photographing mode parameters, with the cloud, so that real-time prediction and assembly of the photographing mode are realized. The cloud can also collect big-data information, such as the photographing mode selections of many users for specific scenes, to update the prediction model and realize its continuous optimization.
In one implementation, the predictive model performs the predictive operation after obtaining a preview stream of the scene. The predicting operation may include predicting whether each of the algorithm functions is on, and further, predicting whether each of the function parameters of the algorithm functions is on. The predicted results (e.g., which algorithmic functions are turned on, or which algorithmic functions and which functional parameters of those algorithmic functions are turned on) are candidates for assembling the photography mode. In other words, when the final recommended photographing mode is obtained by assembling, all the predicted algorithm functions and the function parameters thereof may be used, or only part of the predicted algorithm functions or part of the predicted function parameters may be used.
Specifically, the predictive model may determine a confidence level for each of the plurality of algorithmic functions and corresponding functional parameters, and predict at least one algorithmic function and the functional parameters of the at least one algorithmic function applied to the scene based on the confidence level.
Alternatively, the above-described opening confidence levels may correspond to the algorithm functions and the function parameters separately; in other words, each algorithm function and each function parameter has its own confidence level. In this case, determining the confidence level of each of the plurality of algorithm functions and the corresponding function parameters may mean determining a first confidence level of each algorithm function and a second confidence level of each function parameter of each algorithm function. Further, in this embodiment, whether an algorithm function is used as an assembly candidate of the photographing mode (i.e., output as part of the prediction model) may be determined according to the first confidence level of that algorithm function, and then which function parameters of the algorithm function, and which of their values, are used as assembly candidates may be determined according to the second confidence level of each function parameter (i.e., output as another part of the prediction model). This hierarchical prediction approach is algorithmically simple and fast, is suitable for devices with relatively low processing capability, and helps reduce system power consumption.
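The following Python sketch illustrates this hierarchical (first/second confidence) prediction under assumed model interfaces and thresholds; it is not the application's actual model code.

```python
# Sketch only: `function_model` returns a first confidence per algorithm function, and each
# entry of `param_models` returns second confidences for that function's parameters.
from typing import Callable, Dict, List
import numpy as np

def predict_hierarchical(
    features: np.ndarray,
    function_model: Callable[[np.ndarray], Dict[str, float]],
    param_models: Dict[str, Callable[[np.ndarray], Dict[str, float]]],
    func_threshold: float = 0.5,
    param_threshold: float = 0.5,
) -> Dict[str, List[str]]:
    candidates: Dict[str, List[str]] = {}
    for func, conf in function_model(features).items():
        if conf < func_threshold:
            continue                                    # function is not an assembly candidate
        second = param_models.get(func, lambda _: {})(features)
        params = [p for p, c in second.items() if c >= param_threshold]
        candidates[func] = params                       # candidate function with candidate parameters
    return candidates
```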
As another example, a confidence level may be determined for a combination of an algorithm function and function parameters. Specifically, an algorithm function and a function parameter may be taken as a combination, and whether that combination is used as an assembly candidate of the photographing mode is predicted. For example, assume there are algorithm functions A and B, where algorithm function A has function parameters a1 and a2 (a1 and a2 may be different types of function parameters or different values of the same function parameter) and algorithm function B has function parameters b1 and b2; four combinations {A, a1}, {A, a2}, {B, b1} and {B, b2} may then be formed. The prediction model may calculate a confidence level for each of the four combinations, and the combinations with higher confidence are used as assembly candidates of the photographing mode, that is, as outputs of the prediction model. Combinations may also cross algorithm functions or function parameters, such as {A, B, a1, b1} or {A, a1, a2}. Predicting the confidence of combinations can better account for the interactions among the factors within a combination, giving an overall better prediction result, and is suitable for devices with higher processing capability.
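For comparison, a sketch of the combination-based prediction, with hypothetical combination names and a hypothetical scoring model:

```python
# Sketch of the combination-based alternative: score (function, parameter) combinations jointly.
from typing import Callable, List, Tuple
import numpy as np

def predict_combinations(
    features: np.ndarray,
    combos: List[Tuple[str, ...]],        # e.g. ("A", "a1"), ("B", "b2"), ("A", "B", "a1", "b1")
    score_model: Callable[[np.ndarray, Tuple[str, ...]], float],
    threshold: float = 0.5,
) -> List[Tuple[str, ...]]:
    # keep the combinations whose joint opening confidence is high enough
    return [combo for combo in combos if score_model(features, combo) >= threshold]
```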
In another implementation, the opening prediction of the algorithm function and/or the function parameter may also be performed by a cloud matching method.
Specifically, photos that meet a requirement (for example, an aesthetic score above a certain threshold, or another evaluation criterion reaching a certain threshold), together with the algorithm functions and function parameters used when they were taken, may first be stored in the cloud. When the terminal device performs photographing preview, features of the preview stream of the scene to be photographed are extracted and sent to the cloud to be matched against the stored photos; the best-matching photo is selected, and the algorithm functions and function parameters used by that photo are turned on accordingly. This approach reduces the number of function predictions and speeds up the generation of the optimized photographing mode.
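A minimal sketch of this cloud-matching step, modeling the cloud store as an in-memory list and using cosine similarity as an assumed matching metric; all names and thresholds are placeholders.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional
import numpy as np

@dataclass
class StoredPhoto:
    features: np.ndarray                      # features extracted when the photo was saved
    functions: Dict[str, Dict[str, float]]    # e.g. {"beauty": {"skin_smoothing": 0.6}}

def match_cloud(query: np.ndarray, store: List[StoredPhoto],
                min_similarity: float = 0.8) -> Optional[StoredPhoto]:
    best, best_sim = None, -1.0
    for photo in store:
        sim = float(np.dot(query, photo.features) /
                    (np.linalg.norm(query) * np.linalg.norm(photo.features) + 1e-8))
        if sim > best_sim:
            best, best_sim = photo, sim
    # below the similarity threshold, no cloud match is used and prediction falls back to the model
    return best if best_sim >= min_similarity else None
```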
Optionally, after the algorithm functions and function parameters used by the best-matching cloud photo are turned on, the opening confidence of the remaining algorithm functions and function parameters in the terminal camera may still be predicted by the prediction model, so that the prediction is made from multiple angles and an optimized photographing mode is obtained.
Optionally, the algorithm function may be at least one of HDR, image blurring, skin care, light effect, beauty, large aperture function, filter, photosensitivity, streamer shutter, and color function, which are not specifically limited herein, and any algorithm function that can be applied in photographing falls within the protection scope of the present application.
It should be understood that the function parameters are parameters built into an algorithm function. For example, for the beauty function, the function parameters may be the face-slimming degree, the beauty level, the eye-enlargement level, the skin-smoothing degree, and so on; for other algorithm functions, the function parameters may be a large-aperture value, a light-effect type, and so on, which are not limited here.
S630, assembling the predicted at least one algorithm function and the function parameters of the at least one algorithm function to generate a photographing mode for the current scene.
The prediction result of step S620 (e.g., which algorithm functions are turned on, or which algorithm functions and which function parameters of these algorithm functions are turned on) is used as candidate data for assembling the photographing mode. In other words, in the assembly processing in step S630, all the algorithm functions and the functional parameters thereof obtained by the prediction processing in S620 may be used, or only a part of the algorithm functions or a part of the functional parameters obtained by the prediction processing in S620 may be used.
For example, as one embodiment, the at least one algorithm function and the functional parameters of the at least one algorithm function may be assembled according to mutually exclusive information of the algorithm function and/or the functional parameters. A specific implementation will be described in detail below in connection with the embodiment of fig. 7.
The algorithm functions that can be turned on and their function parameters are combined, taking into account the mutual exclusion among algorithm functions and function parameters, to form the final photographing mode. The final photographing mode is one that can achieve a good effect for the current scene, so it can be recommended to the user, for example by displaying the assembled photographing mode on the preview interface of the terminal device. Of course, the assembly need not produce only a single photographing mode; different photographing modes obtained along several reference dimensions may be provided. For example, for a scene containing both people and mountains, a first photographing mode emphasizing the portrait may be assembled (for example, by selecting, from the predicted algorithm functions, those functions and function parameters weighted toward portrait image processing; this mode may be named a portrait mode), and a second photographing mode emphasizing the landscape may be assembled at the same time (by selecting those functions and function parameters weighted toward landscape image processing; this mode may be named a landscape mode). The portrait mode and the landscape mode may then be recommended to the user simultaneously on the preview interface of the terminal device, for example as icons or a drop-down menu, for the user to select according to preference.
The embodiment of the application provides an intelligent, automatic photographing mode generation method. In this scheme, scenes do not need to be divided manually and photographing modes do not need to be predefined; the user's expertise and experience are not required, and the user does not need to consider the influence of each algorithm function and function parameter on the photographing effect. This simplifies user operation and improves the intelligence and automation of photographing mode generation.
Fig. 7 is an exemplary diagram of another scheme for generating a photographing mode according to an embodiment of the present application. As an embodiment, as shown in fig. 7, after obtaining a preview stream of a scene, a prediction model predicts the turn-on of each algorithm function while predicting the function parameters of the algorithm function (i.e., the above-described hierarchical prediction scheme). And after the prediction is completed, assembling the function and the functional parameter of the algorithm which are started by the prediction, and generating an optimized photographing mode. After the photographing mode is obtained, processing such as mode preview and the like can be started on a user interface, and actual photographing and processing processes are performed according to the selection of a user.
It should be understood that fig. 7 represents only one implementation. Alternatively, the prediction model may be a single-task prediction model or a multi-task prediction model. A single-task prediction model predicts only one fixed algorithm function or function parameter; that is, each algorithm function or function parameter may have its own single-task prediction model, such as a binary classifier or a neural network model, that predicts whether it is turned on. A multi-task prediction model may predict multiple algorithm functions; optionally, it may predict multiple different algorithm functions and function parameters simultaneously, or predict them sequentially, such as the neural network models described in figs. 3 to 5. Whether a certain algorithm function and function parameter is turned on may thus be predicted either by its corresponding single-task prediction model or by a multi-task prediction model. For example, if a terminal device's camera offers 3 algorithm functions, the 3 algorithm functions may be predicted using 1 multi-task prediction model, or using 3 separate single-task prediction models, which is not limited here.
Alternatively, the prediction model may be a set of binary classifiers: when a classifier outputs 1, it indicates that a certain function or function parameter is turned on, and when it outputs 0, it indicates that the current preview scene does not require that function or function parameter. It should be appreciated that outputting 1 or 0 is only one artificially defined way to distinguish on from off, and this implementation is not limited.
Optionally, a binary classifier may give a first opening confidence level for a certain algorithm function and output 1 (indicating that the algorithm function is turned on) when the first confidence exceeds a threshold; or it may give a second opening confidence level for a certain function parameter of an algorithm function and output 1 (indicating that the function parameter is turned on) when the second confidence exceeds a threshold. When the confidence is below the threshold, the classifier outputs 0, indicating that the algorithm function or function parameter is not turned on. Alternatively, a classifier may give an opening confidence for a combination of an algorithm function and function parameters, and output 1 when the combination's confidence exceeds a threshold, indicating that the algorithm function and the corresponding function parameters are turned on. It should be understood that the turning-on of an algorithm function and of its parameters may be predicted by the same binary classifier or by different binary classifiers, without limitation; the same classifier may also simultaneously predict the turning-on of an algorithm function and of its various function parameters.
Optionally, the prediction model may also be a neural network model that predicts the opening confidence levels of multiple algorithm functions and function parameters from the scene preview stream; when the opening confidence of an algorithm function or function parameter exceeds a threshold, that function or parameter is turned on, and when it is below the threshold, it is not turned on.
It should be appreciated that, after the opening confidences of the algorithm functions and function parameters are predicted, the algorithm functions and function parameters predicted to be turned on may be ranked by opening confidence and assembled according to their mutual-exclusion relationships.
Illustratively, the prediction model gives each algorithm function an opening confidence; when the confidence is greater than 0.5, the classifier outputs 1, indicating that the algorithm function is turned on, and when it is less than 0.5, the classifier outputs 0, indicating that the algorithm function is not turned on in the current scene. The turned-on algorithm functions are then ranked from high to low by opening confidence. Because the computing capacity of the device and the photographing effect are limited, some algorithm functions cannot be turned on simultaneously; the algorithm functions predicted to be turned on therefore need to be assembled, and the mutual-exclusion relationships among algorithm functions must be known in advance.
Table 1. Mutual exclusion table of algorithm functions

Algorithm function   1   2   3   4   5   6   7
1                    Y   Y   N   Y   N   N   N
2                    Y   Y   N   Y   N   Y   N
3                    N   N   Y   Y   N   N   N
4                    Y   Y   Y   Y   N   N   Y
5                    N   N   N   N   Y   N   N
6                    N   Y   N   N   N   Y   Y
7                    N   N   N   Y   N   Y   Y
Illustratively, if the opening confidences predict that the beauty, large aperture, and HDR functions can be turned on, with confidences of 0.9, 0.8, and 0.7 respectively, and the mutual exclusion table shows that beauty and HDR are mutually exclusive, then the beauty function may be selected when assembling the photographing mode and the HDR function is not used.
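The following sketch shows one way such an assembly could work: candidate functions are sorted by opening confidence and a function is kept only if it is compatible with everything already selected. The function names, the dictionary-based mutual-exclusion table, and the greedy policy are assumptions for illustration.

```python
from typing import Dict, List, Tuple

def assemble_mode(confidences: Dict[str, float],
                  compatible: Dict[Tuple[str, str], bool]) -> List[str]:
    selected: List[str] = []
    for func, _ in sorted(confidences.items(), key=lambda kv: kv[1], reverse=True):
        ok = all(compatible.get((func, s), compatible.get((s, func), True)) for s in selected)
        if ok:
            selected.append(func)
    return selected

# Worked example from the text: beauty (0.9), large aperture (0.8), HDR (0.7),
# with beauty and HDR mutually exclusive -> beauty is kept and HDR is dropped.
conf = {"beauty": 0.9, "large_aperture": 0.8, "hdr": 0.7}
mutex = {("beauty", "hdr"): False}
print(assemble_mode(conf, mutex))   # ['beauty', 'large_aperture']
```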
The above mutual exclusion table is given mainly for algorithm functions as an example; it should be understood that it applies equally to the prediction and assembly of function parameters.
As another embodiment, the prediction of which functions to turn on may also be performed by cloud matching: features of the preview stream of the scene to be photographed are extracted and matched against the photos stored in the cloud, the best-matching photo is selected, and the algorithm functions and function parameters used by that photo are turned on accordingly.
It should be understood that, to enable cloud matching, the best photos and the algorithm functions and function parameters used to take them are first saved in the cloud. Optionally, an aesthetic scoring mechanism, implemented manually or by machine, may score a large number of photos taken by existing users: when the score of a photo is above a certain threshold, the photo and the algorithm functions and function parameters used in taking it are saved in the cloud; when the score is below the threshold, the photo is not saved. Optionally, a labeling method may instead be used: preview photos obtained in as many shooting scenes as possible are labeled, either by people with professional backgrounds or good photography skills or intelligently by a neural network algorithm according to given parameters and criteria, as to which algorithm functions or function parameters should be turned on in those scenes, and the preview photos together with the labeled algorithm functions and function parameters are stored in the cloud.
In the embodiment of the application, the algorithm functions and function parameters of preview pictures or video data labeled by users with professional backgrounds or good photography skills, or of pictures scored above a certain threshold, are used as the training algorithm functions and training function parameters. This improves the quality of photographing mode generation and realizes optimization of the photographing mode.
It should be appreciated that, when this implementation is used, a photo matching mechanism may also be provided. Specifically, features of the previewed scene are extracted and matched against the cloud photos to obtain the most similar photo; when the similarity is greater than a certain threshold, the algorithm functions and function parameters of the best-matching photo are used to assemble a photographing mode.
Optionally, when the similarity is greater than a certain threshold, the photographing algorithm functions and function parameters used by the best-matching photo may be turned on first, and the remaining algorithm functions and function parameters configured in the camera are then predicted by the prediction model.
Specifically, after the algorithm functions and function parameters used by the best-matching cloud photo are turned on, the opening confidences of the remaining algorithm functions and function parameters in the terminal device's camera are predicted by the prediction model, and the algorithm functions are automatically combined into an optimized photographing mode according to their opening confidences and their mutual-exclusion relationships. Optionally, when an algorithm function or function parameter obtained by cloud matching is mutually exclusive with one predicted to be turned on by the prediction model, the cloud-matched result may be kept and the model prediction discarded; alternatively, the mutually exclusive cloud-matched algorithm function may also be scored by the prediction model, and which one is turned on is decided by the predicted opening confidence.
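A sketch of this combination step, reusing the same assumed mutual-exclusion representation as above and showing the policy in which the cloud-matched functions are kept on conflict (one of the two options described in the text); all names are assumptions.

```python
from typing import Dict, List, Tuple

def combine_cloud_and_model(cloud_funcs: List[str],
                            model_confidences: Dict[str, float],
                            compatible: Dict[Tuple[str, str], bool],
                            threshold: float = 0.5) -> List[str]:
    selected = list(cloud_funcs)                       # functions taken from the matched photo
    remaining = {f: c for f, c in model_confidences.items()
                 if f not in selected and c >= threshold}
    for func, _ in sorted(remaining.items(), key=lambda kv: kv[1], reverse=True):
        if all(compatible.get((func, s), compatible.get((s, func), True)) for s in selected):
            selected.append(func)                      # predicted function, compatible with the match
    return selected
```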
It should be further understood that, when the similarity of the best-matching photo is below a certain threshold, no similar scene exists in the cloud for the scene to be photographed. In that case, the prediction can be performed again by the prediction model, and the predicted and assembled optimized photographing mode and the photos taken with it can be uploaded to the cloud for storage, facilitating use by other users.
In the embodiment of the application, cloud matching can reduce the number of function predictions and accelerate the generation of the optimized photographing mode.
It should be understood that, in the embodiment of the present application, after obtaining the recommended photographing mode, the user may start processing such as mode preview on the user interface, and then the terminal performs actual photographing and processing according to the selection of the user.
Fig. 8 is a diagram of an example user interface for photographing mode generation, as one embodiment. After a user opens a camera application of the mobile phone, a camera first captures a scene to be photographed or recorded, such as a portrait scene in fig. 8. On the preview interface, a button to "start the intelligent photographing mode assembly" is presented to the user. If the user wants to take a picture using the recommended photographing mode of the embodiment of the present application, the button may be clicked.
Optionally, after the user clicks the button (i.e. the user selects to start the intelligent photographing mode assembly function), the user interface may display the algorithm function in the recommended photographing mode for the current scene, for example, in the portrait scene in fig. 8, the recommended photographing mode is assembled by three algorithm functions of portrait blurring, beauty and HDR.
Further alternatively, the user may also view on the user interface the function parameters used by the algorithm functions in the recommended mode. For example, if the user clicks the "portrait blurring" button, the user may be presented with specific function parameters used in the "portrait blurring" algorithm function in a drop-down menu.
In one embodiment, when the user selects the recommended photographing mode and clicks to photograph, a picture corrected by the recommended mode is obtained. In another embodiment, after the user selects to start the intelligent photographing mode assembly function, the preview stream can be processed in real time to present the preview stream processed by the recommended photographing mode, and then the picture or video at the moment is recorded according to the photographing operation of the user, so that more real-time experience can be provided for the user.
As another example, fig. 9 is a diagram of another user interface example of photographing mode generation. In fig. 9, the preview stream presented to the user is a preview stream modified according to the intelligently assembled photographing mode. At this time, a prompt may be presented on the user interface, for example: "The intelligent photographing mode assembly function has been turned on. Do you want to turn it off?", together with buttons "yes" and "no".
Alternatively, if the user wants to use the modified preview scene, the user may click "no" or may not perform any operation, and take a picture or record, and then obtain a picture/video modified by the recommended mode. If the user does not want to use the modified preview scene, the user can click "yes" to restore to the unprocessed original preview stream and take a picture or record. It should be understood that the above user interface is merely an example, and other words may be selected for a button or prompt in the user interface, so long as the user can be reminded to click a certain button to turn on/off the recommended photographing mode, and the embodiment of the present application is not limited specifically.
It should be understood that, in the embodiment of the present application, the recommended photographing mode may be obtained first and the user may then decide whether to turn it on or off according to his or her needs; or the user may first choose to use a recommended photographing mode and the user equipment then performs intelligent assembly to obtain it. The order is not limited.
In the embodiments of fig. 8 and fig. 9, multiple recommended photographing modes may also be provided to the user at the same time for selection or switching, and the preview stream corrected by the currently selected photographing mode may be presented on the screen in real time according to the user's selection or switching. This makes it easier for the user to choose a more satisfactory photographing mode. Further optionally, the photographing modes selected by users may be collected and processed by the cloud and used as training data for the prediction model, so that the prediction model is continuously optimized.
Fig. 10 is an exemplary diagram 1000 of a training method for a prediction model according to an embodiment of the present application. The training method 1000 may be specifically performed by the training device 120 shown in fig. 3, and the data set in the method 1000 may be the training data maintained in the database 130 shown in fig. 3. Optionally, the method 1000 may be performed within the training device 120, or the training data received or obtained from the database 130 may first be preprocessed by other functional modules before reaching the training device 120.
Alternatively, the method 1000 may be processed by a CPU, or may be processed by both the CPU and the GPU, or may not use the GPU, and other suitable processors for neural network computing may be used, which is not limited in this application.
As shown in fig. 10, in the embodiment of the present application, the data set is training data, including training pictures or training video data, training algorithm functions corresponding to the training pictures, and functional parameters of the training algorithm functions.
It should be appreciated that the training data may be generated in a variety of ways. For example, as many preview pictures or videos as possible may be obtained in various shooting scenes, and people with professional backgrounds or good photography skills may then label which algorithm functions and function parameters should be turned on in each scene. Alternatively, a large number of photos taken by users may be scored by an aesthetic scoring algorithm, and the algorithm functions and function parameters of the high-scoring photos recorded.
Alternatively, in embodiments of the present application, the prediction model may be a single-task prediction model that predicts a single algorithm function or function parameter. Illustratively, the prediction model may be a classifier trained for each algorithm function or function parameter, or a neural network trained for each algorithm function or function parameter, for which the preview stream may simply be pre-processed (e.g., normalized) and then used for training; or, after feature extraction, the preview stream may be classified or regressed using a support vector machine, a random forest, a nearest-neighbour algorithm, or the like.
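For illustration, a minimal single-task sketch over extracted features, assuming one binary on/off label per algorithm function. The data here are synthetic placeholders, and a random forest is just one of the classical options mentioned above (an SVM or nearest-neighbour model would be used the same way).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X_train = rng.random((200, 49)).astype(np.float32)    # extracted preview-stream features
y_train = rng.integers(0, 2, size=200)                # on/off label for one algorithm function

hdr_predictor = RandomForestClassifier(n_estimators=100, random_state=0)
hdr_predictor.fit(X_train, y_train)

x_new = rng.random((1, 49)).astype(np.float32)
on_confidence = hdr_predictor.predict_proba(x_new)[0, 1]   # opening confidence for the function
print(int(on_confidence > 0.5))                             # 1/0 style on/off decision
```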
Alternatively, the predictive model may be a multi-tasking predictive model that predicts multiple algorithmic functions and functional parameters simultaneously. For example, the neural network model may predict and train multiple algorithmic functions and functional parameters simultaneously.
It should be appreciated that the functional attributes in fig. 10 are predicted algorithm functions and function parameters.
It should be appreciated that a prediction model is generally trained so that its predicted values approach the true values as closely as possible, and the training objective is to minimize the cross-entropy loss between the predicted values and the true values.
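One common way to write this objective, assuming binary on/off labels per algorithm function or function parameter (the exact loss form and weighting used in this application are not specified), is:

```latex
\mathcal{L} \;=\; -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{M}
\Big[\, y_{ij}\,\log \hat{p}_{ij} \;+\; \big(1 - y_{ij}\big)\,\log\big(1 - \hat{p}_{ij}\big) \Big]
```

where y_ij in {0, 1} is the labeled on/off state of algorithm function or function parameter j for training sample i, N is the number of training samples, M is the number of functions and parameters being predicted, and p̂_ij is the predicted opening confidence.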
It should be appreciated that, after the prediction model is trained, it also needs to be evaluated: specifically, a training picture is input, and the loss between the predicted algorithm functions and function parameters and the labeled algorithm functions and function parameters is compared.
Fig. 11 is an exemplary diagram 1100 of a photographing mode generation apparatus provided in an embodiment of the present application. The apparatus 1100 may be implemented as a terminal device, for example, with a camera and image processing capabilities, or as any other suitable form of device. As shown in fig. 11, the apparatus 1100 includes an acquisition module 1110, a prediction module 1120, and an assembly module 1130. The apparatus 1100 may implement the various processes of the various method embodiments previously described and will not be described in detail to avoid repetition.
The acquisition module 1110 is configured to acquire a preview stream of a scene. The prediction module 1120 is configured to predict, from a preview stream of the scene, at least one algorithm function and a function parameter of the at least one algorithm function, where the algorithm function is used for performing an algorithm process of an image or a video on the preview stream, from a plurality of algorithm functions. The assembling module 1130 is configured to assemble the predicted at least one algorithm function and the functional parameters of the at least one algorithm function to generate a photographing mode for the scene.
The prediction module 1120 may also be configured to determine a confidence level for each of the plurality of algorithmic functions and corresponding functional parameters; predicting at least one algorithm function and a functional parameter of the at least one algorithm function applied to the scene according to the confidence level.
Specifically, as one embodiment, the prediction module 1120 may be configured to determine a first confidence level for each of the algorithm functions and a second confidence level for each of the function parameters of each of the algorithm functions. Alternatively, as another example, the prediction module 1120 may determine a confidence level for the combination of the algorithm function and the functional parameter.
It should be appreciated that the assembly module 1130 may assemble the at least one algorithm function and the functional parameters of the at least one algorithm function based on mutually exclusive information of the algorithm function and the functional parameters.
Optionally, the apparatus 1100 may further comprise a training module 1140 for training a prediction model used to make the prediction. Of course, the training module 1140 need not be located in the same hardware device as the prediction module 1120; for example, the prediction module 1120 may be implemented in a terminal device and the training module 1140 in the cloud, and the trained prediction model may be preloaded into the terminal device, or the prediction model in the terminal device may be updated online.
Optionally, the prediction model is pre-trained according to training data, wherein the training data comprises training picture/video data, training algorithm functions and functional parameters of the training algorithm functions.
Alternatively, the training picture/video data may be preview picture or video data obtained in a shooting scene; the preview picture or video data is labeled by users with professional backgrounds or good photography skills, or by an intelligent labeling system (such as an intelligent system based on a neural network or other algorithms), and the labeled algorithm functions and their function parameters are used as the training algorithm functions and the training function parameters.
Optionally, the training picture may be a photographed picture, the photographed picture is scored, and an algorithm function of the picture with a score higher than a first threshold and a function parameter of the algorithm function are used as a training algorithm function and a function parameter of the training algorithm function.
It should be understood that the predictive model is a classifier or neural network model trained for each algorithm function and the functional parameters of the algorithm function, and may also be a neural network model trained for a plurality of algorithm functions and the functional parameters of the algorithm function.
Optionally, the acquiring module 1110 may also acquire the features of the previewed scene.
Optionally, the apparatus 1100 may further include a matching module 1150 configured to match the scene, according to the features of the scene, with the pictures stored in the cloud, so as to find a picture matching the scene among the stored pictures. The cloud may store, for each picture, the corresponding algorithm functions and their function parameters. The matching module 1150 may send the feature information of the scene to the cloud so that the cloud performs the picture matching; alternatively, the most common labels (e.g., portrait, landscape, night view) and their corresponding algorithm functions and function parameters may be downloaded from the cloud in advance so that the matching is performed locally on the apparatus 1100, which can improve matching efficiency.
The obtaining module 1110 may be further configured to obtain an algorithm function and a function parameter of the algorithm function, where the algorithm function and the function parameter are stored in the cloud and correspond to a picture of the scene.
The prediction module 1120 may use the algorithm functions and function parameters, obtained by the acquisition module 1110, that correspond to the picture matching the scene as the predicted at least one algorithm function and its function parameters. For example, the previewed scene may be matched with the pictures stored in the cloud, and if the matching value is higher than a certain threshold, the photographing algorithm functions and function parameters of the matched cloud picture are turned on.
It should be appreciated that the pictures stored by the cloud may be pictures with aesthetic scores above a second threshold.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical function division, and there may be other manners of division when actually implemented.
The method in the embodiments of the present application, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in a computer readable storage medium, and based on such understanding, the technical solution or part of the technical solution of the present application may be embodied in the form of a software product stored in a storage medium, where the computer software product includes several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method described in the embodiments of the present application. The storage medium includes at least: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
The foregoing is merely specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think about changes or substitutions within the technical scope of the present application, and the changes and substitutions are intended to be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method for generating a photographing mode, comprising:
acquiring a preview stream of a scene;
predicting at least one algorithm function and functional parameters of the at least one algorithm function applied to the scene from a plurality of algorithm functions according to a preview stream of the scene, wherein the algorithm functions are used for performing algorithm processing of images or videos on the preview stream;
assembling the predicted at least one algorithmic function and functional parameters of the at least one algorithmic function to generate a photographing mode for the scene;
recommending the photographing mode;
and according to the selection of the user, determining to take a picture by using the photographing mode.
2. The method of claim 1, wherein predicting at least one algorithmic function and a functional parameter of the at least one algorithmic function applied to the scene from among a plurality of algorithmic functions based on a preview stream of the scene comprises:
determining a confidence level of each of the plurality of algorithm functions and the corresponding function parameter;
predicting at least one algorithm function and a functional parameter of the at least one algorithm function applied to the scene according to the confidence level.
3. The method of claim 2, wherein determining the confidence level for each of the plurality of algorithmic functions and corresponding functional parameters comprises:
determining a first confidence level of each algorithm function and a second confidence level of each function parameter of each algorithm function; or,
determining a confidence level of a combination of the algorithm function and the functional parameter.
4. A method according to any one of claims 1 to 3, wherein said assembling the predicted at least one algorithmic function and the functional parameters of the at least one algorithmic function to generate a photographing mode for the scene comprises:
and assembling the at least one algorithm function and the functional parameters of the at least one algorithm function according to mutual exclusion information of the algorithm function and the functional parameters.
5. A method according to any one of claims 1 to 3, further comprising:
training a prediction model for making the prediction.
6. The method of claim 5, wherein the predictive model is a classifier or neural network model trained for each algorithmic function and functional parameters of the algorithmic function.
7. The method of claim 5, wherein the predictive model is a neural network model trained on a plurality of algorithmic functions and functional parameters of the algorithmic functions.
8. A method according to any one of claims 1 to 3, wherein the acquiring a preview stream of a scene further comprises: and acquiring the characteristics of the scene from the preview stream of the scene.
9. The method of claim 8, further comprising, prior to predicting at least one algorithmic function and a functional parameter of the at least one algorithmic function applied to the scene from a plurality of algorithmic functions based on a preview stream of the scene:
matching the scene with the pictures stored in the cloud according to the characteristics of the scene so as to search the pictures matched with the scene from the pictures stored in the cloud;
and taking the algorithm function corresponding to the picture matched with the scene and the function parameter of the algorithm function stored in the cloud as the predicted at least one algorithm function and the function parameter of the at least one algorithm function.
10. A computer readable storage medium, characterized in that the computer readable storage medium stores a program code comprising instructions for performing part or all of the steps of the method according to any of claims 1-9.