CN111797854B - Scene model building method and device, storage medium and electronic equipment - Google Patents

Scene model building method and device, storage medium and electronic equipment

Info

Publication number
CN111797854B
Authority
CN
China
Prior art keywords
sub
scene
objective function
model
models
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910282033.2A
Other languages
Chinese (zh)
Other versions
CN111797854A (en)
Inventor
何明
陈仲铭
黄粟
刘耀勇
陈岩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN201910282033.2A priority Critical patent/CN111797854B/en
Publication of CN111797854A publication Critical patent/CN111797854A/en
Application granted granted Critical
Publication of CN111797854B publication Critical patent/CN111797854B/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/29Graphical models, e.g. Bayesian networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a scene model building method and apparatus, a storage medium, and an electronic device. The scene model building method comprises the following steps: acquiring perception data of a scene; training at least two sub-models based on an optimization objective function and the perception data to obtain the optimization objective function values of the at least two sub-models, wherein the optimization objective function corresponding to each sub-model comprises at least two evaluation indexes; and when the optimization objective function values of the at least two sub-models meet a preset condition, weighting the at least two sub-models to construct a scene model. In the embodiment of the application, the electronic device takes an optimization objective function comprising at least two evaluation indexes as the training target of each sub-model. Therefore, when the electronic device performs intelligent operation, the established scene model can satisfy evaluation indexes of multiple dimensions, which improves the recognition range and recognition accuracy of the scene model.

Description

Scene model building method and device, storage medium and electronic equipment
Technical Field
The present application relates to the field of communications technologies, and in particular, to a scene model building method and apparatus, a storage medium, and an electronic device.
Background
With the development of electronic technology, electronic devices such as smartphones are becoming more and more intelligent. The electronic device may perform data processing through a variety of algorithmic models to provide various functions to the user. For example, the electronic device may learn behavior features of the user according to an algorithmic model to provide personalized services to the user.
At present, recognition models in the related art have a single optimization target during training and learning: only one dimension of optimization target is considered, and it is difficult to take multiple optimization targets into account at once, so both the recognition range and the recognition accuracy of the scene model are limited.
Disclosure of Invention
The embodiment of the application provides a scene model building method, a device, a storage medium and electronic equipment, which can improve the recognition range and recognition accuracy of a scene model.
In a first aspect, an embodiment of the present application provides a method for establishing a scene model, including:
acquiring perception data of a scene;
training at least two sub-models based on an optimization objective function and the perception data to obtain the optimization objective function values of the at least two sub-models, wherein the optimization objective function corresponding to the sub-models comprises at least two evaluation indexes;
and when the optimization objective function values of the at least two sub-models meet a preset condition, weighting the at least two sub-models to construct a scene model.
In a second aspect, an embodiment of the present application provides a scene recognition method, including:
receiving a scene recognition request, wherein the scene recognition request comprises a task identifier and data to be recognized;
matching the task identifier with a preset mapping table to obtain an optimization objective function for scene recognition, wherein the mapping table comprises the relation between task identifiers and optimization objective functions;
and carrying out scene recognition on the data to be recognized by utilizing a scene model according to the optimization objective function for scene recognition, wherein the scene model is constructed by the above scene model building method.
In a third aspect, an embodiment of the present application provides a scene model building apparatus, including:
the acquisition module is used for acquiring the perceived data of the scene;
the training module is used for training at least two sub-models based on the optimization objective function and the perception data to obtain the optimization objective function values of the at least two sub-models, wherein the optimization objective function corresponding to the sub-models comprises at least two evaluation indexes;
and the weighting processing module is used for performing weighting processing on the at least two sub-models to construct a scene model when the optimization objective function values of the at least two sub-models meet a preset condition.
In a fourth aspect, an embodiment of the present application provides a scene recognition apparatus, including:
the receiving module is used for receiving a scene recognition request, wherein the scene recognition request comprises a task identifier and data to be recognized;
the matching module is used for matching the task identifier with a preset mapping table to obtain an optimization objective function for scene recognition, wherein the mapping table comprises the relation between task identifiers and optimization objective functions;
the recognition module is used for carrying out scene recognition on the data to be recognized by utilizing a scene model according to the optimization objective function for scene recognition, wherein the scene model is constructed by the above scene model building method.
In a fifth aspect, an embodiment of the present application provides a storage medium having stored thereon a computer program which, when run on a computer, causes the computer to perform the above-described scene model building method or scene recognition method.
In a sixth aspect, an embodiment of the present application provides an electronic device, including a processor and a memory, where the memory has a computer program, and the processor is configured to execute the above-mentioned scene model building method or scene recognition method by calling the computer program.
In the embodiment of the application, after the perception data of the scene is acquired, the electronic device can train at least two sub-models based on the optimization objective function and the perception data, wherein the optimization objective function corresponding to each sub-model comprises at least two evaluation indexes, and then the at least two sub-models are weighted to construct the scene model. Therefore, when the electronic device performs intelligent operation, the established scene model can satisfy evaluation indexes of multiple dimensions, which improves the recognition range and recognition accuracy of the scene model.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic diagram of a first application scenario of a scenario model building method according to an embodiment of the present application.
Fig. 2 is a schematic diagram of a second application scenario of the scene model building method provided in the embodiment of the present application.
Fig. 3 is a schematic structural diagram of a scene model according to an embodiment of the present application.
Fig. 4 is a first flow chart of a scene model building method according to an embodiment of the present application.
Fig. 5 is a second flow chart of a scene model building method according to an embodiment of the present application.
Fig. 6 is a third flow chart of a scene model building method according to an embodiment of the present application.
Fig. 7 is a fourth flowchart of a scene model building method according to an embodiment of the present application.
Fig. 8 is a first structural schematic diagram of a scene model building apparatus according to an embodiment of the present application.
Fig. 9 is a second schematic structural diagram of a scene model building apparatus according to an embodiment of the present application.
Fig. 10 is a third schematic structural diagram of a scene model building apparatus according to an embodiment of the present application.
Fig. 11 is a fourth schematic structural diagram of a scene model building apparatus according to an embodiment of the present application.
Fig. 12 is a flowchart of a scene recognition method according to an embodiment of the present application.
Fig. 13 is a schematic structural diagram of a scene recognition device according to an embodiment of the present application.
Fig. 14 is a schematic first structural diagram of an electronic device according to an embodiment of the present application.
Fig. 15 is a second schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application. It will be apparent that the described embodiments are only some, but not all, embodiments of the application. All other embodiments, which can be made by a person skilled in the art without any inventive effort, are intended to be within the scope of the present application based on the embodiments of the present application.
Referring to fig. 1, fig. 1 is a schematic diagram of a first application scenario of a scenario model building method according to an embodiment of the present application. The scene model building method is applied to the electronic equipment. The electronic equipment is provided with a scene perception architecture. The scene perception architecture is the integration of hardware and software in the electronic device for implementing the scene model building method.
The scene perception architecture comprises an information perception layer, a data processing layer, a feature extraction layer, a scene modeling layer and an intelligent service layer.
The information sensing layer is used for acquiring information of the electronic equipment or information in an external environment. The information sensing layer may include a plurality of sensors. For example, the information sensing layer includes a plurality of sensors such as a distance sensor, a magnetic field sensor, a light sensor, an acceleration sensor, a fingerprint sensor, a hall sensor, a position sensor, a gyroscope, an inertial sensor, a posture sensor, a barometer, a heart rate sensor, and the like. Wherein the distance sensor may be used to detect a distance between the electronic device and an external object. The magnetic field sensor may be used to detect magnetic field information of an environment in which the electronic device is located. The light sensor may be used to detect light information of an environment in which the electronic device is located. The acceleration sensor may be used to detect acceleration data of the electronic device. The fingerprint sensor may be used to collect fingerprint information of a user. The Hall sensor is a magnetic field sensor manufactured according to the Hall effect and can be used for realizing automatic control of electronic equipment. The location sensor may be used to detect the geographic location where the electronic device is currently located. Gyroscopes may be used to detect angular velocities of an electronic device in various directions. Inertial sensors may be used to detect motion data of the electronic device. The gesture sensor may be used to sense gesture information of the electronic device. Barometers may be used to detect the air pressure of an environment in which an electronic device is located. The heart rate sensor may be used to detect heart rate information of the user.
The data processing layer is used for processing the data acquired by the information sensing layer. For example, the data processing layer may perform data cleaning, data integration, data transformation, data reduction, and the like on the data acquired by the information sensing layer.
In the embodiment of the application, the information sensing layer can acquire the information of the electronic equipment or the information in the external environment by utilizing a plurality of sensors, and then the information acquired by the information sensing layer is screened by the data processing layer to be used as the sensing data of the scene so as to train the sub-model, thereby constructing the scene model. In addition, after the scene model is built, the information sensing layer can also acquire information of the electronic equipment or information in an external environment by utilizing a plurality of sensors, then the information acquired by the information sensing layer is screened by the data processing layer to be used as data to be identified in the scene identification request, and then the scene model is utilized to identify the scene.
The data cleaning refers to cleaning a large amount of data acquired by the information sensing layer to remove invalid data and repeated data. The data integration refers to integrating a plurality of single-dimensional data acquired by an information sensing layer into a higher or more abstract dimension so as to comprehensively process the plurality of single-dimensional data. The data transformation refers to performing data type conversion or format conversion on the data acquired by the information sensing layer, so that the transformed data meets the processing requirement. Data reduction refers to maximally simplifying the data volume on the premise of keeping the original appearance of the data as much as possible.
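As an illustration only, the following is a minimal Python sketch of these four operations; the record layout (flat dictionaries of hashable sensor readings) and the reduction step are assumptions made for the example, not details taken from this embodiment.

```python
def clean(records):
    """Data cleaning: remove invalid (None) and duplicate records."""
    seen, cleaned = set(), []
    for record in records:
        if record is None:
            continue
        key = tuple(sorted(record.items()))  # assumes hashable values
        if key not in seen:
            seen.add(key)
            cleaned.append(record)
    return cleaned

def integrate(records):
    """Data integration: merge several single-dimension records into one view."""
    merged = {}
    for record in records:
        merged.update(record)
    return merged

def transform(record):
    """Data transformation: convert values to the numeric type the model expects."""
    converted = {}
    for name, value in record.items():
        try:
            converted[name] = float(value)
        except (TypeError, ValueError):
            converted[name] = value  # keep non-numeric values as-is
    return converted

def reduce_volume(records, step=2):
    """Data reduction: keep every `step`-th record to shrink the data volume."""
    return records[::step]
```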
The feature extraction layer is used for extracting features of the data processed by the data processing layer so as to extract features included in the data. The extracted features can reflect the state of the electronic equipment itself or the state of the user or the environmental state of the environment where the electronic equipment is located, etc.
The feature extraction layer may extract features by filter, wrapper, or integration methods, and may also process the extracted features.
The filter method screens the extracted features to delete redundant feature data. The wrapper method selects a subset of the extracted features. The integration method combines multiple feature extraction methods to construct a more efficient and accurate feature extraction method.
The scene modeling layer is used for constructing a model according to the features extracted by the feature extraction layer, and the obtained model can be used for representing the state of the electronic device itself, the state of the user, or the state of the environment where the electronic device is located, etc. For example, the scene modeling layer may construct a key-value model, a pattern identification model, a graph model, an entity relationship model, an object-oriented model, and the like from the features extracted by the feature extraction layer. The sub-model in the embodiment of the application can be constructed according to the features extracted by the feature extraction layer. The constructed sub-model is only an initial model; a series of operations such as training with the perception data is needed to optimize the parameters of the sub-model, thereby constructing the scene model.
The intelligent service layer is used for providing intelligent services for users according to the model constructed by the scene modeling layer. For example, the intelligent service layer may provide basic application services for users, perform system intelligent optimization for the electronic device, and provide personalized intelligent services for users. In the embodiment of the application, after scene recognition is performed according to the scene recognition method, the intelligent service layer can provide intelligent services for the user according to the scene recognition result. For example, assuming that the scene recognition result is a getting-up event, the intelligent service layer may start an alarm clock at the occurrence time of the getting-up event, start a morning news broadcast at that time, and so on.
In addition, the scene perception architecture can also comprise a plurality of algorithms, each of which can be used for analyzing and processing data; these algorithms can form an algorithm library. For example, the algorithm library may include a Markov algorithm, a latent Dirichlet allocation algorithm, a Bayesian classification algorithm, a support vector machine, a K-means clustering algorithm, a K-nearest neighbor algorithm, a conditional random field, a residual network, a long short-term memory network, a convolutional neural network, a recurrent neural network, and the like.
Referring to fig. 2, fig. 2 is a schematic diagram of a second application scenario of the scene model building method according to the embodiment of the present application. As shown in fig. 2, two combinable sub-models are taken as an example. The general flow of scene model building includes:
(1) The electronic device collects scene data of the user, mainly comprising environment data, terminal operation data, and user behavior data.
(2) Multiple evaluation dimensions of scene modeling are set, mainly the three dimensions of accuracy, recall rate, and model complexity; in an actual environment, specific evaluation dimensions can be formulated according to actual task requirements.
(3) A joint deep network model is constructed; its input is the scene data, and its optimization targets are accuracy, recall rate, model complexity, and the like.
(4) The last hidden layers of the two sub-models of the joint deep network model constructed in the previous step are combined to construct a fully connected layer. The optimization target of the output of the fully connected layer is the weighted sum of the output of the first sub-model and the output of the second sub-model, and the specific weights can be set manually by an expert, for example 50% each.
(5) The model from the previous step is trained until it converges.
(6) The previous step yields a converged joint deep network that can satisfy multiple constraint conditions at the same time, whereas the traditional scene model building method can only satisfy the constraint condition of one dimension. The final scene model can meet the accuracy and recall-rate requirements while keeping model complexity low.
(7) The joint deep network obtained in the previous step is used to identify the scene category of the user. Because the optimization objective function applied during model training comprises at least two evaluation indexes, multiple evaluation indexes can be satisfied simultaneously when the scene model is used for scene recognition, which improves the recognition range and recognition accuracy of the scene model.
Step (3) can be implemented as follows: the joint deep network model mainly comprises two sub-models, where the optimization targets of the first sub-model are accuracy and model complexity, and the optimization targets of the second sub-model are recall rate and model complexity; the two sub-models are trained separately until convergence, which mainly reduces the complexity and computation time of the joint training.
The embodiment of the application provides a scene model building method which can be applied to electronic devices. The electronic device may be a smartphone, a tablet computer, a gaming device, an AR (Augmented Reality) device, an automobile, a data storage device, an audio playing device, a video playing device, a notebook computer, a desktop computing device, or a wearable device such as a watch, glasses, a helmet, an electronic bracelet, an electronic necklace, or electronic clothing.
The embodiment of the application provides a scene model building method; the scene model may comprise a plurality of sub-models, and the sub-models can be combined with one another. The scene model mainly comprises a first scene sub-model and a second scene sub-model; the first scene sub-model can be combined with the second scene sub-model through their last hidden layers, and the combined first and second scene sub-models are used to construct a fully connected model. The fully connected model is correspondingly provided with an optimization objective function, and the evaluation indexes involved in this optimization objective function equal the sum of the evaluation indexes of the optimization objective functions corresponding to the individual sub-models. The structure of the scene model may be as shown in fig. 3.
Referring to fig. 4, fig. 4 is a first flow chart of a scene model building method according to an embodiment of the application. The following will describe from the perspective of an electronic device, and the flow of the scene model building method provided by the embodiment of the application may be as follows:
101. Obtain the perception data of the scene.
The electronic device may obtain the perception data of the scene. The perception data may be the perception data of the current scene, or may be perception data of the scene stored in the electronic device for a certain time, which is not limited herein. The perception data may include various kinds of data. For example, the perception data may include environmental data, operational data, and user behavior data, among others. The environmental data may include various data such as ambient temperature, ambient pictures, and ambient light intensity. The operational data may include various data, such as text data displayed on the electronic device. The user behavior data may include various data such as image data, audio data, and the like.
The electronic device can collect the perception data of the scene through the information sensing layer in the scene perception architecture. For example, the electronic device may detect the ambient temperature through a temperature sensor, the ambient light intensity through a light sensor, image data in the surrounding environment through a camera, audio data in the surrounding environment through a microphone, and text data displayed on the electronic device through a display control circuit.
In an embodiment of the application, the perceptual data of the scene includes a plurality of features characterizing the scene, which may be used to reflect the situation of the scene. Wherein the plurality of features may include any physical object, virtual object, concept name, and the like. For example, the plurality of features may include people, animals, buildings, cell phones, games, novels, meetings, temperatures, ambient light intensities, and the like. After the perceived data of the scene is input into the sub-model of the electronic device, the electronic device can extract the features in the perceived data and output the scene label corresponding to the perceived data according to the features.
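To make the shape of such perception data concrete, here is a small Python sketch; the field names and grouping are illustrative assumptions rather than structures prescribed by this embodiment.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PerceptionSample:
    """One record of scene perception data, grouped as described above."""
    environment: dict                    # e.g. {"temperature": 23.5, "light_lux": 300.0}
    operation: dict                      # e.g. {"displayed_text": "meeting at 3 pm"}
    behavior: dict                       # e.g. {"audio_level": 0.2}
    preset_label: Optional[str] = None   # preset scene tag attached during training

sample = PerceptionSample(
    environment={"temperature": 23.5, "light_lux": 300.0},
    operation={"displayed_text": "meeting at 3 pm"},
    behavior={"audio_level": 0.2},
    preset_label="meeting",
)
```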
102. Train at least two sub-models based on the optimization objective function and the perception data to obtain the optimization objective function values of the at least two sub-models, wherein the optimization objective function corresponding to each sub-model comprises at least two evaluation indexes.
In the embodiment of the application, the optimization objective function is the judgment criterion for training a sub-model. When the optimization objective function value converges, the weight parameters of the sub-model are optimal and model training is completed. The optimization objective functions corresponding to different sub-models may differ. For example, the optimization objective function of the first scene sub-model may be expressed as Z = argmin(Pre + Com), and the optimization objective function of the second scene sub-model may be expressed as Z = argmin(Recall + Com); here the optimization objective functions corresponding to the two sub-models are different. Pre represents the evaluation index accuracy, Recall represents the evaluation index recall rate, and Com represents the evaluation index model complexity.
It should be noted that the optimization objective function corresponding to each sub-model includes at least two evaluation indexes. For example, the optimization objective function of the first scene sub-model may be expressed as Z = argmin(Pre + Com), so the optimization objective function corresponding to the first scene sub-model includes the two evaluation indexes accuracy and model complexity; the optimization objective function of the second scene sub-model may be expressed as Z = argmin(Recall + Com), so the optimization objective function corresponding to the second scene sub-model includes the two evaluation indexes recall rate and model complexity.
The evaluation indexes are the main components of the optimization objective function. The evaluation indexes need to be designed at the sub-model building stage; then, at the training stage, model weight parameters are optimized by taking the optimization objective function formed from these evaluation indexes as the target, so that scene recognition by the optimized scene model is more accurate. Evaluation indexes are diverse, for example accuracy, recall rate, model complexity, novelty, diversity, and so forth.
In some embodiments, the calculation formula of accuracy (Pre) may be expressed as Pre = (1/N) Σᵢ 1(yᵢ = ŷᵢ), where yᵢ represents the scene category of the i-th record, ŷᵢ represents the i-th scene category learned by the sub-model, N is the number of records, and 1(·) is 1 when its argument holds and 0 otherwise.
In some embodiments, the calculation formula of recall (Recall) may be expressed as Recall = |{ŷᵢ} ∩ y| / |y|, where ŷᵢ represents the i-th scene category learned by the sub-model and y represents the set of scene categories of the preset scene labels corresponding to the perception data.
In some embodiments, the calculation formula of model complexity (Com) may be expressed as Com = C1 + C2, where C1 denotes the number of network layers and C2 denotes the number of neurons.
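The following Python sketch computes the three evaluation indexes as the formulas above define them; the function and variable names are illustrative assumptions.

```python
def accuracy(learned, true):
    """Pre: fraction of records whose learned scene category equals the true one."""
    assert len(learned) == len(true)
    return sum(p == t for p, t in zip(learned, true)) / len(learned)

def recall(learned, preset):
    """Recall: fraction of the preset scene categories recovered by the sub-model."""
    preset_set = set(preset)
    return len(set(learned) & preset_set) / len(preset_set)

def complexity(num_layers, num_neurons):
    """Com = C1 + C2: number of network layers plus number of neurons."""
    return num_layers + num_neurons

print(accuracy(["home", "office", "home"], ["home", "office", "commute"]))  # 0.666...
print(recall(["home", "office", "home"], ["home", "office", "commute"]))    # 0.666...
```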
In some embodiments, before step 102 is performed, the following operations are also performed: establishing at least two sub-models; and generating an optimization objective function corresponding to the at least two sub-models, wherein the optimization objective function corresponding to the sub-models comprises at least two evaluation indexes so as to train the at least two sub-models.
In some embodiments, the sub-models may be constructed using deep neural networks (DNNs).
When training the at least two sub-models, the timing is not restricted: the sub-models may be trained at the same time, or one sub-model may be trained until its optimization objective function value converges before the remaining sub-models are trained one by one; the training order is not limited (see the sketch after this paragraph).
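Since the training order is not limited, the sub-models can be trained one after another or concurrently. A minimal sketch of both styles, assuming a caller-supplied train_submodel callback that trains one sub-model until its objective value converges:

```python
from concurrent.futures import ThreadPoolExecutor

def train_all_sequential(submodels, train_submodel):
    """Train each sub-model in turn until its objective value converges."""
    for model in submodels:
        train_submodel(model)

def train_all_parallel(submodels, train_submodel):
    """Train all sub-models at the same time."""
    with ThreadPoolExecutor() as pool:
        list(pool.map(train_submodel, submodels))
```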
103. When the optimization objective function values of the at least two sub-models meet a preset condition, perform weighting processing on the at least two sub-models to construct a scene model.
In some embodiments, the at least two sub-models may be weighted after all the optimization objective function values corresponding to them have converged, so as to establish the scene model. In other embodiments, the optimization objective function values corresponding to the at least two sub-models need not all have converged, and the weighting processing is performed on the at least two sub-models directly to build the scene model. Because the optimization objective function value of each individual sub-model may converge while the weighted value of several sub-models does not necessarily converge, there is no strict time-sequence relationship between step 102 and step 103, and whether step 102 has been executed has no influence on the execution of step 103.
In the embodiment of the present application, before executing step 103, the following operations also need to be performed: combining the last hidden layers of the at least two sub-models; and acquiring the proportion parameters of the weighting processing so as to perform the weighting processing on the at least two sub-models. Each sub-model includes a plurality of hidden layers. The proportion parameters can be customized by the user, or can be specified by the electronic device manufacturer after multiple experiments. A sketch of combining the last hidden layers follows.
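The following PyTorch sketch illustrates combining the last hidden layers of two sub-models through a fully connected layer; the MLP backbones, dimensions, and class names are assumptions made for the example, not an architecture prescribed by the patent.

```python
import torch
import torch.nn as nn

class SubModel(nn.Module):
    def __init__(self, in_dim, hidden_dim, num_classes):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),  # last hidden layer
        )
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):
        hidden = self.backbone(x)          # activations of the last hidden layer
        return self.head(hidden), hidden

class JointSceneModel(nn.Module):
    """Concatenates the last hidden layers of two sub-models and feeds them
    into a fully connected layer, as in step (4) of the flow above."""
    def __init__(self, sub_a, sub_b, hidden_dim, num_classes):
        super().__init__()
        self.sub_a, self.sub_b = sub_a, sub_b
        self.fc = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, x):
        _, ha = self.sub_a(x)
        _, hb = self.sub_b(x)
        return self.fc(torch.cat([ha, hb], dim=-1))

joint = JointSceneModel(SubModel(16, 32, 4), SubModel(16, 32, 4), 32, 4)
logits = joint(torch.randn(8, 16))  # batch of 8 perception-feature vectors
```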
For example, assume that an electronic device A includes a first scene sub-model, a second scene sub-model, and a third scene sub-model, where any two sub-models are combined with each other through their last hidden layers; the optimization objective function of the first scene sub-model is Z = argmin(Pre + Com), that of the second scene sub-model is Z = argmin(Recall + Com), and that of the third scene sub-model is Z = argmin(Var + Com), where Var represents the evaluation index diversity and the remaining symbols are as described above. When the electronic device performs weighting processing on the combined first and second scene sub-models, the optimization objective function of their joint training is Z = argmin[(Pre + Com) × a% + (Recall + Com) × (1 - a%)]; when the electronic device performs weighting processing on the combined first and third scene sub-models, the optimization objective function of their joint training is Z = argmin[(Pre + Com) × a% + (Var + Com) × (1 - a%)]; when the electronic device performs weighting processing on the combined second and third scene sub-models, the optimization objective function of their joint training is Z = argmin[(Recall + Com) × a% + (Var + Com) × (1 - a%)]; and when the electronic device performs weighting processing on the combined first, second, and third scene sub-models, the optimization objective function of their joint training is Z = argmin[(Pre + Com) × a% + (Recall + Com) × b% + (Var + Com) × (1 - a% - b%)]. The Com values calculated for the first, second, and third scene sub-models are not necessarily the same, because the numbers of network layers and neurons of the three sub-models are not necessarily the same.
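A plain-Python sketch of the weighted objectives above; the proportion parameters are written as fractions in [0, 1] rather than percentages, and each sub-model keeps its own Com value since layer and neuron counts may differ:

```python
def joint_objective_two(pre, com_a, recall, com_b, a=0.5):
    """Z = (Pre + Com_a) * a + (Recall + Com_b) * (1 - a)."""
    return (pre + com_a) * a + (recall + com_b) * (1 - a)

def joint_objective_three(pre, com_a, recall, com_b, var, com_c, a=0.4, b=0.3):
    """Z for three sub-models, with weights a, b and (1 - a - b)."""
    return (pre + com_a) * a + (recall + com_b) * b + (var + com_c) * (1 - a - b)
```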
It should be noted that, during joint training, the weighting processing may be performed on any number of sub-models according to actual needs; for example, two sub-models may be jointly trained, or three sub-models may be jointly trained, and so on. This may be set by the user and is not specifically limited herein.
The electronic device may then provide personalized services to the user based on the scene model. For example, when the user turns on the driving mode, the electronic device may query the state of the user in the driving mode according to the scene model, for example determine whether the user is waiting at a traffic light, and make a corresponding decision according to the result.
From the above, in the embodiment of the present application, when the scene model is built, the perception data of the scene is acquired first. Then, at least two sub-models are trained based on the optimization objective function and the perception data, wherein the optimization objective function corresponding to each sub-model comprises at least two evaluation indexes. Finally, the at least two sub-models are weighted to construct the scene model. Thus, when training the at least two sub-models, the weight parameters of the sub-models are optimized with an optimization objective function comprising multiple evaluation indexes as the target, thereby establishing the scene model. The more comprehensive the evaluation indexes used during training, the more accurate the weight parameters of the constructed scene model and the more evaluation-index dimensions the model can satisfy, so the scene model is more accurate during scene recognition and can recognize data that requires multi-dimensional evaluation indexes, improving the recognition range and recognition accuracy of the scene model.
Referring to fig. 5, fig. 5 is a second flow chart of a scene model building method according to an embodiment of the application. The scene model building method can be applied to the electronic device, as shown in fig. 5, and the flow of the scene model building method can be as follows:
201. Obtain the perception data of the scene.
The electronic device may obtain the perception data of the scene. The perception data may be the perception data of the current scene, or may be perception data of the scene stored in the electronic device for a certain time, which is not limited herein. The perception data may include various kinds of data. For example, the perception data may include environmental data, operational data, and user behavior data, among others. The environmental data may include various data such as ambient temperature, ambient pictures, and ambient light intensity. The operational data may include various data, such as text data displayed on the electronic device. The user behavior data may include various data such as image data, audio data, and the like.
The electronic device can collect the perception data of the scene through the information sensing layer in the scene perception architecture. For example, the electronic device may detect the ambient temperature through a temperature sensor, the ambient light intensity through a light sensor, image data in the surrounding environment through a camera, audio data in the surrounding environment through a microphone, and text data displayed on the electronic device through a display control circuit.
In an embodiment of the application, the perceptual data of the scene includes a plurality of features characterizing the scene, which may be used to reflect the situation of the scene. Wherein the plurality of features may include any physical object, virtual object, concept name, and the like. For example, the plurality of features may include people, animals, buildings, cell phones, games, novels, meetings, temperatures, ambient light intensities, and the like. After the perceived data of the scene is input into the sub-model of the electronic device, the electronic device can extract the features in the perceived data and output the scene label corresponding to the perceived data according to the features.
202. Input the perception data into the sub-model for training, so as to output a corresponding scene label.
In some embodiments, before step 202 is performed, the following operations are also performed: establishing at least two sub-models; and generating an optimization objective function corresponding to the at least two sub-models, wherein the optimization objective function corresponding to the sub-models comprises at least two evaluation indexes so as to train the at least two sub-models.
In some embodiments, the sub-models may be constructed using deep neural networks (DNNs).
203. Obtain the optimization objective function value of the sub-model into which the perception data is input, according to the optimization objective function, the preset scene label, and the output scene label.
Each piece of perception data is provided with a corresponding preset scene label, which can be set by the user. When the sub-model is trained, the preset scene label is mainly used for matching against the scene label output by the sub-model for the perception data, so as to determine the correctness of scene recognition: if the matching succeeds, the scene recognition succeeds; if the matching fails, the scene recognition fails.
In the embodiment of the application, the optimization objective function is the judgment criterion for training a sub-model. When the optimization objective function value converges, the weight parameters of the sub-model are optimal and model training is completed. The optimization objective functions corresponding to different sub-models may differ. For example, the optimization objective function of the first scene sub-model may be expressed as Z = argmin(Pre + Com), and the optimization objective function of the second scene sub-model may be expressed as Z = argmin(Recall + Com); here the optimization objective functions corresponding to the two sub-models are different. Pre represents the evaluation index accuracy, Recall represents the evaluation index recall rate, and Com represents the evaluation index model complexity.
It should be noted that the optimization objective function corresponding to each sub-model includes at least two evaluation indexes. For example, the optimization objective function of the first scene sub-model may be expressed as Z = argmin(Pre + Com), so the optimization objective function corresponding to the first scene sub-model includes the two evaluation indexes accuracy and model complexity; the optimization objective function of the second scene sub-model may be expressed as Z = argmin(Recall + Com), so the optimization objective function corresponding to the second scene sub-model includes the two evaluation indexes recall rate and model complexity.
The evaluation indexes are the main components of the optimization objective function. The evaluation indexes need to be designed at the sub-model building stage; then, at the training stage, model weight parameters are optimized by taking the optimization objective function formed from these evaluation indexes as the target, so that scene recognition by the optimized scene model is more accurate. Evaluation indexes are diverse, for example accuracy, recall rate, model complexity, novelty, diversity, and so forth.
In some embodiments, the calculation formula of accuracy (Pre) may be expressed as Pre = (1/N) Σᵢ 1(yᵢ = ŷᵢ), where yᵢ represents the scene category identified by the sub-model for the i-th record, ŷᵢ represents the correct i-th scene category, N is the number of records, and 1(·) is 1 when its argument holds and 0 otherwise.
In some embodiments, the calculation formula of recall (Recall) may be expressed as Recall = |{ŷᵢ} ∩ y| / |y|, where ŷᵢ represents the correct i-th scene category identified by the sub-model and y represents the set of scene categories of the preset scene labels corresponding to the perception data.
In some embodiments, the calculation formula of model complexity (Com) may be expressed as Com = C1 + C2, where C1 denotes the number of network layers and C2 denotes the number of neurons.
204. If the optimization objective function value converges, establish a scene sub-model.
The number of scene sub-models established by the electronic device equals the number of sub-models being trained. For example: train the first scene sub-model into which the perception data is input, so as to output a corresponding scene label; obtain the optimization objective function value of the first scene sub-model according to the optimization objective function, the preset scene label, and the output scene label; and if the optimization objective function value converges, establish the first scene sub-model.
It should be noted that convergence means that as the amount of perception data input to the first scene sub-model tends to infinity, the optimization objective function value tends to a certain finite value. Once the optimization objective function value has converged and the first scene sub-model is established, whatever data are input to the first scene sub-model (their kind, number, and so on are not limited), the re-acquired optimization objective function value always approximates that finite value. The optimization objective function value may change every time data is input into the sub-model/scene sub-model.
205. If the optimization objective function value does not converge, adjust the weight parameters of the sub-model into which the perception data is input, until the optimization objective function value converges.
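A sketch of this converge-or-adjust loop; has_converged implements the informal criterion described above (the objective value settles near a finite value), while train_step and objective_value stand in for one round of weight-parameter adjustment and re-evaluation. All names and the tolerance are assumptions of the example.

```python
def has_converged(history, tol=1e-4, window=5):
    """Converged when the last `window` objective values vary by at most tol."""
    if len(history) < window:
        return False
    recent = history[-window:]
    return max(recent) - min(recent) <= tol

def train_until_converged(train_step, objective_value, max_epochs=1000):
    history = []
    for _ in range(max_epochs):
        train_step()                       # adjust the sub-model's weight parameters
        history.append(objective_value())  # re-evaluate the optimization objective Z
        if has_converged(history):
            break
    return history
```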
In some embodiments, the following operations may be performed directly after performing step 201. Referring to fig. 6, fig. 6 is a third flow chart of a scene model building method according to an embodiment of the present application.
206. Obtain the optimization objective function values of the at least two sub-models.
In the embodiment of the present application, before executing step 206, the following operations are also performed: combining the last hidden layers of the at least two sub-models; and acquiring the proportion parameters of the weighting processing so as to perform the weighting processing on the at least two sub-models. Each sub-model includes a plurality of hidden layers. The proportion parameters can be customized by the user, or can be specified by the electronic device manufacturer after multiple experiments.
207. Perform weighting processing on the optimization objective function values to obtain a weighted value of the optimization objective function values.
The essence of the electronic device performing weighting processing on the optimization objective function values is to jointly train the fully connected layer. The fully connected layer is constructed from the combined sub-models and is correspondingly provided with an optimization objective function, whose evaluation indexes equal the sum of the evaluation indexes of the optimization objective functions corresponding to the combined sub-models.
For example, assume that an electronic device A includes a first scene sub-model, a second scene sub-model, and a third scene sub-model, where any two sub-models are combined with each other through their last hidden layers; the optimization objective function of the first scene sub-model is Z = argmin(Pre + Com), that of the second scene sub-model is Z = argmin(Recall + Com), and that of the third scene sub-model is Z = argmin(Var + Com), where Var represents the evaluation index diversity and the remaining symbols are as described above. When the electronic device performs weighting processing on the combined first and second scene sub-models, the optimization objective function of their joint training is Z = argmin[(Pre + Com) × a% + (Recall + Com) × (1 - a%)]; when the electronic device performs weighting processing on the combined first and third scene sub-models, the optimization objective function of their joint training is Z = argmin[(Pre + Com) × a% + (Var + Com) × (1 - a%)]; when the electronic device performs weighting processing on the combined second and third scene sub-models, the optimization objective function of their joint training is Z = argmin[(Recall + Com) × a% + (Var + Com) × (1 - a%)]; and when the electronic device performs weighting processing on the combined first, second, and third scene sub-models, the optimization objective function of their joint training is Z = argmin[(Pre + Com) × a% + (Recall + Com) × b% + (Var + Com) × (1 - a% - b%)]. The Com values calculated for the first, second, and third scene sub-models are not necessarily the same, because the numbers of network layers and neurons of the three sub-models are not necessarily the same.
It should be noted that, during joint training, the weighting processing may be performed on any number of sub-models according to actual needs; for example, two sub-models may be jointly trained, or three sub-models may be jointly trained, and so on. This may be set by the user and is not specifically limited herein.
208. If the weighted value converges, establish the scene model.
It should be noted that convergence means that as the amount of perception data input to the at least two sub-models tends to infinity, the weighted value tends to a certain finite value. Once the weighted value has converged and the scene model is established, whatever data are input to the scene model (their kind, number, and so on are not limited), the re-acquired weighted value always approximates that finite value. The weighted value may change every time data is input into the scene model. The finite value is not particularly limited and may be any value.
The scene model comprises at least two sub-models, and the optimization objective function corresponding to each sub-model comprises at least two evaluation indexes. The at least two evaluation indexes of different sub-models are not necessarily identical.
In some embodiments, the scene model mainly includes a first scene sub-model and a second scene sub-model; the evaluation index of the optimization objective function of the first scene sub-model comprises accuracy and model complexity; the evaluation index of the optimized objective function of the second scene sub-model comprises recall and model complexity.
209. If the weighted value does not converge, adjust the weight parameters of the at least two sub-models until the weighted value converges.
Non-convergence of the weighted value means that even when the perception data input to the at least two sub-models is sufficiently large, the weighted value still fluctuates over a large range around any finite value. The finite value is not particularly limited and may be any value.
In some embodiments, after the scene model is built, the following operations may also be performed. Referring to fig. 7, fig. 7 is a fourth flowchart of a scene model building method according to an embodiment of the present application.
210. Receive a scene recognition request, wherein the scene recognition request includes a task identifier and data to be recognized.
Specifically, the scene recognition request may be triggered by the terminal itself or may be triggered by the user operating the terminal. For example, when it is detected that the user turns on the driving mode, the electronic device receives a scene recognition request.
The task identifier is a credential for classifying the data to be recognized according to a certain rule, and a corresponding task identifier is set for each piece of data to be recognized. The data to be recognized can be classified in various ways; for example, when classified according to the specific use of the data to be recognized, the task identifiers may include a recommendation task identifier, a classification task identifier, and the like. The classification of the data to be recognized is not specifically limited herein.
211. Match the task identifier against a preset mapping table to obtain the optimization objective function for scene recognition, wherein the mapping table includes the relation between task identifiers and optimization objective functions.
A mapping table between task identifiers and optimization objective functions can be preset in the electronic device: a corresponding optimization objective function can be set manually for each of a plurality of task identifiers, and the preset mapping table between task identifiers and optimization objective functions is then established from the task identifiers and the optimization objective function corresponding to each task identifier.
For example, a plurality of task identifiers may be determined by an expert in the art, and an optimization objective function may then be set for each task identifier. The task identifiers, the optimization objective functions, and the correspondence between each task identifier and its optimization objective function are then stored in the form of a database, establishing the preset mapping table between task identifiers and optimization objective functions. An example of the preset mapping table is shown in Table 1:
TABLE 1

Task identifier          Optimization objective function
Classification task      Z = argmin(Pre + Com)
Recommendation task      Z = argmin(Pre + Recall + Com)
Here Pre represents the evaluation index accuracy, Recall represents the evaluation index recall rate, and Com represents the evaluation index model complexity.
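A minimal sketch of looking up the optimization objective function by task identifier, mirroring Table 1; the keys, strings, and function name are illustrative assumptions:

```python
PRESET_MAPPING = {
    "classification": "Z = argmin(Pre + Com)",
    "recommendation": "Z = argmin(Pre + Recall + Com)",
}

def objective_for(task_identifier):
    """Match the task identifier against the preset mapping table."""
    try:
        return PRESET_MAPPING[task_identifier]
    except KeyError:
        raise ValueError(f"no optimization objective preset for task {task_identifier!r}")

print(objective_for("recommendation"))  # Z = argmin(Pre + Recall + Com)
```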
212. Perform scene recognition on the data to be recognized by using the scene model, according to the optimization objective function for scene recognition.
The electronic device can determine, according to the optimization objective function for scene recognition, the specific positions of the scene model into which the data to be recognized should be input, and then input the data to be recognized at those positions for scene recognition.
For example, assume that the scene model includes a first scene sub-model whose optimization objective function is Z = argmin(Pre + Com) and a second scene sub-model whose optimization objective function is Z = argmin(Recall + Com), and that the task identifier of the data M to be recognized is a recommendation task, whose optimization objective function in the preset mapping table is Z = argmin(Pre + Recall + Com). When step 212 is executed, first, the specific positions of the scene model into which the data M is to be input, namely the first scene sub-model and the second scene sub-model, are obtained according to the optimization objective function Z = argmin(Pre + Recall + Com) for scene recognition; second, the data to be recognized is input at those positions for scene recognition.
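The choice of which sub-models the data should be input to can be sketched as covering the evaluation indexes named by the looked-up objective function; this greedy cover and all names are assumptions of the example, not an API defined by the patent:

```python
def submodels_for_objective(required, submodels):
    """Pick sub-models whose evaluation indexes together cover `required`.

    `submodels` maps a sub-model name to its set of evaluation indexes.
    Returns None if the request cannot be covered.
    """
    chosen, covered = [], set()
    for name, indexes in submodels.items():
        if indexes - covered:        # contributes at least one new index
            chosen.append(name)
            covered |= indexes
        if required <= covered:
            return chosen
    return None

# The recommendation task needs Pre, Recall and Com, so both sub-models are chosen.
print(submodels_for_objective(
    {"Pre", "Recall", "Com"},
    {"first": {"Pre", "Com"}, "second": {"Recall", "Com"}},
))  # ['first', 'second']
```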
As can be seen from the above, in the embodiment of the present application, when the electronic device builds the scene model, the sub-model into which the perception data is input can be trained to output a corresponding scene label. Then, the optimization objective function value of that sub-model is acquired according to the optimization objective function, the preset scene label, and the output scene label, and a scene sub-model is established once the optimization objective function value converges. Finally, the optimization objective function values of the at least two sub-models are acquired and weighted until the weighted value converges, so as to establish the scene model. Thus, when training the at least two sub-models, the weight parameters of the sub-models are optimized with an optimization objective function comprising multiple evaluation indexes as the target, thereby establishing the scene model. The more comprehensive the evaluation indexes used during training, the more accurate the weight parameters of the constructed scene model and the more evaluation-index dimensions the model can satisfy, so the scene model is more accurate during scene recognition and can recognize data that requires multi-dimensional evaluation indexes, improving the recognition range and recognition accuracy of the scene model.
In order to better implement the scene model building method provided by the embodiment of the application, the embodiment of the application also provides a scene model building apparatus based on the scene model building method. The meanings of the terms are the same as those in the scene model building method; for implementation details, refer to the description in the method embodiments.
Referring to fig. 8, fig. 8 is a first structural schematic diagram of a scene model building apparatus according to an embodiment of the application. Specifically, the scene model creation apparatus 300 includes: an acquisition module 301, a training module 302 and a weighting processing module 303.
The acquisition module 301 is configured to acquire perceptual data of a scene.
The training module 302 is configured to train at least two sub-models based on the optimization objective function and the perception data to obtain the optimization objective function values of the at least two sub-models, where the optimization objective function corresponding to each sub-model includes at least two evaluation indexes.
The weighting processing module 303 is configured to perform weighting processing on the at least two sub-models to construct a scene model when the optimization objective function values of the at least two sub-models meet a preset condition.
In some embodiments, referring to fig. 9, fig. 9 is a second structural schematic diagram of a scene model building apparatus according to an embodiment of the present application. The training module 302 may further include:
The training sub-module 3021 is configured to input the perception data into a sub-model for training, so as to output a corresponding scene label.
The first acquisition sub-module 3022 is configured to obtain the optimized objective function value of the sub-model into which the perception data is input, according to the optimization objective function, a preset scene label, and the output scene label.
The first building sub-module 3023 is configured to establish a scene sub-model if the optimized objective function value converges.
The first adjustment sub-module 3024 is configured to adjust the weight parameters of the sub-model into which the perception data is input if the optimized objective function value does not converge, until the optimized objective function value converges.
In some embodiments, referring to fig. 10, fig. 10 is a third structural schematic diagram of a scene model building apparatus according to an embodiment of the present application. The weighting processing module 303 may further include:
The second acquisition sub-module 3031 is configured to obtain the optimized objective function values of the at least two sub-models.
The weighting processing sub-module 3032 is configured to perform weighting processing on the optimized objective function values to obtain a weighted value of the optimized objective function values.
The second building sub-module 3033 is configured to establish a scene model if the weighted value converges.
The second adjustment sub-module 3034 is configured to adjust the weight parameters of the at least two sub-models if the weighted value does not converge, until the weighted value converges.
In some embodiments, referring to fig. 11, fig. 11 is a fourth structural schematic diagram of a scene model building apparatus according to an embodiment of the present application. The scene model building apparatus 300 may further include:
The receiving module 304 is configured to receive a scene recognition request, where the scene recognition request includes a task identifier and data to be recognized.
The matching module 305 is configured to match the task identifier with a preset mapping table to obtain an optimized objective function of scene recognition, where the mapping table includes the relationship between the task identifier and the optimized objective function.
The recognition module 306 is configured to perform scene recognition on the data to be recognized by using the scene model according to the optimized objective function of scene recognition.
As can be seen from the above, the scene model building apparatus 300 provided in the embodiment of the present application includes: the acquisition module 301, configured to acquire perception data of a scene; the training module 302, configured to train at least two sub-models based on the optimization objective function and the perception data to obtain the optimized objective function values of the at least two sub-models, where the optimization objective function corresponding to each sub-model includes at least two evaluation indexes; and the weighting processing module 303, configured to perform weighting processing on the at least two sub-models to construct a scene model when the optimized objective function values meet a preset condition. Therefore, when the electronic device performs intelligent operations, the established scene model can satisfy evaluation indexes of multiple dimensions, improving the recognition range and recognition accuracy of the scene model.
The embodiment of the present application also provides a scene recognition method. Referring to fig. 12, fig. 12 is a schematic flowchart of the scene recognition method provided by the embodiment of the present application.
401. Receiving a scene recognition request, wherein the scene recognition request comprises a task identifier and data to be recognized;
402. Matching the task identifier with a preset mapping table to obtain an optimized objective function of scene recognition, wherein the mapping table comprises the relationship between the task identifier and the optimized objective function;
403. Performing scene recognition on the data to be recognized by using a scene model according to the optimized objective function of scene recognition, wherein the scene model is constructed by the scene model building method described in the above embodiments.
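A compact sketch of this three-step flow follows; the request shape, the mapping-table contents, and the scene_model.predict interface are assumptions made for illustration only.

```python
# Hedged sketch of steps 401-403; all names are illustrative assumptions.
from dataclasses import dataclass
from typing import Any

@dataclass
class SceneRecognitionRequest:
    task_id: str   # 401: the task identifier
    data: Any      # 401: the data to be recognized

# 402: preset mapping table from task identifier to optimization objective.
MAPPING_TABLE = {
    "recommendation": "argmin(Pre + Recall + Com)",
    "quick_lookup": "argmin(Pre + Com)",
}

def recognize_scene(request: SceneRecognitionRequest, scene_model) -> Any:
    objective = MAPPING_TABLE[request.task_id]           # 402: match the table
    return scene_model.predict(request.data, objective)  # 403: recognize
```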
For specific details of this embodiment, refer to the above embodiments; they are not repeated here.
In the embodiment of the present application, the optimization objective function used for scene recognition can be selected according to the task: a task with a higher accuracy requirement selects an optimization objective function that includes more evaluation indexes, while a task with a lower accuracy requirement selects one with fewer evaluation indexes, improving the flexibility of scene recognition.
In order to facilitate better implementation of the scene recognition method provided by the embodiment of the present application, the embodiment of the present application also provides a scene recognition device based on that method. The terms used here have the same meaning as in the scene recognition method above; for specific implementation details, refer to the description in the method embodiment.
Referring to fig. 13, fig. 13 is a schematic structural diagram of a scene recognition device according to an embodiment of the present application. Specifically, the scene recognition device 500 includes: a receiving module 501, a matching module 502, and a recognition module 503.
The receiving module 501 is configured to receive a scene recognition request, wherein the scene recognition request includes a task identifier and data to be recognized;
the matching module 502 is configured to match the task identifier with a preset mapping table to obtain an optimized objective function of scene recognition, wherein the mapping table includes the relationship between the task identifier and the optimized objective function;
the recognition module 503 is configured to perform scene recognition on the data to be recognized by using a scene model according to the optimized objective function of scene recognition, wherein the scene model is constructed by the scene model building method described in the above embodiment.
The embodiment of the present application also provides an electronic device. The electronic device may be a smart phone, a tablet computer, a gaming device, an AR (Augmented Reality) device, an automobile, a data storage device, an audio playing device, a video playing device, a notebook computer, a desktop computing device, or a wearable device such as an electronic watch, electronic glasses, an electronic helmet, an electronic bracelet, an electronic necklace, an electronic article of clothing, or the like.
Referring to fig. 14, fig. 14 is a first structural schematic diagram of an electronic device according to an embodiment of the present application.
Wherein the electronic device 600 comprises a processor 601 and a memory 602. The processor 601 is electrically connected to the memory 602.
The processor 601 is the control center of the electronic device 600. It connects the various parts of the entire electronic device through various interfaces and lines, and performs the various functions of the electronic device and processes data by running or calling the computer programs stored in the memory 602 and calling the data stored in the memory 602, thereby monitoring the electronic device as a whole.
In this embodiment, the processor 601 in the electronic device 600 loads the instructions corresponding to the processes of one or more computer programs into the memory 602, and executes the computer programs stored in the memory 602 to implement the following functions:
acquiring perception data of a scene;
training at least two sub-models based on an optimization objective function and the perception data to obtain the optimization objective function values of the at least two sub-models, wherein the optimization objective function corresponding to the sub-models comprises at least two evaluation indexes;
and when the optimized objective function values of the at least two sub-models meet a preset condition, weighting the at least two sub-models to construct a scene model.
In some embodiments, when training at least two sub-models based on the optimization objective function and the perception data, the processor 601 performs the steps of:
inputting the perception data into a sub-model for training, so as to output a corresponding scene label;
obtaining the optimized objective function value of the sub-model into which the perception data is input, according to the optimization objective function, a preset scene label, and the output scene label;
if the optimized objective function value converges, establishing a scene sub-model;
and if the optimized objective function value does not converge, adjusting the weight parameters of the sub-model into which the perception data is input, until the optimized objective function value converges.
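The loop below sketches these four steps for one sub-model under stated assumptions: the model exposes hypothetical forward and adjust_weights hooks, and metrics is a list of index terms (for instance, the pre_term/com_term stand-ins sketched earlier). The control flow, not the interface, is what follows the text.

```python
# Illustrative training loop for a single sub-model; only the control flow
# (compute objective value -> converged? -> adjust weights) follows the
# description above, the interfaces are assumptions.
def train_submodel(model, perception_data, preset_labels,
                   metrics, tol=1e-4, max_epochs=1000):
    prev_value = float("inf")
    for _ in range(max_epochs):
        scene_labels = model.forward(perception_data)  # output scene labels
        value = sum(m(scene_labels, preset_labels, model) for m in metrics)
        if abs(prev_value - value) < tol:              # value has converged:
            return value                               # scene sub-model built
        prev_value = value
        model.adjust_weights(value)                    # not converged: adjust
    return prev_value
```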
In some embodiments, before weighting the at least two sub-models to construct the scene model, the processor 601 performs the steps of:
combining the last hidden layers of the at least two sub-models;
and acquiring the proportion parameters of the weighting process so as to carry out the weighting process on the at least two sub-models.
In some embodiments, when weighting the at least two sub-models to construct the scene model, the processor 601 performs the steps of:
obtaining the optimized objective function values of the at least two sub-models;
weighting the optimized objective function values to obtain a weighted value of the optimized objective function values;
if the weighted value converges, establishing a scene model;
and if the weighted value does not converge, adjusting the weight parameters of the at least two sub-models until the weighted value converges.
In some embodiments, after weighting the at least two sub-models to construct the scene model, the processor 601 performs the steps of:
receiving a scene recognition request, wherein the scene recognition request comprises a task identifier and data to be recognized;
matching the task identifier with a preset mapping table to obtain an optimized objective function of scene identification, wherein the mapping table comprises the relation between the task identifier and the optimized objective function;
and carrying out scene recognition on the data to be recognized by utilizing the scene model according to the optimized objective function of scene recognition.
The memory 602 may be used to store computer programs and data. The computer programs stored in the memory 602 include instructions executable by the processor and may constitute various functional modules. The processor 601 executes various functional applications and performs data processing by calling the computer programs stored in the memory 602.
In some embodiments, referring to fig. 15, fig. 15 is a second structural schematic diagram of an electronic device according to an embodiment of the present application.
Wherein the electronic device 600 further comprises: a display 603, a control circuit 604, an input unit 605, a sensor 606, and a power supply 607. The processor 601 is electrically connected to the display 603, the control circuit 604, the input unit 605, the sensor 606, and the power supply 607, respectively.
The display 603 may be used to display information entered by a user or provided to a user as well as various graphical user interfaces of the electronic device, which may be composed of images, text, icons, video, and any combination thereof.
The control circuit 604 is electrically connected to the display screen 603, and is used for controlling the display screen 603 to display information.
The input unit 605 may be used to receive entered numbers, character information, or user characteristic information (e.g., fingerprints), and to generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control. The input unit 605 may include a fingerprint recognition module.
The sensor 606 is used to collect information about the electronic device itself, the user, or the external environment. For example, the sensor 606 may include a plurality of sensors such as a distance sensor, a magnetic field sensor, a light sensor, an acceleration sensor, a fingerprint sensor, a Hall sensor, a position sensor, a gyroscope, an inertial sensor, a gesture sensor, a barometer, a heart rate sensor, and the like.
The power supply 607 is used to supply power to the various components of the electronic device 600. In some embodiments, the power supply 607 may be logically connected to the processor 601 through a power management system, so as to manage charging, discharging, and power consumption through the power management system.
Although not shown in fig. 15, the electronic device 600 may further include a camera, a bluetooth module, etc., which will not be described herein.
As can be seen from the above, the embodiment of the present application provides an electronic device that performs the following steps: acquiring perception data of a scene; training at least two sub-models based on an optimization objective function and the perception data to obtain the optimized objective function values of the at least two sub-models, where the optimization objective function corresponding to each sub-model includes at least two evaluation indexes; and, when those values meet a preset condition, weighting the at least two sub-models to construct a scene model. Therefore, when the electronic device performs intelligent operations, the established scene model can satisfy evaluation indexes of multiple dimensions, improving the recognition range and recognition accuracy of the scene model.
The embodiment of the application also provides a storage medium, in which a computer program is stored, and when the computer program runs on a computer, the computer executes the scene model building method according to any one of the embodiments.
It should be noted that, as those skilled in the art will appreciate, all or part of the steps in the various methods of the above embodiments may be implemented by a computer program instructing related hardware. The computer program may be stored in a computer-readable storage medium, which may include, but is not limited to: read-only memory (ROM), random access memory (RAM), magnetic disk, optical disk, and the like.
The scene model building method, apparatus, storage medium, and electronic device provided by the embodiments of the present application have been described in detail above. The principles and embodiments of the present application are explained herein with reference to specific examples, and this description is intended only to assist in understanding the methods of the present application and their core ideas. Meanwhile, since those skilled in the art may make variations to the specific embodiments and the scope of application in light of the ideas of the present application, the contents of this description should not be construed as limiting the present application.

Claims (10)

1. A method for creating a scene model, comprising:
obtaining perception data of a scene, wherein the perception data comprises environment data, operation data and user behavior data;
Training at least two sub-models based on an optimization objective function and the perception data to obtain the optimized objective function values of the at least two sub-models, wherein each optimized objective function value is the value obtained when the corresponding optimization objective function value converges, the optimization objective function corresponding to each sub-model comprises at least two evaluation indexes, and the at least two evaluation indexes are at least two of the following evaluation indexes: accuracy, recall, model complexity, novelty, and diversity; the step of training at least two sub-models based on the optimization objective function and the perception data to obtain the optimized objective function values of the at least two sub-models comprises: inputting the perception data into a sub-model for training, so as to output a corresponding scene label; obtaining the optimized objective function value of the sub-model into which the perception data is input, according to the optimization objective function, a preset scene label, and the output scene label; if the optimized objective function value converges, establishing a scene sub-model; and if the optimized objective function value does not converge, adjusting the weight parameters of the sub-model into which the perception data is input, until the optimized objective function value converges;
When the optimized objective function values of the at least two sub-models meet a preset condition, performing weighting processing on the at least two sub-models to construct a scene model, wherein the preset condition is that the optimized objective function values corresponding to the at least two sub-models all converge, or that at least one of the optimized objective function values corresponding to the at least two sub-models converges; and the sum of the weight values corresponding to the at least two sub-models is 1.
2. The scene model building method according to claim 1, wherein the scene model comprises a first scene sub-model and a second scene sub-model;
the evaluation index of the optimization objective function of the first scene sub-model comprises accuracy and model complexity;
the evaluation index of the optimization objective function of the second scene sub-model comprises recall rate and model complexity.
3. The scene model building method according to claim 1, wherein each sub-model includes a plurality of hidden layers, and the method further comprises, before weighting the at least two sub-models:
combining the last hidden layers of the at least two sub-models;
and acquiring the proportion parameters of the weighting process, so as to perform the weighting process on the at least two sub-models.
4. The scene model building method according to claim 1, wherein the weighting the at least two sub-models to construct the scene model comprises:
obtaining the optimized objective function values of the at least two sub-models;
weighting the optimized objective function values to obtain a weighted value of the optimized objective function values;
if the weighted value converges, establishing a scene model;
and if the weighted value does not converge, adjusting the weight parameters of the at least two sub-models until the weighted value converges.
5. A scene recognition method, comprising:
receiving a scene recognition request, wherein the scene recognition request comprises a task identifier and data to be recognized;
matching the task identifier with a preset mapping table to obtain an optimized objective function of scene recognition, wherein the mapping table comprises the relationship between the task identifier and the optimized objective function;
performing scene recognition on the data to be recognized by using a scene model according to the optimized objective function of scene recognition, wherein the scene model is constructed by the scene model building method according to any one of claims 1 to 4.
6. A scene model building apparatus, characterized by comprising:
the acquisition module is used for acquiring the perception data of the scene, wherein the perception data comprises environment data, operation data and user behavior data;
the training module is configured to train at least two sub-models based on an optimization objective function and the perception data to obtain the optimized objective function values of the at least two sub-models, wherein each optimized objective function value is the value obtained when the corresponding optimization objective function value converges, the optimization objective function corresponding to each sub-model includes at least two evaluation indexes, and the at least two evaluation indexes are at least two of the following evaluation indexes: accuracy, recall, model complexity, novelty, and diversity; the training module comprises: the training sub-module, configured to input the perception data into a sub-model for training, so as to output a corresponding scene label; the first acquisition sub-module, configured to obtain the optimized objective function value of the sub-model into which the perception data is input, according to the optimization objective function, a preset scene label, and the output scene label; the first establishing sub-module, configured to establish a scene sub-model if the optimized objective function value converges; and the first adjustment sub-module, configured to adjust the weight parameters of the sub-model into which the perception data is input if the optimized objective function value does not converge, until the optimized objective function value converges;
The weighting processing module is configured to perform weighting processing on the at least two sub-models to construct a scene model when the optimized objective function values of the at least two sub-models meet a preset condition, wherein the preset condition is that the optimized objective function values corresponding to the at least two sub-models all converge, or that at least one of the optimized objective function values corresponding to the at least two sub-models converges; and the sum of the weight values corresponding to the at least two sub-models is 1.
7. The scene model building apparatus according to claim 6, wherein the weighting processing module comprises:
the second acquisition sub-module is used for acquiring the optimized objective function values of the at least two sub-models;
the weighting processing sub-module is used for performing weighting processing on the optimized objective function values so as to obtain a weighted value of the optimized objective function values;
the second building sub-module is used for establishing a scene model if the weighted value converges;
and the second adjustment sub-module is used for adjusting the weight parameters of the at least two sub-models if the weighted value does not converge, until the weighted value converges.
8. A scene recognition device, comprising:
The receiving module is used for receiving a scene recognition request, wherein the scene recognition request comprises a task identifier and data to be recognized;
the matching module is used for matching the task identifier with a preset mapping table to obtain an optimized objective function of scene recognition, wherein the mapping table comprises the relationship between the task identifier and the optimized objective function;
the recognition module is used for performing scene recognition on the data to be recognized by using a scene model according to an optimized objective function of scene recognition, wherein the scene model is constructed by the scene model building method according to any one of claims 1 to 4.
9. A storage medium having stored thereon a computer program which, when run on a computer, causes the computer to perform the scene model building method according to any one of claims 1 to 4 or the scene recognition method according to claim 5.
10. An electronic device comprising a processor and a memory, the memory having a computer program, characterized in that the processor is adapted to execute the scene model building method according to any one of claims 1 to 4 or the scene recognition method according to claim 5 by invoking the computer program.
CN201910282033.2A 2019-04-09 2019-04-09 Scene model building method and device, storage medium and electronic equipment Active CN111797854B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910282033.2A CN111797854B (en) 2019-04-09 2019-04-09 Scene model building method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910282033.2A CN111797854B (en) 2019-04-09 2019-04-09 Scene model building method and device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN111797854A CN111797854A (en) 2020-10-20
CN111797854B true CN111797854B (en) 2023-12-15

Family

ID=72805727

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910282033.2A Active CN111797854B (en) 2019-04-09 2019-04-09 Scene model building method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN111797854B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112637793B (en) * 2020-11-30 2022-06-03 北京思特奇信息技术股份有限公司 Scene charging method, system, electronic equipment and storage medium based on 5G
CN113297169B (en) * 2021-02-26 2022-05-31 阿里云计算有限公司 Database instance processing method, system, device and storage medium
CN113066486B (en) * 2021-03-25 2023-06-09 北京金山云网络技术有限公司 Data identification method, device, electronic equipment and computer readable storage medium
CN114708412B (en) * 2022-06-06 2022-09-02 江西省映尚科技有限公司 Indoor setting method, device and system based on VR

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107463953A (en) * 2017-07-21 2017-12-12 上海交通大学 Image classification method and system based on quality insertion in the case of label is noisy
CN108230240A (en) * 2017-12-31 2018-06-29 厦门大学 It is a kind of that the method for position and posture in image city scope is obtained based on deep learning
CN108710847A (en) * 2018-05-15 2018-10-26 北京旷视科技有限公司 Scene recognition method, device and electronic equipment
CN108764304A (en) * 2018-05-11 2018-11-06 Oppo广东移动通信有限公司 scene recognition method, device, storage medium and electronic equipment
CN109101931A (en) * 2018-08-20 2018-12-28 Oppo广东移动通信有限公司 A kind of scene recognition method, scene Recognition device and terminal device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120317059A1 (en) * 2011-06-13 2012-12-13 Infosys Limited System and method for space and resource optimization
US10398411B2 (en) * 2016-02-19 2019-09-03 General Electric Company Automatic alignment of ultrasound volumes

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107463953A (en) * 2017-07-21 2017-12-12 上海交通大学 Image classification method and system based on quality insertion in the case of label is noisy
CN108230240A (en) * 2017-12-31 2018-06-29 厦门大学 It is a kind of that the method for position and posture in image city scope is obtained based on deep learning
CN108764304A (en) * 2018-05-11 2018-11-06 Oppo广东移动通信有限公司 scene recognition method, device, storage medium and electronic equipment
CN108710847A (en) * 2018-05-15 2018-10-26 北京旷视科技有限公司 Scene recognition method, device and electronic equipment
CN109101931A (en) * 2018-08-20 2018-12-28 Oppo广东移动通信有限公司 A kind of scene recognition method, scene Recognition device and terminal device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Scene classification based on convolutional neural network and semantic information; Zhang Xiaoming et al.; Software; full text *

Also Published As

Publication number Publication date
CN111797854A (en) 2020-10-20

Similar Documents

Publication Publication Date Title
CN111797854B (en) Scene model building method and device, storage medium and electronic equipment
CN111797288B (en) Data screening method and device, storage medium and electronic equipment
CN111796979B (en) Data acquisition strategy determining method and device, storage medium and electronic equipment
CN111798811B (en) Screen backlight brightness adjusting method and device, storage medium and electronic equipment
CN111797302A (en) Model processing method and device, storage medium and electronic equipment
CN111797870A (en) Optimization method and device of algorithm model, storage medium and electronic equipment
CN111797851A (en) Feature extraction method and device, storage medium and electronic equipment
CN111797849B (en) User activity recognition method and device, storage medium and electronic equipment
CN111797873A (en) Scene recognition method and device, storage medium and electronic equipment
CN111798367A (en) Image processing method, image processing device, storage medium and electronic equipment
CN111798019B (en) Intention prediction method, intention prediction device, storage medium and electronic equipment
CN111797856B (en) Modeling method and device, storage medium and electronic equipment
CN111797986A (en) Data processing method, data processing device, storage medium and electronic equipment
CN111797289A (en) Model processing method and device, storage medium and electronic equipment
CN111814812A (en) Modeling method, modeling device, storage medium, electronic device and scene recognition method
CN111796663B (en) Scene recognition model updating method and device, storage medium and electronic equipment
CN111797878B (en) Data processing method and device, storage medium and electronic equipment
CN111797860B (en) Feature extraction method and device, storage medium and electronic equipment
CN111797875B (en) Scene modeling method and device, storage medium and electronic equipment
CN111796924A (en) Service processing method, device, storage medium and electronic equipment
CN111797869A (en) Model training method and device, storage medium and electronic equipment
CN111796916A (en) Data distribution method, device, storage medium and server
CN111797075A (en) Data recovery method and device, storage medium and electronic equipment
CN111796928A (en) Terminal resource optimization method and device, storage medium and terminal equipment
CN111797876B (en) Data classification method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant