WO2022179281A1 - Environment identification method and apparatus - Google Patents

Environment identification method and apparatus

Info

Publication number
WO2022179281A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
azimuth angle
recognition result
scene
feature
Prior art date
Application number
PCT/CN2021/140833
Other languages
English (en)
Chinese (zh)
Inventor
彭璐
赵安
黄维
Original Assignee
Huawei Technologies Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2022179281A1

Classifications

    • G  PHYSICS
    • G06  COMPUTING; CALCULATING OR COUNTING
    • G06F  ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00  Pattern recognition
    • G06N  COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00  Computing arrangements based on biological models
    • G06N3/02  Neural networks
    • G06N3/04  Architecture, e.g. interconnection topology
    • G06N3/08  Learning methods
    • G06V  IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00  Arrangements for image or video recognition or understanding
    • G06V10/40  Extraction of image or video features
    • G06V10/44  Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/70  Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764  Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V10/82  Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V20/00  Scenes; Scene-specific elements

Definitions

  • the present application relates to the technical field of image processing, and in particular, to a scene recognition method and device.
  • the scene recognition technology based on mobile terminals such as mobile phones is an important basic perception capability, which can serve a variety of services such as smart travel, direct service, intention decision-making, smart noise reduction of headphones, search, and recommendation.
  • Existing scene recognition technologies mainly include: image-based scene recognition methods, sensor (eg, WiFi, Bluetooth, location sensors, etc.) positioning-based scene recognition methods, or signal fingerprint comparison-based scene recognition methods.
  • The image-based scene recognition method is affected by factors such as the camera's viewing-angle range, the shooting angle, and object occlusion; its robustness in complex scenes is therefore greatly challenged, and it is difficult to realize real-time scene recognition without user perception.
  • When only the front or rear camera is used to collect images for recognition, the viewing-angle range is small and few effective features are observed, the shooting angle is arbitrary, or a large amount of object feature information appears in the same image so that noise drowns out the key features; all of these are prone to cause misjudgment. For example, when a ceiling feature appears in the image, the scene is recognized as indoors, but it may actually be a subway car or an airplane cabin. Therefore, the technical problem to be solved by this application is how to improve the recognition accuracy of the image-based scene recognition method.
  • In view of this, a scene recognition method and device are provided, which combine multiple images and the azimuth angle corresponding to each image to recognize the scene. Since more comprehensive scene information is obtained, the accuracy of image-based scene recognition can be improved, the problem of the limited viewing-angle range and shooting angle of single-camera scene recognition is solved, and the recognition is more accurate.
  • An embodiment of the present application provides a scene recognition method. The method includes: a terminal device collects images of the same scene from multiple azimuth angles through multiple cameras, where the azimuth angle at which each camera collects an image is the azimuth angle corresponding to that image and is the angle between the camera's direction vector and the gravity unit vector when the image is collected; and the terminal device recognizes the same scene according to the images and the azimuth angles corresponding to the images to obtain a scene recognition result.
  • multiple cameras are used to shoot images of multiple azimuth angles of the same scene, and the scene is recognized by combining the multiple images and the azimuth angles corresponding to each image. Since more comprehensive scene information is obtained, Therefore, the accuracy of image-based scene recognition can be improved, the problems of limited viewing angle range and shooting angle of a single camera to recognize a scene can be solved, and the recognition can be more accurate.
  • The terminal device recognizes the same scene according to the image and the azimuth angle corresponding to the image and obtains a scene recognition result by: extracting the azimuth angle feature corresponding to the image from the azimuth angle corresponding to the image; and using a scene recognition model to recognize the same scene based on the image and the azimuth angle feature corresponding to the image to obtain the scene recognition result, where the scene recognition model is a neural network model.
  • The scene recognition model includes multiple pairs of a first feature extraction layer and a first-layer model; each pair of first feature extraction layer and first-layer model is used to process an image of one azimuth angle and the azimuth angle corresponding to that image to obtain a first recognition result, where the azimuth angle corresponding to the image of the one azimuth angle is the azimuth angle corresponding to the first recognition result. The first feature extraction layer is used to extract the features of the image of the one azimuth angle to obtain a feature vector, and the first-layer model is used to obtain the first recognition result according to the feature vector and the azimuth angle corresponding to the image of that azimuth angle. The scene recognition model further includes a second-layer model, and the second-layer model is used to obtain the scene recognition result according to the first recognition results and the azimuth angles corresponding to the first recognition results.
  • The scene recognition method of the embodiment of the present application adopts a two-layer scene recognition model that collects images from multiple angles and combines them with the azimuth angles of the images to perform scene recognition using a competition mechanism; it considers the results of both local features and overall features, which improves the accuracy of scene recognition without user perception and reduces misjudgment.
  • the first feature extraction layer is configured to extract the feature of the image at one azimuth angle to obtain a plurality of feature vectors;
  • the first layer model is used to calculate the first weight corresponding to each feature vector in the plurality of feature vectors according to the azimuth angle corresponding to the image of the one azimuth angle;
  • the first-layer model is used to obtain the first recognition result according to each feature vector and the first weight corresponding to each feature vector.
  • the second-layer model is used to calculate the second weight of the first recognition result according to the azimuth angle corresponding to the first recognition result; the second-layer model is configured to obtain the scene recognition result according to the first recognition result and the second weight of the first recognition result.
  • the second-layer model presets a third weight corresponding to each first recognition result, and the second-layer model is used to obtain the scene recognition result according to the first recognition results and the third weights corresponding to them.
  • the second-layer model is configured to determine a fourth weight corresponding to each first recognition result according to the azimuth angle and a preset rule; the preset rule is a weight group set according to the azimuth angle, different azimuth angles correspond to different weight groups, and each weight group includes a fourth weight corresponding to each first recognition result;
  • the second-layer model is configured to obtain the scene recognition result according to the first recognition result and the fourth weight corresponding to the first recognition result.
  • the method further includes: the terminal device acquires the accelerations of the gravity sensor on the coordinate axes of the three-dimensional rectangular coordinate system corresponding to each camera when that camera captures the image, and obtains the direction vector of each camera when it collects the image; the three-dimensional rectangular coordinate system corresponding to each camera when it collects the image takes that camera as the origin, the z direction is the direction along the camera, x and y are directions perpendicular to the z direction, and the plane where x and y lie is perpendicular to the z direction; the azimuth angle is calculated according to the direction vector and the gravity unit vector.
  • Before using a scene recognition model to recognize the same scene based on the image and the azimuth angle feature corresponding to the image, the method further includes: the terminal device preprocesses the image, where the preprocessing includes one or a combination of the following processes: converting the image format, converting image channels, unifying the image size, and image normalization; converting the image format refers to converting a color image into a black and white image, converting image channels refers to converting the image to the red, green, and blue (RGB) channels, unifying the image size refers to adjusting multiple images to the same length and width, and image normalization refers to normalizing the pixel values of the image.
  • An embodiment of the present application provides a scene recognition apparatus. The apparatus includes: an image acquisition module, configured to collect images of the same scene from multiple azimuth angles through multiple cameras, where the azimuth angle at which each camera collects an image is the azimuth angle corresponding to that image, and the azimuth angle is the angle between the camera's direction vector and the gravity unit vector when the image is collected; and a scene recognition module, configured to recognize the same scene according to the image and the azimuth angle corresponding to the image to obtain a scene recognition result.
  • the scene recognition device of the embodiment of the present application uses multiple cameras to capture images of multiple azimuth angles of the same scene, and recognizes the scene in combination with the multiple images and the azimuth angle corresponding to each image. Since more comprehensive scene information is obtained, Therefore, the accuracy of image-based scene recognition can be improved, the problems of limited viewing angle range and shooting angle of a single camera to recognize a scene can be solved, and the recognition can be more accurate.
  • The scene recognition module includes: an azimuth angle feature extraction module, configured to extract the azimuth angle feature corresponding to the image from the azimuth angle corresponding to the image; and a scene recognition model, configured to recognize the same scene based on the image and the azimuth angle feature corresponding to the image to obtain a scene recognition result, where the scene recognition model is a neural network model.
  • The scene recognition model includes multiple pairs of a first feature extraction layer and a first-layer model; each pair of first feature extraction layer and first-layer model is used to process an image of one azimuth angle and the azimuth angle corresponding to that image to obtain a first recognition result, where the azimuth angle corresponding to the image of the one azimuth angle is the azimuth angle corresponding to the first recognition result. The first feature extraction layer is used to extract the features of the image of the one azimuth angle to obtain a feature vector, and the first-layer model is used to obtain the first recognition result according to the feature vector and the azimuth angle corresponding to the image of that azimuth angle. The scene recognition model further includes a second-layer model, and the second-layer model is used to obtain the scene recognition result according to the first recognition results and the azimuth angles corresponding to the first recognition results.
  • The scene recognition device of the embodiment of the present application adopts a two-layer scene recognition model that collects multi-angle images and combines them with the azimuth angles of the images to perform scene recognition using a competition mechanism; it considers the results of both local features and overall features, which improves the accuracy of scene recognition without user perception and reduces misjudgment.
  • the first feature extraction layer is configured to extract the feature of the image at one azimuth angle to obtain a plurality of feature vectors
  • the first layer model is used to calculate the first weight corresponding to each feature vector in the plurality of feature vectors according to the azimuth angle corresponding to the image of the one azimuth angle;
  • the first layer model is configured to obtain the first recognition result according to each feature vector and the first weight corresponding to each feature vector.
  • the second-layer model is used to calculate the second weight of the first recognition result according to the azimuth angle corresponding to the first recognition result; the second-layer model is configured to obtain the scene recognition result according to the first recognition result and the second weight of the first recognition result.
  • the second-layer model presets a third weight corresponding to each first recognition result, and the second-layer model is used to obtain the scene recognition result according to the first recognition results and the third weights corresponding to them.
  • the second-layer model is configured to determine a fourth weight corresponding to each first recognition result according to the azimuth angle and a preset rule; the preset rule is a weight group set according to the azimuth angle, different azimuth angles correspond to different weight groups, and each weight group includes a fourth weight corresponding to each first recognition result;
  • the second-layer model is configured to obtain the scene recognition result according to the first recognition result and the fourth weight corresponding to the first recognition result.
  • the device further includes: an azimuth angle acquisition module, configured to acquire the accelerations of the gravity sensor on the coordinate axes of the three-dimensional rectangular coordinate system corresponding to each camera when that camera collects the image, so as to obtain the direction vector of each camera when it collects the image; the three-dimensional rectangular coordinate system corresponding to each camera when it collects the image takes that camera as the origin, the z direction is the direction along the camera, x and y are directions perpendicular to the z direction, and the plane where x and y lie is perpendicular to the z direction; the azimuth angle is calculated according to the direction vector and the gravity unit vector.
  • the apparatus further includes: an image preprocessing module, configured to preprocess the image, where the preprocessing includes one or a combination of the following processes: converting the image format, converting image channels, unifying the image size, and image normalization; converting the image format refers to converting a color image into a black and white image, converting image channels refers to converting the image to the red, green, and blue (RGB) channels, unifying the image size refers to adjusting the length and width of multiple images to be the same, and image normalization refers to normalizing the pixel values of the images.
  • embodiments of the present application provide a terminal device, where the terminal device can execute the above first aspect or one or more of the scene recognition methods in multiple possible implementation manners of the first aspect.
  • Embodiments of the present application provide a computer program product, comprising computer-readable code, or a non-volatile computer-readable storage medium carrying computer-readable code; when the computer-readable code runs in an electronic device, the processor in the electronic device executes the scene recognition method of the first aspect or of one or more of the multiple possible implementations of the first aspect.
  • Embodiments of the present application provide a non-volatile computer-readable storage medium on which computer program instructions are stored; when the computer program instructions are executed by a processor, the scene recognition method of the first aspect or of one or more of the multiple possible implementations of the first aspect is implemented.
  • FIG. 1a and 1b respectively show schematic diagrams of application scenarios according to an embodiment of the present application.
  • Fig. 2a shows a schematic diagram of scene recognition performed by a neural network model according to an embodiment of the present application.
  • FIG. 2b shows a flowchart of a scene recognition method according to an embodiment of the present application.
  • FIG. 3 shows a schematic diagram of an application scenario according to an embodiment of the present application.
  • FIG. 4a and FIG. 4b respectively show schematic diagrams of an azimuth angle determination manner according to an embodiment of the present application.
  • FIG. 5 is a block diagram showing the structure of a neural network model according to an embodiment of the present application.
  • FIG. 6 shows a schematic structural diagram of a first layer model according to an embodiment of the present application.
  • FIG. 7 shows a flowchart of a scene recognition method according to an embodiment of the present application.
  • FIG. 8 shows a flowchart of the method of step S701 according to an embodiment of the present application.
  • FIG. 9 shows a block diagram of a scene recognition apparatus according to an embodiment of the present application.
  • FIG. 10 shows a schematic structural diagram of a terminal device according to an embodiment of the present application.
  • Azimuth angle: the angle between the direction vector of the camera and the gravity unit vector.
  • Direction vector of the camera: a three-dimensional rectangular coordinate system is established with the camera as the origin, where the z direction is the direction along the camera, x and y are directions perpendicular to the z direction, and the plane where x and y lie is perpendicular to the z direction; the vector composed of the accelerations measured by the gravity sensor in the x, y, and z directions is the direction vector of the camera.
  • The existing image-based scene recognition methods have the following problems: first, the image is collected by only a single camera, so the viewing-angle range is small, few effective features are observed, and the scene recognition recall rate is low; second, the shooting angle is arbitrary, so similar objects or features are prone to misjudgment, for example, when a ceiling feature appears in the image the scene is recognized as indoors although it may actually be a subway car or an airplane cabin; third, there is a large amount of object feature information in the same image, and the noise information may drown out the main features, resulting in misjudgment of the main features.
  • The present application provides a scene recognition method that uses multiple cameras to shoot images of the same scene at multiple azimuth angles and obtains the azimuth angle at which each camera shoots (collects) its image; that azimuth angle is the azimuth angle corresponding to the image, and the scene is recognized by combining the multiple images with the azimuth angle corresponding to each image. Since more comprehensive scene information is obtained, the accuracy of image-based scene recognition can be improved, the problem of the limited viewing-angle range and shooting angle of single-camera scene recognition is solved, and the recognition is more accurate.
  • the multiple cameras of the embodiments of the present application may be set on the terminal device.
  • The multiple cameras may be the front camera and the rear camera set on a mobile phone, multiple cameras set at different orientations of a vehicle body, multiple cameras set at different orientations of a drone, and so on. It should be noted that the above application scenarios are only some examples of the present application, and the present application is not limited thereto.
  • The mobile phone can be provided with a front camera and a rear camera. Through the front and rear cameras of the mobile phone, images from two angles can be collected, and the azimuth angles of the front and rear cameras can also be obtained; the scene can then be recognized more accurately according to the images of the two angles combined with the corresponding azimuth angles.
  • Multiple cameras can be set at different positions on the body of an autonomous vehicle, such as the top of the vehicle, and the direction of each camera can be adjusted individually; a controller can also be set on the self-driving car and connected to the multiple cameras. Sensors such as GPS, radar, an accelerometer, and a gyroscope can also be set on the self-driving car, all connected to the controller together with the cameras. The controller can collect images of different angles through the multiple cameras and can also obtain the azimuth angle of each camera; according to the multiple images and the azimuth angle corresponding to each image, the scene can be recognized more accurately.
  • FIG. 1 a and FIG. 1 b are only examples of application scenarios provided by the present application, and the present application is not limited thereto.
  • the present application can also be applied to a scene in which a drone collects images for scene recognition.
  • the scene recognition method provided by the embodiments of the present application can be applied to terminal devices.
  • the terminal devices of the present application can be smart phones, netbooks, tablet computers, notebook computers, wearable electronic devices (such as smart bracelets, smart watches, etc. ), TVs, virtual reality devices, speakers, e-ink, and more.
  • FIG. 10 shows a schematic structural diagram of a terminal device according to an embodiment of the present application. Taking the terminal device as a mobile phone as an example, FIG. 10 shows a schematic structural diagram of the mobile phone 200 , for details, please refer to the specific description below.
  • scene recognition is performed based on the front and rear cameras and acceleration sensors of the mobile phone.
  • the front and rear cameras simultaneously collect images of different azimuth angles
  • the acceleration sensor of the mobile phone is used to extract the angle between the current direction of the mobile phone camera and the direction of gravity, and this angle can be used as the azimuth angle of the camera.
  • the scene recognition method provided by this embodiment of the present application may be implemented by using a neural network model (scene recognition model).
  • the neural network model used in this embodiment of the present application may include: multiple pairs of the first feature extraction layer and the first layer model, each pair of the first feature extraction layer and the first layer model is used to The image and the azimuth angle corresponding to the image of one azimuth angle are processed to obtain the first recognition result, and the azimuth angle corresponding to the image of one azimuth angle is the azimuth angle corresponding to the first recognition result.
  • the first feature extraction layer is used to extract the features of the image to obtain the feature vector (feature map).
  • The first-layer model is used to obtain the first recognition result according to the feature vector and the azimuth angle corresponding to the image of that azimuth angle.
  • the neural network model may further include a second layer model, and the second layer model is used to obtain the scene recognition result according to the first recognition result and the azimuth angle corresponding to the first recognition result.
  • the example shown in Figure 2a includes two pairs of the first feature extraction layer and the first layer model.
  • the application scenario shown in Figure 2a can be applied to a dual-camera (front and rear dual-camera) scenario.
  • Image 1 and image 2 can be images acquired by cameras at different angles; for example, image 1 is acquired by the front camera of the mobile phone, and image 2 is acquired by the rear camera of the mobile phone.
  • The number of pairs of first feature extraction layers and first-layer models included in the neural network model can be configured according to the number of camera angles set in a specific application scenario; the number of pairs of first feature extraction layers and first-layer models can be greater than or equal to the number of angles.
  • the first feature extraction layer can be implemented by using Convolutional Neural Networks (CNN, Convolutional Neural Networks).
  • the features of the input image are extracted through CNN to obtain a feature map.
  • For example, the VGG model (Visual Geometry Group network), the Inception model, MobileNet, ResNet, DenseNet, Transformer, and other models can be used as the first feature extraction layer to extract feature maps;
  • the convolutional neural network structure can also be customized as the first feature extraction layer.
  • Both the first-layer model and the second-layer model can be implemented based on the attention mechanism, and the second-layer model can also be implemented by presetting weights for each azimuth angle or by presetting a weight mapping function according to the azimuth angle, which is not limited in this application.
  • The scene recognition method provided by the embodiment of the present application uses a two-layer scene recognition model based on a competition mechanism (attention mechanism): the first-layer model can assign different weights to the convolution results (the feature vectors obtained by the CNN performing convolution operations on an input image) according to the azimuth angle, activate the neurons that identify the local features of different scenes, extract the key features of the image at that azimuth angle, perform a first scene classification, and obtain a first recognition result; the second-layer model weights the scene classification results of different azimuth angles by computing weights from the azimuth angles and obtains the classification result of multiple images from different perspectives by weighted summation, yielding the final scene recognition result.
  • Using a two-layer competition mechanism-based scene recognition model combined with azimuth information can identify key features and filter irrelevant information, effectively reducing the probability of misrecognition.
  • aircraft and high-speed rail cannot be distinguished from an upward perspective (the ceiling features are similar and difficult to distinguish) , and can be distinguished from a side view (a round window of an airplane and a square window of a high-speed rail are easy to distinguish), and the scene recognition method of the embodiment of the present application helps to reduce misjudgments.
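  • As a concrete illustration of this two-layer, competition-mechanism design (not the patented implementation itself), the following is a minimal sketch assuming PyTorch, a toy CNN backbone, two camera branches (front/rear), a 4-dimensional one-hot azimuth feature, and illustrative layer sizes and class counts; the names BranchModel and TwoLayerSceneModel are invented for this sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BranchModel(nn.Module):
    """One pair of first feature extraction layer (toy CNN) and first-layer model."""
    def __init__(self, channels=16, azi_dim=4, num_classes=5):
        super().__init__()
        self.cnn = nn.Sequential(                       # first feature extraction layer
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4))                    # -> (channels, 4, 4)
        self.score = nn.Linear(channels + azi_dim, 1)   # tanh-scored attention over feature vectors
        self.classify = nn.Linear(channels, num_classes)

    def forward(self, img, azi):
        # img: (1, 3, H, W) image of one azimuth angle; azi: (azi_dim,) its azimuth feature
        fmap = self.cnn(img)                            # (1, C, 4, 4)
        feats = fmap.flatten(2).squeeze(0).t()          # 16 feature vectors y_i of dimension C
        azi_rep = azi.unsqueeze(0).expand(feats.size(0), -1)
        m = torch.tanh(self.score(torch.cat([feats, azi_rep], dim=1)))  # m_i
        s = F.softmax(m, dim=0)                         # first weights s_i
        pooled = (s * feats).sum(dim=0)                 # weighted sum of the feature vectors
        return self.classify(pooled)                    # first recognition result Z_j (class scores)

class TwoLayerSceneModel(nn.Module):
    """Two branches (e.g. front/rear camera) fused by an azimuth-weighted second-layer model."""
    def __init__(self, azi_dim=4, num_classes=5):
        super().__init__()
        self.branches = nn.ModuleList(
            [BranchModel(azi_dim=azi_dim, num_classes=num_classes) for _ in range(2)])
        self.fuse = nn.Linear(num_classes + azi_dim, 1)  # second-layer tanh scorer

    def forward(self, imgs, azis):
        z = [b(img, azi) for b, img, azi in zip(self.branches, imgs, azis)]         # Z_1, Z_2
        m = torch.stack([torch.tanh(self.fuse(torch.cat([a, zj]))) for a, zj in zip(azis, z)])
        s = F.softmax(m.squeeze(-1), dim=0)                                         # weights S_j
        return sum(sj * zj for sj, zj in zip(s, z))                                 # final result Z

model = TwoLayerSceneModel()
front, rear = torch.rand(1, 3, 600, 800), torch.rand(1, 3, 600, 800)
azi_front, azi_rear = torch.tensor([1., 0., 0., 0.]), torch.tensor([0., 0., 0., 1.])
print(model([front, rear], [azi_front, azi_rear]))      # scene-class scores
```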
  • The azimuth angle feature extraction (dotted-line box) shown in FIG. 2a can be realized by a neural network; that is to say, the neural network model provided in this embodiment of the present application may further include a second feature extraction layer, and the second feature extraction layer can also be implemented through a convolutional neural network model.
  • the azimuth angle feature extraction (dotted line frame) shown in FIG. 2a can also be obtained by calculating the azimuth angle according to an existing function, which is not limited in this application.
  • the sensor shown in Figure 2a can be an accelerometer, a gyroscope, etc.
  • The posture of the terminal device can be obtained through the motion data of the terminal device collected by the sensor; the direction of the camera can be determined according to the posture of the terminal device and the mounting orientation of the camera, and, together with the direction of gravity, the azimuth angle of the camera can be determined.
  • Fig. 2b shows a flowchart of a scene recognition method according to an embodiment of the present application. The following describes the flowchart of the image processing method of the present application in detail with reference to Figs. 2a and 2b.
  • Multiple cameras with different orientations set on the terminal device collect images in the same scene, and the terminal device obtains images from different viewing angles (azimuth angles) of the same scene.
  • the front camera and the rear camera of the mobile phone are used to capture images in the same scene at the same time, and images from different perspectives of the same scene are obtained.
  • The image captured by the front camera may be an image captured by a single camera or an image obtained by combining images captured by multiple cameras; likewise, the image captured by the rear camera may be an image captured by a single camera or a composite of images captured by multiple cameras. In a scenario where images captured by multiple cameras are combined into one image, the azimuth angle of this image is the same as the azimuth angle of a single one of those cameras.
  • the images captured by the cameras in the embodiments of the present application may be black and white images, RGB (Red, Green, Blue) color images, or RGB-D (RGB-Depth) depth images (D refers to depth information), or It can be an infrared image, which is not limited in this application.
  • FIG. 3 shows a schematic diagram of an application scenario according to an embodiment of the present application.
  • The user looks at the mobile phone on the subway, and the mobile phone is inclined at a certain angle to the surface of the subway floor. Therefore, the front camera of the mobile phone can capture a picture of the top of the subway car, and the rear camera can capture a picture of the subway floor.
  • the terminal device can collect the azimuth angles of cameras from multiple viewing angles while collecting images.
  • the azimuth angle of the camera may refer to the angle between the direction vector of the camera and the unit vector of gravity.
  • the direction vector of the camera may be acquired by a sensor, such as a gravity sensor, an accelerometer, a gyroscope, etc., which is not limited in this application.
  • the direction vector of the front camera can be obtained by a gravity sensor.
  • FIG. 4a and FIG. 4b respectively show schematic diagrams of an azimuth angle determination manner according to an embodiment of the present application.
  • a three-dimensional rectangular coordinate system can be established with the front camera as the origin, where the z direction is the direction along the front camera and is perpendicular to the plane of the mobile phone, and x and y are respectively parallel to the border of the mobile phone , and the direction perpendicular to the z direction.
  • Assuming the azimuth angle of the front camera is θ, the azimuth angle of the front camera can be calculated by formula (2):
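  • As an illustration of how such an azimuth angle could be computed (formula (2) itself is not reproduced in this text), the following sketch assumes the gravity-sensor reading (ax, ay, az) is expressed in the camera coordinate system described above, so that the angle between the camera's z axis and that reading equals the angle between the camera direction and gravity.

```python
import numpy as np

def azimuth_angle(ax, ay, az):
    d = np.array([ax, ay, az], dtype=float)   # direction vector built from the gravity-sensor readings
    z = np.array([0.0, 0.0, 1.0])             # camera direction: z axis of the camera frame (assumption)
    cos_theta = np.dot(d, z) / np.linalg.norm(d)
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

print(azimuth_angle(0.0, 4.9, 8.5))           # roughly 30 degrees for this tilt
```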
  • the terminal device may separately preprocess the images collected by each camera, wherein the preprocessing may include one or more processing methods among image format conversion, image size unification, image normalization, and image channel conversion.
  • the image format conversion may refer to converting a color image into a black and white image.
  • Image channel conversion may refer to converting a color image to three RGB channels, and the three channels are Red, Green, and Blue in turn.
  • Unifying the image size may refer to unifying the length and width of the images collected by each camera, for example, the length of the unified image is 800 pixels and the width is 600 pixels.
  • p_R, p_G, and p_B represent the pixel values before normalization, respectively
  • p_R_Normalized, p_G_Normalized, and p_B_Normalized represent the pixel values after normalization, respectively.
  • the collected images may be processed through one or more of the above preprocessing methods.
  • the preprocessing process may include:
  • Step 1: Image format conversion; convert the collected color image into a black and white image.
  • the relevant grayscale formula method or the average method can be used to calculate the value of the pixel after conversion according to the value of the pixel before conversion;
  • Step 2: Unify the image size; for example, unify the size of the images to a length of 800 and a width of 600;
  • the preprocessing process may include:
  • Step1 Image channel conversion. For color images, it can be uniformly converted into RGB three channels, that is, the channels are Red, Green, and Blue in turn; if the images before preprocessing are all in RGB format, the process of image channel conversion can be omitted.
  • Step 2: Unify the image size; for example, unify the size of the images to a length of 800 and a width of 600;
  • Step3 Image normalization.
  • the normalization processing method is:
  • the preprocessing manner of the terminal device for the images collected by the cameras at each angle may be the same or different, which is not limited in the present application.
  • the format of the images can be unified, which is beneficial to the subsequent process of feature extraction and scene recognition.
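  • A minimal preprocessing sketch following the steps above is given below; it assumes Pillow and NumPy, and dividing pixel values by 255 is used as one common normalization (the exact normalization formula referred to above is not reproduced). The file name in the usage comment is hypothetical.

```python
import numpy as np
from PIL import Image

def preprocess(path, to_gray=False, size=(800, 600)):
    img = Image.open(path)
    img = img.convert("L") if to_gray else img.convert("RGB")   # format / channel conversion
    img = img.resize(size)                                      # unify image size (width, height)
    return np.asarray(img, dtype=np.float32) / 255.0            # normalize pixel values to [0, 1]

# x = preprocess("front_camera.jpg")   # hypothetical file name; result shape (600, 800, 3)
```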
  • In step 4, feature extraction is performed on the azimuth angle collected in step 2 to obtain the azimuth angle feature.
  • One or more of the following processing methods can be performed on the azimuth angle: numerical normalization, discretization, one-hot encoding (one-bit effective encoding), trigonometric transformation, and so on. This application provides a variety of different feature extraction methods; the following are a few examples.
  • Example 1 the terminal device can discretize the collected azimuth, and then perform one-hot encoding (one-bit effective encoding). Discretization is to map individuals into a limited space without changing the relative size of the data.
  • The azimuth angle can be discretized into four intervals: [0°, 45°), [45°, 90°), [90°, 135°), and [135°, 180°]. The code of the interval to which the azimuth angle is mapped is 1 and the codes of the other intervals are 0, so the feature vectors corresponding to the four intervals are [1,0,0,0], [0,1,0,0], [0,0,1,0], and [0,0,0,1], respectively. For example, if the azimuth angle θ is 30°, it is mapped to the interval [0°, 45°), and the corresponding azimuth angle feature is [1, 0, 0, 0].
  • the terminal device may directly perform trigonometric function transformation on the azimuth angle, and the value obtained after the transformation is normalized to the [0,1] interval as the azimuth angle feature.
  • The trigonometric function transformation can be sin θ, cos θ, tan θ, and so on.
  • Example 3: the [0, 1] range of the normalized trigonometric function value is discretized into four intervals, [0, 0.25), [0.25, 0.5), [0.5, 0.75), and [0.75, 1.0]; the terminal device can first perform the trigonometric function transformation on the azimuth angle and normalize it, and then determine the azimuth angle feature according to the interval to which the normalized value is mapped.
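  • The following sketch illustrates the azimuth-feature extraction of Example 1 (interval discretization plus one-hot encoding), together with a cosine-based variant in the spirit of Examples 2 and 3; using (cos θ + 1) / 2 as the normalization is an assumption.

```python
import numpy as np

def azimuth_one_hot(theta_deg):
    edges = [45.0, 90.0, 135.0, 180.0]                          # interval upper bounds
    idx = next(i for i, e in enumerate(edges) if theta_deg < e or e == 180.0)
    feat = np.zeros(4)
    feat[idx] = 1.0
    return feat

def azimuth_trig(theta_deg):
    # cosine transform rescaled from [-1, 1] to [0, 1] as a scalar azimuth feature
    return (np.cos(np.radians(theta_deg)) + 1.0) / 2.0

print(azimuth_one_hot(30.0))   # -> [1. 0. 0. 0.], matching the 30-degree example above
print(azimuth_trig(30.0))
```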
  • the scene recognition is performed through a neural network model, and the process of scene recognition is described in conjunction with the framework of the neural network model shown in FIG. 2a.
  • The terminal device uses the preprocessed images from step 3 as the input data of the multiple first feature extraction layers and uses the azimuth angle features obtained in step 4 as the input data of the multiple first-layer models; the image and azimuth angle received by each pair of first feature extraction layer and first-layer model are associated, that is, the image captured by a camera and the azimuth angle feature corresponding to that same camera are used as the input data of one pair of first feature extraction layer and first-layer model.
  • the front camera of the mobile phone captures an image and preprocesses the image to obtain image 1.
  • the azimuth angle of the front camera is ⁇ 1, and the azimuth angle feature of ⁇ 1 is C1.
  • the neural network model includes feature extraction layer 1 and the first layer model 1.
  • the output of the feature extraction layer 1 is the input of the first layer model 1.
  • The neural network model can also include feature extraction layer 2 and first-layer model 2; the output of feature extraction layer 2 is the input of first-layer model 2.
  • The terminal device can use image 1 as the input of feature extraction layer 1 and C1 as the input of first-layer model 1, or the terminal device can use image 1 as the input of feature extraction layer 2 and C1 as the input of first-layer model 2.
  • The serial numbers of image 1, image 2, feature extraction layer 1, feature extraction layer 2, first-layer model 1, and first-layer model 2 in this application are not intended to limit their order or correspondence; they are only numbers used to distinguish different modules and are not to be construed as limitations on this application.
  • the first feature extraction layer extracts the features of the image to obtain a feature map (feature vector), and inputs the feature map (feature vector) into the first layer model.
  • the first recognition result can be obtained by recognizing (classifying) the scene of the image.
  • the terminal device can also use the azimuth angle feature as the input data of the second-layer model, and the second-layer model can further identify (classify) the scene according to the first recognition result output by the first-layer model and the corresponding azimuth angle feature, and obtain Scene recognition results.
  • FIG. 5 is a block diagram showing the structure of a neural network model according to an embodiment of the present application.
  • FIG. 6 shows a schematic structural diagram of a first layer model according to an embodiment of the present application.
  • The terminal device of the present application performs image acquisition at J angles (azimuth angles), that is, cameras at J angles are used to collect images from J angles.
  • Image 1 is used as the input data of the CNN on the upper side, and the CNN on the upper side performs feature extraction on the input image 1 to obtain a feature vector y i , where i is a positive integer from 1 to n.
  • image 2 is used as the input data of the CNN on the lower side, and the CNN on the lower side performs feature extraction on the input image 2 to obtain a feature vector xi .
  • the terminal device takes the feature vector yi and the azimuth angle feature C1 as the input data of the first layer model on the upper side.
  • The first-layer model can calculate the first recognition result Z_j according to the feature vector y_i and the azimuth angle feature C1, where j is a positive integer from 1 to J and J represents the number of angles (azimuth angles) at which images are collected; J is equal to 2 in this example, and j is equal to 1 in the example of FIG. 6.
  • the first layer model can be implemented based on the attention mechanism, and specifically, it can include the activation function (tanh), the softmax function, and the weighted average process shown in FIG. 6 . Among them, tanh is one of the activation functions, and the present application is not limited to this, and other activation functions can also be used, such as: sigmoid, ReLU, LReLU, ELU function, and so on.
  • In formula (3), C_j represents the azimuth angle feature corresponding to the image at this angle, y_i is a feature vector output by the CNN, W_i and b_i are the weight and bias of the activation function, respectively, [C_j, y_i] represents the vector obtained by splicing the feature vector and the azimuth angle feature, and the remaining symbol represents the parameters of the softmax function. According to the m_i calculated by the tanh function, it can be determined whether to activate the corresponding neuron, and the features corresponding to the tanh function are extracted as the basis for classification. The softmax function normalizes the m_i to obtain the weight of each feature vector; therefore, the azimuth angle feature can affect the calculated weight s_i of the feature vector y_i.
  • the scene recognition model provided by the present application can identify key features according to the azimuth angle, filter irrelevant information, reduce noise, improve recognition accuracy, and realize scene recognition without user perception.
  • The weight s_i calculated according to formula (3) represents the weight assigned to the different features extracted from the image; the feature vectors and the corresponding weights are weighted and summed to obtain the first recognition result Z_1 of the first-layer model.
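  • The following NumPy sketch illustrates the first-layer attention step described around formula (3); W, b, the softmax parameter u, and the final classifier W_cls are random stand-ins for trained parameters, and all dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, feat_dim, azi_dim, num_classes = 16, 32, 4, 5
y = rng.normal(size=(n, feat_dim))                  # feature vectors y_i output by the CNN
c = np.array([1.0, 0.0, 0.0, 0.0])                  # azimuth feature C_j (one-hot, e.g. 30 degrees)

W = rng.normal(size=(feat_dim + azi_dim, 8))        # tanh weight (stand-in)
b = rng.normal(size=8)                              # tanh bias (stand-in)
u = rng.normal(size=8)                              # softmax parameter (stand-in)
W_cls = rng.normal(size=(feat_dim, num_classes))    # classifier producing Z_j (stand-in)

m = np.tanh(np.concatenate([np.tile(c, (n, 1)), y], axis=1) @ W + b)   # m_i from [C_j, y_i]
scores = m @ u
s = np.exp(scores - scores.max()); s /= s.sum()                        # first weights s_i (softmax)
pooled = (s[:, None] * y).sum(axis=0)                                  # weighted sum of the y_i
Z_1 = pooled @ W_cls                                                   # first recognition result
print(s.round(3), Z_1.round(3))
```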
  • The terminal device uses the feature vector x_i (where i is a positive integer from 1 to n'; n' and n may or may not be equal, which is not limited in this application) and the azimuth angle feature C_2 as the input data of the first-layer model on the lower side, which can calculate the first recognition result Z_2 in the same manner as the first-layer model on the upper side.
  • the neural network model of the embodiment of the present application extracts feature maps through CNN, and assigns different weights to different convolution results of images in combination with azimuth angles.
  • The first-layer model identifies key features and filters irrelevant information through a competition mechanism, which can effectively reduce the probability of misrecognition.
  • The structure of the second-layer model is shown in FIG. 5, where f_1 and f_2 represent the activation function tanh; the number of tanh functions in the second-layer model is related to the number of angles at which images are collected and can be equal to or greater than that number.
  • The input of each tanh function includes the output result Z_j of a first-layer model and the azimuth angle feature C_j. The second-layer model also includes a softmax function, which calculates the weight S_j of each output result Z_j of the first-layer models according to the calculation results of the tanh functions; finally, the second-layer model calculates the final scene recognition result Z according to the calculated weights S_j and the output results Z_j of the first-layer models.
  • the specific calculation method is shown in the following formula (4):
  • In formula (4), [C_j, Z_j] represents the vector obtained by splicing the first recognition result Z_j with the corresponding azimuth angle feature C_j, W_j and b_j represent the weight and bias of the tanh function, respectively, and the remaining symbol represents the parameters of the softmax function. According to the M_j calculated by the tanh function, it can be determined whether to activate the corresponding neuron, and the feature corresponding to the tanh function (the first recognition result) is extracted as the basis for classification. The softmax function normalizes M_j to obtain the weight of each first recognition result; therefore, the azimuth angle feature can affect the calculated weight S_j of the first recognition result Z_j.
  • the scene recognition model provided by the present application can extract the features of key angles according to the azimuth angle, filter irrelevant angles, improve the recognition accuracy, and realize scene recognition without user perception.
  • The weight S_j calculated according to formula (4) represents the weight assigned to each of the different first recognition results; the first recognition results and the corresponding weights are weighted and summed to obtain the scene recognition result Z of the second-layer model.
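  • The second-layer fusion around formula (4) can be sketched in the same way; again the parameters are random stand-ins and the dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
num_classes, azi_dim = 5, 4
Z = rng.normal(size=(2, num_classes))               # first recognition results Z_1, Z_2
C = np.array([[1.0, 0, 0, 0], [0, 0, 0, 1.0]])      # azimuth features C_1, C_2

W = rng.normal(size=(num_classes + azi_dim, 8)); b = rng.normal(size=8)
u = rng.normal(size=8)                              # softmax parameter (stand-in)

M = np.tanh(np.concatenate([C, Z], axis=1) @ W + b) # M_j from [C_j, Z_j]
scores = M @ u
S = np.exp(scores - scores.max()); S /= S.sum()     # second weights S_j (softmax)
Z_final = (S[:, None] * Z).sum(axis=0)              # weighted sum -> scene recognition result Z
print(S.round(3), Z_final.round(3))
```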
  • W_i, b_i, W_j, b_j, and the softmax parameters are model parameters; their values can be obtained by training the neural network model shown in FIG. 5 with sample data. The neural network model of the present application can be trained using training methods in the related art, and details are not repeated here.
  • The second-layer model may also be implemented by presetting weights for each azimuth angle, by voting, or by presetting a weight mapping function according to the azimuth angle. That is to say, in another embodiment of the present application, the neural network model includes multiple pairs of feature extraction layers and first-layer models as well as a second-layer model, and the second-layer model is realized by presetting weights for each azimuth angle or by a weight mapping function preset according to the azimuth angle.
  • For example, the second-layer model can preset a weight for each azimuth angle; the weight corresponding to the j-th azimuth angle is S_j, and the second-layer model computes the scene recognition result from each first recognition result Z_j and its corresponding preset weight S_j, for example by a weighted sum.
  • The second-layer model may also preset a weight mapping function according to the azimuth angle, that is, different azimuth angles correspond to different preset weight groups, and each preset weight group may include a weight corresponding to each first recognition result Z_j.
  • For example, the terminal device collects images from two angles through the front camera and the rear camera and recognizes the images from the two angles to obtain two first recognition results, Z_1 and Z_2, whose corresponding weights are S_1 and S_2, respectively; the weight mapping function preset according to the azimuth angle can be as follows:
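  • The concrete mapping referred to above is not reproduced in this text; the following is a purely hypothetical sketch of what a preset azimuth-to-weight-group mapping for the two first recognition results could look like, with invented interval boundaries and weight values.

```python
def weight_group(theta_deg):
    # hypothetical mapping from azimuth-angle interval to the weight group (S_1, S_2)
    if theta_deg < 45.0:
        return (0.7, 0.3)
    if theta_deg < 135.0:
        return (0.5, 0.5)
    return (0.3, 0.7)

def fuse(Z1, Z2, theta_deg):
    s1, s2 = weight_group(theta_deg)
    return [s1 * a + s2 * b for a, b in zip(Z1, Z2)]   # weighted sum of the two results

print(fuse([0.9, 0.1], [0.2, 0.8], 30.0))
```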
  • The terminal device may output the final recognition result according to the scene recognition result and a preset strategy, where the preset strategy may include filtering the scene recognition result according to a confidence threshold before outputting the final recognition result, or combining multiple categories into one larger category and outputting that as the final recognition result, and so on.
  • Example 2 Assuming that the scene recognition result includes 100 categories, the terminal device can combine several categories into a larger category output, for example, combine cars and buses into the car category as the final recognition result output.
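  • The following sketch illustrates the post-processing strategies mentioned above (confidence-threshold filtering and category merging); the threshold value and the category map are illustrative assumptions.

```python
CATEGORY_MERGE = {"car": "car", "bus": "car", "subway": "rail", "high_speed_rail": "rail"}

def final_result(scores, threshold=0.6):
    # scores: category -> confidence from the scene recognition model
    best, conf = max(scores.items(), key=lambda kv: kv[1])
    if conf < threshold:
        return "unknown"                       # filtered out by the confidence threshold
    return CATEGORY_MERGE.get(best, best)      # merge into a larger category if configured

print(final_result({"bus": 0.72, "subway": 0.18, "office": 0.10}))   # -> "car"
```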
  • FIG. 7 shows a flowchart of a scene recognition method according to an embodiment of the present application.
  • the scene recognition method may include the following steps:
  • Step S700: the terminal device collects images of the same scene from multiple azimuth angles through multiple cameras, where the azimuth angle at which each camera collects an image is the azimuth angle corresponding to that image, and the azimuth angle is the angle between the camera's direction vector and the gravity unit vector when the image is collected.
  • Step S701 the terminal device recognizes the same scene according to the image and the azimuth angle corresponding to the image, and obtains a scene recognition result.
  • multiple cameras are used to shoot images of multiple azimuth angles of the same scene, and the scene is recognized by combining the multiple images and the azimuth angles corresponding to each image. Since more comprehensive scene information is obtained, Therefore, the accuracy of image-based scene recognition can be improved, the problems of limited viewing angle range and shooting angle of a single camera to recognize a scene can be solved, and the recognition can be more accurate.
  • the images captured by the cameras in the embodiments of the present application may be black and white images, RGB (Red, Green, Blue) color images, or RGB-D (RGB-Depth) depth images (D refers to depth information), or It can be an infrared image, which is not limited in this application.
  • an image of an azimuth angle may be an image captured by one camera, or may be an image obtained by combining images captured by multiple cameras.
  • a mobile phone may include multiple rear cameras, and the image captured by the rear camera may be an image obtained by combining images captured by the multiple cameras.
  • the method may further include:
  • The terminal device preprocesses the image, where the preprocessing includes one or a combination of the following processes: converting the image format, converting image channels, unifying the image size, and image normalization; converting the image format refers to converting a color image into a black and white image, converting image channels refers to converting the image to the red, green, and blue (RGB) channels, unifying the image size refers to adjusting multiple images to the same length and width, and image normalization refers to normalizing the pixel values of the image.
  • Each azimuth angle can adopt the same preprocessing method, or can adopt different preprocessing methods according to the images collected at each azimuth angle, which is not limited in this application.
  • the terminal device obtains the acceleration on the coordinate axis of the three-dimensional rectangular coordinate system corresponding to the gravity sensor when each camera collects the image, and can obtain the direction vector when each camera collects the image;
  • the corresponding three-dimensional rectangular coordinate system takes each camera as the origin, the z direction is the direction along the camera, and x and y are the directions perpendicular to the z direction respectively;
  • the azimuth angle is calculated according to the direction vector and the gravity unit vector.
  • the specific process please refer to the content of Part 2 above, which will not be repeated.
  • the terminal device may preset a weight corresponding to each azimuth angle, and weight the image recognition result of each azimuth angle according to the weight of each azimuth angle to obtain the final scene recognition result.
  • the image and the azimuth corresponding to the image can also be input into the trained neural network model to identify the same scene, and the scene identification result can be obtained.
  • the present application does not limit the specific manner of scene recognition.
  • FIG. 8 shows a flowchart of the method of step S701 according to an embodiment of the present application.
  • the terminal device recognizes the same scene according to the image and the azimuth angle corresponding to the image, and obtains a scene recognition result, which may include:
  • Step S7010 the terminal device extracts the azimuth angle feature corresponding to the image from the azimuth angle corresponding to the image;
  • Step S7011 using a scene recognition model to recognize the same scene based on the image and the azimuth feature corresponding to the image, to obtain a scene recognition result, wherein the scene recognition model is a neural network model.
  • step S7010 for the specific process, reference may be made to the introduction in Part 4 above, and details are not repeated here.
  • The image and the azimuth angle feature corresponding to the image can be input into the scene recognition model, and the scene recognition model can recognize the same scene based on the image and the azimuth angle feature corresponding to the image to obtain the scene recognition result.
  • the scene recognition model can be implemented through a variety of different neural network structures.
  • the scene recognition model includes multiple pairs of first feature extraction layers and In the first layer model, each pair of the first feature extraction layer and the first layer model is used to process an image of an azimuth angle and an azimuth angle corresponding to the image of the one azimuth angle to obtain a first recognition result.
  • An example of the first feature extraction layer may be image feature extraction layer 1 and feature extraction layer 2 as shown in FIG. 2a.
  • The number of pairs of first feature extraction layers and first-layer models included in the neural network model can be configured according to the number of camera angles set in specific application scenarios; the number of pairs of first feature extraction layers and first-layer models can be greater than or equal to the number of angles.
  • the azimuth angle corresponding to the image of the one azimuth angle is the azimuth angle corresponding to the first recognition result
  • the first feature extraction layer is used to extract the feature of the image of the one azimuth angle to obtain the feature vector
  • The first-layer model is used to obtain the first recognition result according to the feature vector and the azimuth angle corresponding to the image of the one azimuth angle.
  • The azimuth angle feature 1 corresponding to image 1, extracted from the azimuth angle corresponding to image 1, can be used as the input of first-layer model 1, and the azimuth angle feature 2 corresponding to image 2, extracted from the azimuth angle corresponding to image 2, can be used as the input of first-layer model 2;
  • the feature extraction layer 1 can extract the feature vector 1 of the image 1 and output it to the first layer model 1, and the feature extraction layer 2 can extract the feature vector of the image 2 2 and output to the first layer model 2.
  • the first layer model 1 can combine the feature vector 1 and the azimuth angle feature 1 to perform scene recognition and obtain the first recognition result 1, and the first layer model 2 can combine the feature vector 2 and the azimuth angle feature 2 to perform scene recognition and obtain the first recognition result 2.
  • the scene recognition model further includes a second layer model, and the first layer model 1 and the first layer model 2 can respectively output the first recognition result 1 and the first recognition result 2 to the second layer model.
  • the azimuth angle feature 1 corresponding to image 1, extracted from the azimuth angle corresponding to image 1, can also be used as an input of the second layer model, where azimuth angle feature 1 corresponds to the first recognition result 1; similarly, the azimuth angle feature 2 corresponding to image 2, extracted from the azimuth angle corresponding to image 2, can be used as an input of the second layer model, where azimuth angle feature 2 corresponds to the first recognition result 2.
  • the second layer model is used to obtain the scene recognition result according to the first recognition result and the azimuth angle corresponding to the first recognition result.
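  • as an illustration only, the following is a minimal sketch of this two-layer structure in PyTorch; the use of PyTorch, the simple linear feature extractors, the concatenation-based fusion, and all dimensions and names are assumptions for illustration and are not specified by the present description.

    import torch
    import torch.nn as nn

    class FirstLayerModel(nn.Module):
        # Combines one camera's feature vector with its azimuth angle feature
        # to produce a first recognition result (class scores).
        def __init__(self, feat_dim, azi_dim, num_classes):
            super().__init__()
            self.fc = nn.Linear(feat_dim + azi_dim, num_classes)

        def forward(self, feat, azi):
            return self.fc(torch.cat([feat, azi], dim=-1))

    class TwoLayerSceneModel(nn.Module):
        # One (feature extraction layer, first layer model) pair per azimuth angle,
        # plus a second layer model that fuses the first recognition results together
        # with their azimuth angle features into the final scene recognition result.
        def __init__(self, num_azimuths, in_dim, feat_dim=128, azi_dim=8, num_classes=10):
            super().__init__()
            self.extractors = nn.ModuleList(
                [nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
                 for _ in range(num_azimuths)])
            self.first_layers = nn.ModuleList(
                [FirstLayerModel(feat_dim, azi_dim, num_classes)
                 for _ in range(num_azimuths)])
            self.second_layer = nn.Linear(num_azimuths * (num_classes + azi_dim), num_classes)

        def forward(self, images, azi_feats):
            # images: list of (batch, in_dim) tensors, one per azimuth angle
            # azi_feats: list of (batch, azi_dim) tensors, one per azimuth angle
            firsts = [torch.cat([fl(ext(img), azi), azi], dim=-1)
                      for img, azi, ext, fl in zip(images, azi_feats,
                                                   self.extractors, self.first_layers)]
            return self.second_layer(torch.cat(firsts, dim=-1))

    # Example with two cameras (front and rear) and flattened 64x64 grayscale images
    model = TwoLayerSceneModel(num_azimuths=2, in_dim=64 * 64)
    imgs = [torch.rand(4, 64 * 64), torch.rand(4, 64 * 64)]
    azis = [torch.rand(4, 8), torch.rand(4, 8)]
    scores = model(imgs, azis)   # (4, 10) scene class scores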
  • the scene recognition method of the embodiment of the present application adopts a two-layer scene recognition model, collects images from multiple angles, and combines the azimuth angles of the images to perform scene recognition using a competition mechanism; it considers both local features and overall features, which improves the accuracy of scene recognition without user perception and reduces misjudgment.
  • the first feature extraction layer is used to extract features of the image of the one azimuth angle to obtain multiple feature vectors; the first layer model is used to calculate, according to the azimuth angle corresponding to the image of the one azimuth angle, a first weight corresponding to each of the multiple feature vectors; and the first layer model is used to obtain the first recognition result according to each feature vector and the first weight corresponding to each feature vector.
  • the second layer model is used to calculate a second weight of the first recognition result according to the azimuth angle corresponding to the first recognition result, and the second layer model is used to obtain the scene recognition result according to the first recognition result and the second weight of the first recognition result.
  • the first layer model may include an activation function and a softmax function, where the activation function may be a tanh function; the activation function may also be another type of activation function, such as a Sigmoid activation function or a ReLU activation function, and is not limited to those shown in FIG. 5 and FIG. 6.
  • the number of activation functions in the first layer model can be set according to the number of feature vectors extracted by the feature extraction layer, and can be greater than or equal to the number of extracted feature vectors.
  • the activation function is used to determine whether to activate the corresponding neuron according to the feature vector and the azimuth feature, and the feature corresponding to the activation function is extracted as the basis for classification.
  • the activation function and the softmax function are used to calculate the first weight s_i corresponding to each feature vector, that is, the s_i calculated by formula (3) above; for the specific process, reference can be made to the description above, and it is not repeated here.
  • the azimuth angle feature can affect the calculated weight s_i of the feature vector y_i: if the calculated s_i is relatively large, the feature vector has a greater influence on the classification result; if the calculated s_i is relatively small, the feature vector has less influence on the classification result. Therefore, the scene recognition model provided by the present application can identify key features according to the azimuth angle, filter irrelevant information, reduce noise, improve recognition accuracy, and realize scene recognition without user perception. A minimal sketch of this weighting is given below.
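  • formula (3) itself appears in an earlier part of the description and is not reproduced in this extract; the following sketch only assumes an additive-attention-style weighting in which each feature vector y_i is scored against the azimuth angle feature c through a tanh activation followed by a softmax, with W, U and v standing for learned parameters (illustrative names and shapes, not the patent's formula).

    import numpy as np

    def softmax(x):
        e = np.exp(x - np.max(x))
        return e / e.sum()

    def first_layer_weights(Y, c, W, U, v):
        """Y: (n, d) feature vectors y_i; c: (k,) azimuth angle feature.
        W: (h, d), U: (h, k), v: (h,) learned parameters (illustrative shapes).
        Returns s: (n,) weights, one per feature vector, summing to 1."""
        scores = np.array([v @ np.tanh(W @ y + U @ c) for y in Y])
        return softmax(scores)

    rng = np.random.default_rng(0)
    Y = rng.normal(size=(5, 16))   # 5 feature vectors of dimension 16
    c = rng.normal(size=(4,))      # azimuth angle feature of dimension 4
    W, U, v = rng.normal(size=(8, 16)), rng.normal(size=(8, 4)), rng.normal(size=(8,))
    s = first_layer_weights(Y, c, W, U, v)   # larger s_i -> y_i influences the result more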
  • the second-layer model may include an activation function and a softmax function, wherein the activation function may be a tanh function, and the activation function may also adopt other types of activation functions, which are not limited to the examples shown in FIG. 5 and FIG. 6 .
  • the number of activation functions in the second-layer model is related to the number of angles at which images are collected, and the number of activation functions may be equal to or greater than the number of angles at which images are collected.
  • the input of the activation function includes the output result Z_j of the first layer model and the azimuth angle feature C_j; the second layer model also includes a softmax function, and the softmax function calculates the weight S_j of the output result Z_j of the first layer model according to the calculation result of the activation function; for the specific calculation process, reference can be made to formula (4) and the description in Part 5 above, and it is not repeated here.
  • the azimuth angle feature can affect the calculated weight S_j of the first recognition result Z_j: if the calculated S_j is relatively large, the first recognition result has a greater influence on the classification result; if the calculated S_j is relatively small, the first recognition result has less influence on the classification result. Therefore, the scene recognition model provided by the present application can extract the features of key angles according to the azimuth angle, filter irrelevant angles, improve recognition accuracy, and realize scene recognition without user perception.
  • the second-layer model presets a third weight corresponding to each of the first recognition results, and the second-layer model is configured to use the first recognition result and the The third weight corresponding to the first recognition result is used to obtain the scene recognition result.
  • the second layer model can preset a weight for each azimuth angle; that is, for each first recognition result Z_j, the corresponding preset weight is S_j, and the second layer model can perform a weighted calculation on each first recognition result Z_j with its corresponding weight S_j to obtain the scene recognition result.
  • the second layer model is configured to determine a fourth weight corresponding to each first recognition result according to the azimuth angle and a preset rule, wherein the preset rule is a weight group set according to the azimuth angle, different azimuth angles correspond to different weight groups, and each weight group includes a fourth weight corresponding to each first recognition result; the second layer model is used to obtain the scene recognition result according to the first recognition result and the fourth weight corresponding to the first recognition result.
  • the second layer model may also preset a weight mapping function according to the azimuth angle; that is, different azimuth angles correspond to different preset weight groups, and each preset weight group may include a weight corresponding to each first recognition result Z_j.
  • the terminal device collects images from two angles through the front camera and the rear camera, and recognizes the images from the two angles to obtain two first recognition results Z_1 and Z_2, whose corresponding weights are S_1 and S_2 respectively.
  • the preset weight mapping function according to the azimuth angle can be as follows:
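  • the mapping itself is not reproduced in this extract; purely as an illustration of what a weight mapping keyed on azimuth angle ranges could look like (the thresholds and weight values below are assumptions, not the mapping of the present description), consider the following sketch.

    def weight_group(front_azimuth_deg, rear_azimuth_deg):
        """Illustrative only: returns (S1, S2) for the two first recognition results (Z1, Z2).
        Thresholds and values are assumptions, not taken from the patent."""
        if front_azimuth_deg < 45.0:      # front camera close to facing the sky/ceiling
            return 0.8, 0.2
        if rear_azimuth_deg < 45.0:       # rear camera close to facing the sky/ceiling
            return 0.2, 0.8
        return 0.5, 0.5                   # otherwise weight both angles equally

    S1, S2 = weight_group(30.0, 150.0)
    # scene score = S1 * Z1 + S2 * Z2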
  • FIG. 9 shows a block diagram of a scene recognition apparatus according to an embodiment of the present application.
  • the apparatus may include:
  • an image acquisition module, used to collect images of the same scene from multiple azimuth angles through multiple cameras, wherein the azimuth angle when each camera collects the image is the azimuth angle corresponding to the image, and the azimuth angle is the angle between the direction vector when each camera collects the image and the gravity unit vector;
  • a scene recognition module configured to recognize the same scene according to the image and the azimuth angle corresponding to the image, and obtain a scene recognition result.
  • the scene recognition apparatus of the embodiment of the present application uses multiple cameras to capture images of the same scene from multiple azimuth angles and recognizes the scene in combination with the multiple images and the azimuth angle corresponding to each image; since more comprehensive scene information is obtained, the accuracy of image-based scene recognition can be improved, the problems of the limited viewing angle range and shooting angle when recognizing a scene with a single camera are solved, and the recognition is more accurate.
  • the scene recognition module includes:
  • an azimuth feature extraction module used for extracting the azimuth feature corresponding to the image from the azimuth angle corresponding to the image
  • a scene recognition model configured to recognize the same scene based on the image and the azimuth angle feature corresponding to the image, and obtain a scene recognition result, wherein the scene recognition model is a neural network model.
  • the scene recognition model includes multiple pairs of first feature extraction layers and first layer models, and each pair of a first feature extraction layer and a first layer model is used to process an image of one azimuth angle and the azimuth angle corresponding to that image to obtain a first recognition result; wherein the azimuth angle corresponding to the image of the one azimuth angle is the azimuth angle corresponding to the first recognition result, the first feature extraction layer is used to extract features of the image of the one azimuth angle to obtain a feature vector, and the first layer model is used to obtain the first recognition result according to the feature vector and the azimuth angle corresponding to the image of the one azimuth angle; the scene recognition model further includes a second layer model, and the second layer model is used to obtain the scene recognition result according to the first recognition result and the azimuth angle corresponding to the first recognition result.
  • the scene recognition apparatus of the embodiment of the present application adopts a two-layer scene recognition model, collects multi-angle images, and combines the azimuth angles of the images to perform scene recognition using a competition mechanism; it considers both local features and overall features, which improves the accuracy of scene recognition without user perception and reduces misjudgment.
  • the first feature extraction layer is used to extract features of the image of the one azimuth angle to obtain multiple feature vectors; the first layer model is used to calculate, according to the azimuth angle corresponding to the image of the one azimuth angle, a first weight corresponding to each of the multiple feature vectors; and the first layer model is used to obtain the first recognition result according to each feature vector and the first weight corresponding to each feature vector.
  • the second layer model is used to calculate a second weight of the first recognition result according to the azimuth angle corresponding to the first recognition result, and the second layer model is used to obtain the scene recognition result according to the first recognition result and the second weight of the first recognition result.
  • the second-layer model presets a third weight corresponding to each of the first recognition results, and the second-layer model is configured to use the first recognition result and the The third weight corresponding to the first recognition result is used to obtain the scene recognition result.
  • the second layer model is configured to determine a fourth weight corresponding to each first recognition result according to the azimuth angle and a preset rule, wherein the preset rule is a weight group set according to the azimuth angle, different azimuth angles correspond to different weight groups, and each weight group includes a fourth weight corresponding to each first recognition result; the second layer model is used to obtain the scene recognition result according to the first recognition result and the fourth weight corresponding to the first recognition result.
  • the device further includes: an azimuth angle acquisition module, configured to acquire the acceleration, measured by the gravity sensor when each camera collects the image, on the coordinate axes of the corresponding three-dimensional rectangular coordinate system, to obtain the direction vector when each camera collects the image; wherein the three-dimensional rectangular coordinate system corresponding to each camera when collecting the image takes that camera as the origin, the z direction is the direction along the camera, x and y are directions perpendicular to the z direction, and the plane where x and y are located is perpendicular to the z direction; the azimuth angle is calculated according to the direction vector and the gravity unit vector.
  • the apparatus further includes: an image preprocessing module, configured to preprocess the image; wherein the preprocessing includes a combination of one or more of the following processes: converting the image format, converting the image channels, unifying the image size, and image normalization; converting the image format refers to converting a color image into a black-and-white image, converting the image channels refers to converting the image into red, green and blue (RGB) channels, unifying the image size refers to adjusting multiple images to the same length and the same width, and image normalization refers to normalizing the pixel values of the images. An illustrative sketch of such preprocessing is given below.
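  • as an illustration only, the following sketch performs the preprocessing steps listed above; the use of the Pillow and NumPy libraries, the 224x224 target size, and the choice of RGB conversion are assumptions, not requirements of the present description.

    import numpy as np
    from PIL import Image

    def preprocess(paths, size=(224, 224)):
        """Convert channels, unify image size, and normalize pixel values to [0, 1]."""
        batch = []
        for p in paths:
            img = Image.open(p).convert("RGB")   # convert image channels to RGB
            # img = img.convert("L") could be used instead if the black-and-white option is chosen
            img = img.resize(size)               # unify image size (same length and width)
            arr = np.asarray(img, dtype=np.float32) / 255.0   # image normalization
            batch.append(arr)
        return np.stack(batch)                   # shape: (num_images, H, W, 3)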
  • the apparatus further includes: a result output module, configured to output the scene recognition result.
  • FIG. 10 shows a schematic structural diagram of a terminal device according to an embodiment of the present application. Taking the terminal device being a mobile phone as an example, FIG. 10 shows a schematic structural diagram of the mobile phone 200.
  • the mobile phone 200 may include a processor 210, an external memory interface 220, an internal memory 221, a USB interface 230, a charging management module 240, a power management module 241, a battery 242, an antenna 1, an antenna 2, a mobile communication module 251, a wireless communication module 252, Audio module 270, speaker 270A, receiver 270B, microphone 270C, headphone jack 270D, sensor module 280, buttons 290, motor 291, indicator 292, camera 293, display screen 294, SIM card interface 295, etc.
  • the sensor module 280 may include a gyroscope sensor 280A, an acceleration sensor 280B, a proximity light sensor 280G, a fingerprint sensor 280H, and a touch sensor 280K (of course, the mobile phone 200 may also include other sensors, such as a temperature sensor, a pressure sensor, a distance sensor, a magnetic sensor, an ambient light sensor, an air pressure sensor, a bone conduction sensor, etc., which are not shown in the figure).
  • the structures illustrated in the embodiments of the present application do not constitute a specific limitation on the mobile phone 200 .
  • the mobile phone 200 may include more or less components than shown, or combine some components, or separate some components, or arrange different components.
  • the illustrated components may be implemented in hardware, software, or a combination of software and hardware.
  • the processor 210 may include one or more processing units; for example, the processor 210 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc. Different processing units may be independent devices, or may be integrated in one or more processors.
  • the controller may be the nerve center and command center of the mobile phone 200 . The controller can generate an operation control signal according to the instruction operation code and timing signal, and complete the control of fetching and executing instructions.
  • a memory may also be provided in the processor 210 for storing instructions and data.
  • the memory in processor 210 is cache memory.
  • the memory may hold instructions or data that have just been used or recycled by the processor 210 . If the processor 210 needs to use the instruction or data again, it can be called directly from the memory. Repeated accesses are avoided, and the waiting time of the processor 210 is reduced, thereby improving the efficiency of the system.
  • the processor 210 can run the scene recognition method provided by the embodiment of the present application, so as to recognize the scene in combination with multiple images and the azimuth angle corresponding to each image, obtain more comprehensive scene information, and improve the accuracy of image-based scene recognition; this solves the problem of the limited viewing angle range and shooting angle when recognizing a scene with a single camera, and the recognition is more accurate.
  • the processor 210 may include different devices; for example, when a CPU and a GPU are integrated, the CPU and the GPU may cooperate to execute the scene recognition method provided by the embodiments of the present application; for example, some algorithms of the scene recognition method are executed by the CPU and other algorithms are executed by the GPU, to obtain faster processing efficiency.
  • Display screen 294 is used to display images, videos, and the like.
  • Display screen 294 includes a display panel.
  • the display panel can be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a MiniLED, a MicroLED, a Micro-OLED, a quantum dot light-emitting diode (QLED), and so on.
  • cell phone 200 may include 1 or N display screens 294, where N is a positive integer greater than 1.
  • the display screen 294 may be used to display information entered by or provided to the user as well as various graphical user interfaces (GUIs).
  • display 294 may display photos, videos, web pages, or documents, and the like.
  • display 294 may display a graphical user interface.
  • the GUI includes a status bar, a hideable navigation bar, a time and weather widget, and an application icon, such as a browser icon.
  • the status bar includes operator name (eg China Mobile), mobile network (eg 4G), time and remaining battery.
  • the navigation bar includes a back button icon, a home button icon, and a forward button icon.
  • the status bar may further include a Bluetooth icon, a Wi-Fi icon, an external device icon, and the like.
  • the graphical user interface may further include a Dock bar, and the Dock bar may include commonly used application icons and the like.
  • the display screen 294 may be an integrated flexible display screen, or a spliced display screen composed of two rigid screens and a flexible screen located between the two rigid screens.
  • Cameras 293 are used to capture still images or video.
  • the camera 293 may include a photosensitive element such as a lens group and an image sensor, wherein the lens group includes a plurality of lenses (convex or concave) for collecting the light signal reflected by the object to be photographed, and transmitting the collected light signal to the image sensor .
  • the image sensor generates an original image of the object to be photographed according to the light signal.
  • in the embodiments of the present application, multiple cameras are used to collect images of the same scene from multiple azimuth angles, so that the scene can be recognized in combination with the multiple images and the azimuth angle corresponding to each image and more comprehensive scene information can be obtained; this improves the accuracy of image-based scene recognition, solves the problem of the limited viewing angle range and shooting angle when recognizing a scene with a single camera, and makes the recognition more accurate.
  • Internal memory 221 may be used to store computer executable program code, which includes instructions.
  • the processor 210 executes various functional applications and data processing of the mobile phone 200 by executing the instructions stored in the internal memory 221 .
  • the internal memory 221 may include a storage program area and a storage data area.
  • the storage program area may store operating system, code of application programs (such as camera application, WeChat application, etc.), and the like.
  • the storage data area may store data created during the use of the mobile phone 200 (such as images and videos collected by the camera application) and the like.
  • the internal memory 221 may also store one or more computer programs 1310 corresponding to the scene recognition method provided by the embodiment of the present application.
  • the one or more computer programs 1310 are stored in the aforementioned internal memory 221 and configured to be executed by the one or more processors 210, and the one or more computer programs 1310 include instructions that may be used to perform the embodiments of the present application.
  • for example, the computer program 1310 may include: an image acquisition module, used for acquiring images of the same scene from multiple azimuth angles through a plurality of cameras; a scene recognition module, used for recognizing the same scene according to the image and the azimuth angle corresponding to the image to obtain a scene recognition result; an azimuth angle acquisition module, used for acquiring the acceleration, measured by the gravity sensor when each camera collects the image, on the coordinate axes of the corresponding three-dimensional rectangular coordinate system, obtaining the direction vector when each camera collects the image, and calculating the azimuth angle according to the direction vector and the gravity unit vector; and an image preprocessing module, used for preprocessing the image.
  • the internal memory 221 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, universal flash storage (UFS), and the like.
  • the code of the scene recognition method provided by the embodiment of the present application may also be stored in an external memory.
  • the processor 210 may execute the code of the scene recognition method stored in the external memory through the external memory interface 220 .
  • the function of the sensor module 280 is described below.
  • the gyro sensor 280A can be used to determine the motion posture of the mobile phone 200; in some embodiments, the angular velocity of the mobile phone 200 about three axes (i.e., the x, y, and z axes) can be determined by the gyro sensor 280A.
  • the gyro sensor 280A can be used to detect the current motion state of the mobile phone 200, such as shaking or still.
  • the gyro sensor 280A can be used to detect a folding or unfolding operation acting on the display screen 294 .
  • the gyroscope sensor 280A may report the detected folding operation or unfolding operation to the processor 210 as an event to determine the folding state or unfolding state of the display screen 294 .
  • the acceleration sensor 280B can detect the magnitude of the acceleration of the mobile phone 200 in various directions (generally along three axes). When the display screen in the embodiment of the present application is a foldable screen, the acceleration sensor 280B can be used to detect a folding or unfolding operation acting on the display screen 294, and may report the detected folding or unfolding operation to the processor 210 as an event to determine the folding state or unfolding state of the display screen 294.
  • the terminal device can obtain the acceleration on the coordinate axis of the three-dimensional rectangular coordinate system corresponding to each camera when collecting the image through the acceleration sensor 280B, and obtain the direction vector when each camera collects the image , and calculate the azimuth angle according to the direction vector and the gravity unit vector.
  • Proximity light sensor 280G may include, for example, light emitting diodes (LEDs) and light detectors, such as photodiodes.
  • the light emitting diodes may be infrared light emitting diodes.
  • the mobile phone emits infrared light outward through light-emitting diodes.
  • Phones use photodiodes to detect reflected infrared light from nearby objects. When sufficient reflected light is detected, it can be determined that there is an object near the phone. When insufficient reflected light is detected, the phone can determine that there are no objects near the phone.
  • the proximity light sensor 280G can be arranged on the first screen of the foldable display screen 294, and the proximity light sensor 280G can detect the first screen according to the optical path difference of the infrared signal.
  • the gyroscope sensor 280A (or the acceleration sensor 280B) may send the detected motion state information (such as angular velocity) to the processor 210 .
  • the processor 210 determines, based on the motion state information, whether the current state is the hand-held state or the tripod state (for example, when the angular velocity is not 0, it means that the mobile phone 200 is in the hand-held state).
  • the fingerprint sensor 280H is used to collect fingerprints.
  • the mobile phone 200 can use the collected fingerprint characteristics to realize fingerprint unlocking, accessing application locks, taking photos with fingerprints, answering incoming calls with fingerprints, and the like.
  • the touch sensor 280K is also called a "touch panel".
  • the touch sensor 280K may be disposed on the display screen 294, and the touch sensor 280K and the display screen 294 form a touch screen, which is also called a "touch screen".
  • the touch sensor 280K is used to detect a touch operation on or near it.
  • the touch sensor can pass the detected touch operation to the application processor to determine the type of touch event.
  • Visual output related to touch operations may be provided through display screen 294 .
  • the touch sensor 280K may also be disposed on the surface of the mobile phone 200 , which is different from the location where the display screen 294 is located.
  • the display screen 294 of the mobile phone 200 displays a main interface, and the main interface includes icons of multiple applications (such as a camera application, a WeChat application, etc.).
  • Display screen 294 displays an interface of a camera application, such as a viewfinder interface. Display screen 294 may also be used to display scene recognition results.
  • the wireless communication function of the mobile phone 200 can be realized by the antenna 1, the antenna 2, the mobile communication module 251, the wireless communication module 252, the modulation and demodulation processor, the baseband processor, and the like.
  • Antenna 1 and Antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in handset 200 may be used to cover a single or multiple communication frequency bands. Different antennas can also be reused to improve antenna utilization.
  • the antenna 1 can be multiplexed as a diversity antenna of the wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
  • the mobile communication module 251 can provide a wireless communication solution including 2G/3G/4G/5G, etc. applied on the mobile phone 200 .
  • the mobile communication module 251 may include at least one filter, switch, power amplifier, low noise amplifier (LNA) and the like.
  • the mobile communication module 251 can receive electromagnetic waves from the antenna 1, filter and amplify the received electromagnetic waves, and transmit them to the modulation and demodulation processor for demodulation.
  • the mobile communication module 251 can also amplify the signal modulated by the modulation and demodulation processor, and then turn it into an electromagnetic wave for radiation through the antenna 1 .
  • at least part of the functional modules of the mobile communication module 251 may be provided in the processor 210 .
  • at least part of the functional modules of the mobile communication module 251 may be provided in the same device as at least part of the modules of the processor 210 .
  • the modem processor may include a modulator and a demodulator.
  • the modulator is used to modulate the low frequency baseband signal to be sent into a medium and high frequency signal.
  • the demodulator is used to demodulate the received electromagnetic wave signal into a low frequency baseband signal. Then the demodulator transmits the demodulated low-frequency baseband signal to the baseband processor for processing.
  • the low frequency baseband signal is processed by the baseband processor and passed to the application processor.
  • the application processor outputs sound signals through audio devices (not limited to the speaker 270A, the receiver 270B, etc.), or displays images or videos through the display screen 294 .
  • the modem processor may be a stand-alone device.
  • the modulation and demodulation processor may be independent of the processor 210, and may be provided in the same device as the mobile communication module 251 or other functional modules.
  • the wireless communication module 252 can provide wireless communication solutions applied on the mobile phone 200, including wireless local area network (WLAN) (such as a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR), and other wireless communication solutions.
  • the wireless communication module 252 may be one or more devices integrating at least one communication processing module.
  • the wireless communication module 252 receives electromagnetic waves via the antenna 2 , frequency modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 210 .
  • the wireless communication module 252 can also receive the signal to be sent from the processor 210 , perform frequency modulation on the signal, amplify the signal, and then convert it into an electromagnetic wave for radiation through the antenna 2 .
  • the wireless communication module 252 is configured to transmit data with other terminal devices under the control of the processor 210 .
  • the mobile phone 200 can implement audio functions, such as music playback and recording, through the audio module 270, the speaker 270A, the receiver 270B, the microphone 270C, the earphone interface 270D, and the application processor.
  • the cell phone 200 can receive key 290 input and generate key signal input related to user settings and function control of the cell phone 200 .
  • the mobile phone 200 can use the motor 291 to generate vibration alerts (eg, vibration alerts for incoming calls).
  • the indicator 292 in the mobile phone 200 may be an indicator light, which may be used to indicate a charging state, a change in power, and may also be used to indicate a message, a missed call, a notification, and the like.
  • the SIM card interface 295 in the mobile phone 200 is used to connect the SIM card. The SIM card can be contacted and separated from the mobile phone 200 by inserting into the SIM card interface 295 or pulling out from the SIM card interface 295 .
  • the mobile phone 200 may include more or less components than those shown in FIG. 10 , which are not limited in this embodiment of the present application.
  • the illustrated handset 200 is merely an example, and the handset 200 may have more or fewer components than those shown, two or more components may be combined, or may have different component configurations.
  • the various components shown in the figures may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing and/or application specific integrated circuits.
  • the software system of the terminal device can adopt a layered architecture, an event-driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture.
  • the embodiments of the present application take an Android system with a layered architecture as an example to exemplarily describe the software structure of a terminal device.
  • An embodiment of the present application provides a scene recognition apparatus, including: a processor and a memory for storing instructions executable by the processor; wherein the processor is configured to implement the above method when executing the instructions.
  • Embodiments of the present application provide a non-volatile computer-readable storage medium on which computer program instructions are stored, and when the computer program instructions are executed by a processor, implement the above method.
  • Embodiments of the present application provide a computer program product, including computer-readable code, or a non-volatile computer-readable storage medium carrying computer-readable code; when the computer-readable code runs in a processor of an electronic device, the processor in the electronic device executes the above method.
  • a computer-readable storage medium may be a tangible device that can hold and store instructions for use by the instruction execution device.
  • the computer-readable storage medium may be, for example, but not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
  • Computer-readable storage media include: a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital video disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punch card or a raised structure in a groove on which instructions are stored, and any suitable combination of the foregoing.
  • Computer readable program instructions or code described herein may be downloaded to various computing/processing devices from a computer readable storage medium, or to an external computer or external storage device over a network such as the Internet, a local area network, a wide area network and/or a wireless network.
  • the network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers.
  • a network adapter card or network interface in each computing/processing device receives computer-readable program instructions from a network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in each computing/processing device .
  • the computer program instructions used to perform the operations of the present application may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages.
  • the computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server implement.
  • the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, connected via the Internet using an Internet service provider).
  • in some embodiments, electronic circuits, such as programmable logic circuits, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA), can be personalized by utilizing the state information of the computer-readable program instructions, and the electronic circuits can execute the computer-readable program instructions to implement various aspects of the present application.
  • These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus to produce a machine, so that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, produce means for implementing the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
  • These computer-readable program instructions may also be stored in a computer-readable storage medium; these instructions cause a computer, a programmable data processing apparatus, and/or other equipment to operate in a specific manner, so that the computer-readable medium on which the instructions are stored includes an article of manufacture comprising instructions for implementing various aspects of the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
  • The computer-readable program instructions may also be loaded onto a computer, another programmable data processing apparatus, or other equipment, so that a series of operational steps are performed on the computer, other programmable data processing apparatus, or other equipment to produce a computer-implemented process, so that the instructions executed on the computer, other programmable data processing apparatus, or other equipment implement the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
  • each block in the flowcharts or block diagrams may represent a module, a program segment, or a portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s).
  • the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by hardware that performs the corresponding functions or actions (for example, circuits or application-specific integrated circuits (ASICs)), or can be implemented by a combination of hardware and software, such as firmware.

Abstract

The invention relates to a scene recognition method and apparatus. The method comprises: a terminal device collects images of the same scene from a plurality of azimuth angles by means of a plurality of cameras, wherein the azimuth angle when each camera collects an image is the azimuth angle corresponding to that image, and each azimuth angle is the angle between the gravity unit vector and the direction vector when each camera collects the image (S700); and the terminal device recognizes the same scene according to the images and the azimuth angles corresponding to the images, and obtains a scene recognition result (S701). The method combines a plurality of images and the azimuth angle corresponding to each image to recognize the scene; since more comprehensive scene information is obtained, the accuracy of image-based scene recognition can be improved, the problem of the limited viewing angle range and shooting angles when a single camera recognizes a scene is solved, and the recognition is more exact and accurate.
PCT/CN2021/140833 2021-02-25 2021-12-23 Procédé d'identification d'environnement et appareil WO2022179281A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110215000.3A CN115049909A (zh) 2021-02-25 2021-02-25 场景识别方法及装置
CN202110215000.3 2021-02-25

Publications (1)

Publication Number Publication Date
WO2022179281A1 true WO2022179281A1 (fr) 2022-09-01

Family

ID=83048679

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/140833 WO2022179281A1 (fr) 2021-02-25 2021-12-23 Procédé d'identification d'environnement et appareil

Country Status (2)

Country Link
CN (1) CN115049909A (fr)
WO (1) WO2022179281A1 (fr)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105339841A (zh) * 2013-12-06 2016-02-17 华为终端有限公司 双镜头设备的拍照方法及双镜头设备
CN109117693A (zh) * 2017-06-22 2019-01-01 深圳华智融科技股份有限公司 一种基于广角取景的扫描识别的方法及终端
CN109903393A (zh) * 2019-02-22 2019-06-18 清华大学 基于深度学习的新视角场景合成方法和装置
US10504008B1 (en) * 2016-07-18 2019-12-10 Occipital, Inc. System and method for relocalization and scene recognition
US20200265554A1 (en) * 2019-02-18 2020-08-20 Beijing Xiaomi Mobile Software Co., Ltd. Image capturing method and apparatus, and terminal

Also Published As

Publication number Publication date
CN115049909A (zh) 2022-09-13

Similar Documents

Publication Publication Date Title
CN108960209B (zh) 身份识别方法、装置及计算机可读存储介质
CN109034102B (zh) 人脸活体检测方法、装置、设备及存储介质
CN108594997B (zh) 手势骨架构建方法、装置、设备及存储介质
US20220076000A1 (en) Image Processing Method And Apparatus
US9811910B1 (en) Cloud-based image improvement
WO2020048308A1 (fr) Procédé et appareil de classification de ressources multimédias, dispositif informatique et support d'informations
CN110647865A (zh) 人脸姿态的识别方法、装置、设备及存储介质
WO2019033747A1 (fr) Procédé de détermination de cible à suivi intelligent par un véhicule aérien sans pilote, véhicule aérien sans pilote et dispositif de commande à distance
CN111368811B (zh) 活体检测方法、装置、设备及存储介质
CN110059652B (zh) 人脸图像处理方法、装置及存储介质
US20220262035A1 (en) Method, apparatus, and system for determining pose
JP2021503659A (ja) 生体検出方法、装置及びシステム、電子機器並びに記憶媒体
CN110062171B (zh) 一种拍摄方法及终端
CN110650379A (zh) 视频摘要生成方法、装置、电子设备及存储介质
US20230005277A1 (en) Pose determining method and related device
CN113066048A (zh) 一种分割图置信度确定方法及装置
WO2021218695A1 (fr) Procédé de détection de vivacité sur la base d'une caméra monoculaire, dispositif et support d'enregistrement lisible
WO2020103732A1 (fr) Procédé de détection de rides et dispositif terminal
CN112818979B (zh) 文本识别方法、装置、设备及存储介质
CN115880213A (zh) 显示异常检测方法、装置及系统
WO2022179281A1 (fr) Procédé d'identification d'environnement et appareil
CN110163192B (zh) 字符识别方法、装置及可读介质
CN113468929A (zh) 运动状态识别方法、装置、电子设备和存储介质
CN111341307A (zh) 语音识别方法、装置、电子设备及存储介质
WO2022105793A1 (fr) Procédé et dispositif de traitement d'images

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21927705

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21927705

Country of ref document: EP

Kind code of ref document: A1