CN112416114B - Electronic device and picture visual angle recognition method thereof - Google Patents

Electronic device and picture visual angle recognition method thereof

Info

Publication number
CN112416114B
Authority
CN
China
Prior art keywords
view
person
picture
training
preprocessed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910782553.XA
Other languages
Chinese (zh)
Other versions
CN112416114A (en)
Inventor
黄志文
杨朝光
徐文正
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Acer Inc
Original Assignee
Acer Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Acer Inc filed Critical Acer Inc
Priority to CN201910782553.XA priority Critical patent/CN112416114B/en
Publication of CN112416114A publication Critical patent/CN112416114A/en
Application granted granted Critical
Publication of CN112416114B publication Critical patent/CN112416114B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2203/00Indexing scheme relating to G06F3/00 - G06F3/048
    • G06F2203/01Indexing scheme relating to G06F3/01
    • G06F2203/012Walk-in-place systems for allowing a user to walk in a virtual environment while constraining him to a given position in the physical environment

Abstract

The invention provides an electronic device and a method for identifying a viewing angle of a displayed frame. The method is adapted to the electronic device and includes the following steps. A first-person perspective frame displayed by a display is acquired. A specific object in the first-person perspective frame is removed to generate a preprocessed image. The preprocessed image is input into a neural network model to identify a viewing angle of the first-person perspective frame. A function is performed according to the viewing angle of the first-person perspective frame.

Description

Electronic device and picture visual angle recognition method thereof
Technical Field
The present invention relates to electronic devices, and in particular to an electronic device and a method for identifying the viewing angle of a frame displayed by the electronic device.
Background
With the progress of technology, users are no longer satisfied with viewing only planar images and increasingly pursue the feeling of being immersed in a scene. To provide a more realistic and stereoscopic visual experience, many applications simulate a stereoscopic virtual scene in three-dimensional space, so that the user can view the stereoscopic virtual scene through a display and even interact with it. In some applications, the stereoscopic virtual scene image displayed by the display is generated based on first-person control at a first-person view (First Person View, FPV). At the first-person view, the user sees the image as if through the eyes of a simulated digital avatar, whose viewing angle is controlled by the user through an input device or by moving the body. That is, the user can view stereoscopic scene content corresponding to different viewing angles by operating the input device or moving a body part. For example, when a user plays a first-person-perspective game, the frame displayed by the display is the scene content of the stereoscopic virtual scene viewed by a game character (i.e., the simulated digital avatar) from a certain viewing angle. In response to the user manipulating the input device or moving the body part, the viewing angle of the game character changes correspondingly.
Disclosure of Invention
In view of the above, the present invention provides an electronic device and a method for identifying the viewing angle of a displayed frame, which identify the viewing angle of a first-person perspective frame by means of a neural network (NN) model for use in subsequent applications.
An embodiment of the invention provides a method for identifying a viewing angle of a frame, which is adapted to an electronic device and includes the following steps. A first-person perspective frame displayed by a display is acquired. A specific object in the first-person perspective frame is removed to generate a preprocessed image. The preprocessed image is input into a neural network model to identify a viewing angle of the first-person perspective frame. A function is performed according to the viewing angle of the first-person perspective frame.
An embodiment of the invention provides an electronic device, which includes a display, a storage device, and a processor coupled to the storage device and the display. The processor is configured to perform the following steps. A first-person perspective frame displayed by the display is acquired. A specific object in the first-person perspective frame is removed to generate a preprocessed image. The preprocessed image is input into a neural network model to identify a viewing angle of the first-person perspective frame. A function is performed according to the viewing angle of the first-person perspective frame.
Based on the above, in the embodiments of the invention, after the first-person perspective frame is preprocessed to remove the specific object, the preprocessed image is input into the neural network model to identify the viewing angle of the first-person perspective frame. Removing the specific object from the first-person perspective frame improves the recognition accuracy of the neural network model. In addition, after the viewing angle of the first-person perspective frame has been identified, a specific function can be executed according to that viewing angle, which increases the functionality of the electronic device.
In order to make the above features and advantages of the present invention more comprehensible, embodiments accompanied with figures are described in detail below.
Drawings
FIG. 1 is a schematic diagram of an electronic device according to an embodiment of the invention;
FIG. 2 is a flowchart of a method for identifying a viewing angle of a frame according to an embodiment of the invention;
FIG. 3A and FIG. 3B illustrate an example of generating a preprocessed image according to an embodiment of the invention;
FIG. 4A and FIG. 4B illustrate another example of generating a preprocessed image according to an embodiment of the invention;
FIG. 5 is a schematic diagram illustrating recognition of a viewing angle according to a neural network model according to an embodiment of the invention;
FIG. 6A to FIG. 6C are schematic diagrams illustrating a method for identifying a viewing angle of a frame according to an embodiment of the invention; and
FIG. 7 is a flowchart illustrating training of a neural network model according to an embodiment of the invention.
Description of reference numerals:
10: Electronic device
110: Display
120: Storage device
130: Processor
F1 to F5: First-person perspective frames
SF1 to SF5: Sub-frames
Img1, Img2: Preprocessed images
500: neural network model
510: convolutional layer
520: pooling layer
530: full connection layer
540: output layer
550: output data
150: lighting device
151-153: lamp signal
S201 to S204, S701 to S704: step (a)
Detailed Description
Some embodiments of the invention will be described in detail below with reference to the drawings. Reference numerals cited in the following description refer to the same or similar elements appearing in different drawings. These embodiments are only a part of the invention and do not disclose all possible embodiments of the invention; rather, they are merely examples of the methods and apparatuses of the present invention.
Fig. 1 is a schematic diagram of an electronic device according to an embodiment of the invention, which is shown for convenience of description and is not intended to limit the invention. Referring to FIG. 1, the electronic device 10 is, for example, a notebook computer, a desktop computer, a tablet computer, a head-mounted display device, a game console, a smart phone, a smart television, a server device, or a combination thereof; the invention is not limited thereto. In the embodiment of the present invention, the electronic device 10 includes a display 110, a storage device 120, and a processor 130.
The display 110 is, for example, a liquid crystal display (LCD), a light-emitting diode (LED) display, an organic light-emitting diode (OLED) display, or another type of display; the invention is not limited thereto. From another perspective, the display 110 may be a stand-alone display, the display of a notebook computer, the display of a head-mounted display device, or a display integrated in another type of electronic device; the invention is not limited in this respect.
The storage device 120 is used to store data such as virtual-reality image content, program code, and software components, and may be, for example, any type of fixed or removable random access memory (RAM), read-only memory (ROM), flash memory, a hard disk or other similar device, an integrated circuit, or a combination thereof.
The processor 130 is, for example, a central processing unit (CPU), or another general-purpose or special-purpose microprocessor, digital signal processor (DSP), programmable controller, application-specific integrated circuit (ASIC), programmable logic device (PLD), graphics processing unit (GPU), or other similar device, or a combination of such devices. The processor 130 may execute the program code, software modules, and instructions recorded in the storage device 120 to implement the method for identifying a viewing angle of a frame according to the embodiments of the present invention.
In addition to the display 110, the storage device 120, and the processor 130, the electronic device 10 may include other elements not shown in FIG. 1, such as a speaker, a microphone, a camera, and a communication module; the invention is not limited thereto.
Fig. 2 is a flowchart illustrating a method for identifying a viewing angle of a frame according to an embodiment of the present invention. Referring to FIG. 2, the method of the present embodiment is applicable to the electronic device 10 of FIG. 1, and the detailed flow of the method is described below with reference to the elements of the electronic device 10.
It should be noted that, in the embodiment of the present invention, when the processor 130 of the electronic device 10 executes an application program, the display 110 displays a first-person perspective frame. The application program is, for example, a game program or a multimedia playback program that provides stereoscopic scene content. For example, when a user plays a first-person-perspective game or views a 360-degree panoramic image/video using the electronic device 10, the display 110 displays the first-person perspective frame. In response to a manipulation instruction issued by the user through an input device (not shown) or to movement of a body part (for example, the head), the viewing angle of the first-person perspective frame changes correspondingly. For example, in response to the user operating a touch device, a mouse, or a keyboard, the processor 130 determines the first-person perspective frame from the stereoscopic scene content and provides it to the display 110 for display. Alternatively, in response to the head pose of the user wearing a head-mounted display device, the processor 130 determines the first-person perspective frame from the stereoscopic scene content and provides it to the display 110 of the head-mounted display device for display.
First, in step S201, the processor 130 acquires the first-person perspective frame displayed on the display 110. Specifically, the processor 130 may obtain the first-person perspective frame displayed by the display 110 through an application programming interface (API) of an operating system or of an application program. For example, the processor 130 may obtain the first-person perspective frame displayed by the display 110 through a screen-capture technique such as the Desktop Duplication API of the Windows operating system. Alternatively, the processor 130 may obtain the first-person perspective frame through an API of the game program. The image content of the first-person perspective frame is generated by simulating how the digital avatar of the user views the stereoscopic scene from a certain viewing angle.
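By way of illustration only, the frame acquisition of step S201 could be sketched as follows. This is a minimal sketch that assumes a Python environment and the third-party mss screen-capture library; the embodiment itself relies on the Windows Desktop Duplication API or a game-program API, and the helper name grab_first_person_frame is hypothetical.

```python
# Minimal sketch of step S201: grabbing the frame currently shown on the display.
# Assumes the Python "mss" screen-capture library; the embodiment uses the
# Windows Desktop Duplication API or a game API instead, so this is illustrative only.
import numpy as np
import mss

def grab_first_person_frame(monitor_index: int = 1) -> np.ndarray:
    """Return the currently displayed frame as an H x W x 3 RGB array."""
    with mss.mss() as sct:
        monitor = sct.monitors[monitor_index]            # full area of the chosen monitor
        shot = sct.grab(monitor)                         # raw BGRA screenshot
        frame = np.asarray(shot)[:, :, :3][:, :, ::-1]   # BGRA -> RGB
    return frame
```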
In step S202, the processor 130 removes a specific object from the first-person perspective frame to generate a preprocessed image. In step S203, the processor 130 inputs the preprocessed image into the neural network model to identify the viewing angle of the first-person perspective frame. In one embodiment, the viewing angle of the first-person perspective frame may be a vertical tilt angle, for example between 0 and 180 degrees. In other embodiments, the viewing angle of the first-person perspective frame may instead be a horizontal deflection angle.
Specifically, in an embodiment of the present invention, the processor 130 uses the neural network model to identify the viewing angle of the first-person perspective frame according to the image features of the frame. However, the first-person perspective frame may include certain objects that are detrimental to the recognition accuracy of the neural network model. Therefore, in the embodiment of the invention, the first-person perspective frame is preprocessed to remove such specific objects before the recognition result is generated by the neural network model. Correspondingly, during training of the neural network model, the same preprocessing is applied to the training frames in the training data set.
In general, the specific objects that do not contribute to the recognition accuracy of the neural network model are uncorrelated with the change of viewing angle. In other words, these objects do not change in response to a change of the first-person viewing angle. For example, assuming the first-person perspective frame is a game frame, a virtual hand or weapon located in the lower part of the game frame is such a specific object, harmful to the recognition accuracy of the neural network model used to identify the viewing angle. Likewise, a playback control menu, a game control menu, a logo (LOGO), or another static graphic in the first-person perspective frame also belongs to the specific objects that degrade the recognition accuracy. In the embodiment of the invention, the specific object is removed from the first-person perspective frame to generate the preprocessed image, and the preprocessed image is then provided to the neural network model for recognition, which significantly improves the recognition accuracy of the neural network model.
In one embodiment, the processor 130 may perform image analysis to detect the specific object, for example by an object detection algorithm such as color detection, contour detection, or image comparison, so as to locate the specific object in the first-person perspective frame. The processor 130 may then remove the specific object from the first-person perspective frame, for example by cutting out the image block that includes the specific object, to produce the preprocessed image.
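A minimal sketch of this detect-and-remove variant is given below, assuming OpenCV template matching against a reference image of the static object (for example, a logo); the matching threshold and the helper name remove_static_object are illustrative assumptions rather than part of the embodiment.

```python
# Sketch of step S202 via image comparison: locate a known static object
# (e.g. a logo) by template matching and blank out the block containing it.
# Assumes OpenCV; the 0.8 threshold and helper name are illustrative choices.
import cv2
import numpy as np

def remove_static_object(frame_bgr: np.ndarray, template_bgr: np.ndarray,
                         threshold: float = 0.8) -> np.ndarray:
    result = cv2.matchTemplate(frame_bgr, template_bgr, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(result)
    if max_val < threshold:
        return frame_bgr                       # object not found, keep frame as-is
    x, y = max_loc
    h, w = template_bgr.shape[:2]
    preprocessed = frame_bgr.copy()
    preprocessed[y:y + h, x:x + w] = 0         # remove the image block with the object
    return preprocessed
```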
In one embodiment, the processor 130 may crop the first-person perspective frame into a plurality of sub-frames and take at least one sub-frame that does not include the specific object to generate the preprocessed image. Specifically, in some application scenarios the position of the specific object hardly changes, so the processor 130 may directly crop the first-person perspective frame and keep the image portion that does not include the specific object, thereby removing the specific object.
For example, FIG. 3A and FIG. 3B illustrate an example of generating a preprocessed image according to an embodiment of the invention. Referring to FIG. 3A and FIG. 3B, in this example the processor 130 obtains a first-person perspective frame F1 of size W1×H1. It is assumed that a playback control object is located in the lower part of the first-person perspective frame F1. Accordingly, the processor 130 crops the first-person perspective frame F1 into an upper sub-frame SF1 and a lower sub-frame SF2, and takes the sub-frame SF1 to generate the preprocessed image Img1 of size W1×H2, so that the viewing angle of the first-person perspective frame F1 can be identified from the preprocessed image Img1 in the subsequent steps.
For another example, FIG. 4A and FIG. 4B illustrate an example of generating a preprocessed image according to an embodiment of the invention. Referring to FIG. 4A and FIG. 4B, in this example the processor 130 obtains a first-person perspective frame F2 of size W4×H4. It is assumed that the lower part of the first-person perspective frame F2 includes a virtual hand holding a virtual weapon. Accordingly, the processor 130 crops the first-person perspective frame F2 into the sub-frames SF3, SF4, and SF5, namely a left sub-frame SF3 of size W3×H3, a middle sub-frame SF4, and a right sub-frame SF5 of size W5×H3. The processor 130 may stitch the sub-frame SF3 and the sub-frame SF5 into a preprocessed image Img2 of size (W3+W5)×H3, so that the viewing angle of the first-person perspective frame F2 can be identified from the preprocessed image Img2 in the subsequent steps.
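The crop-and-keep preprocessing of FIG. 3A/3B and FIG. 4A/4B can be sketched with plain array slicing, as below. The sketch assumes the frame is a NumPy array and that the crop boundaries, expressed here as fractions of the frame size, have been chosen in advance for the particular application; the fractions and function names are illustrative assumptions.

```python
# Sketch of the preprocessing in FIG. 3 and FIG. 4 using NumPy slicing.
# The crop fractions below are illustrative; in practice they are chosen
# per application so that the kept regions never contain the specific object.
import numpy as np

def keep_upper_subframe(frame: np.ndarray, keep_ratio: float = 0.75) -> np.ndarray:
    """FIG. 3 case: drop the lower strip that holds a playback control object."""
    h = frame.shape[0]
    return frame[: int(h * keep_ratio)]                  # Img1, size W1 x H2

def stitch_left_right(frame: np.ndarray, left_ratio: float = 0.35,
                      right_ratio: float = 0.35) -> np.ndarray:
    """FIG. 4 case: drop the middle band holding the virtual hand/weapon and
    splice the left and right sub-frames (SF3 and SF5) into one image (Img2)."""
    w = frame.shape[1]
    left = frame[:, : int(w * left_ratio)]               # SF3, size W3 x H3
    right = frame[:, w - int(w * right_ratio):]          # SF5, size W5 x H3
    return np.concatenate([left, right], axis=1)         # Img2, size (W3+W5) x H3
```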
It should be noted that, in one embodiment, the processor 130 may use the neural network model to classify the preprocessed image into one of a plurality of viewing-angle ranges. The neural network model may be a deep neural network model or another machine learning model; the invention is not limited in this respect. For example, the neural network model may be a convolutional neural network (CNN) model for image classification such as LeNet, VGGNet, NASNet, or ResNet; the invention is not limited in this respect. The plurality of viewing-angle ranges are the output classification categories of the neural network model.
FIG. 5 is a schematic diagram illustrating recognition of a viewing angle according to a neural network model according to an embodiment of the present invention. Referring to FIG. 5, the preprocessed image Img2 is input into a CNN model as an example for explanation. In this example, the convolutional neural network 500 is composed of at least one convolution layer 510, at least one pooling layer 520, at least one fully connected layer 530, and an output layer 540.
The front stage of the convolutional neural network 500 is formed by connecting the convolution layer 510 and the pooling layer 520, and serves as the feature-extraction part of the image: it obtains feature values of the preprocessed image Img2. These feature values may be a multidimensional array, generally regarded as the feature vector of the input preprocessed image Img2. The rear stage of the convolutional neural network 500 includes the fully connected layer 530 and the output layer 540, which classify the preprocessed image Img2 into one of a plurality of categories according to the feature values generated by the convolution layer 510 and the pooling layer 520. In detail, the output data 550 generated by the output layer 540 may include the probabilities P1 to P18 with which the convolutional neural network 500 assigns the preprocessed image Img2 to each of the categories AR1 to AR18, and the category to which the preprocessed image Img2 belongs is determined by the highest of the probabilities P1 to P18. More specifically, the categories AR1 to AR18 are different viewing-angle ranges. In this example, a vertical viewing-angle range of 180 degrees is divided into 18 viewing-angle ranges of 10 degrees each. In other words, the categories AR1 to AR18 correspond to the viewing-angle ranges of 0 to 10 degrees, 10 to 20 degrees, 20 to 30 degrees, 30 to 40 degrees, ..., 160 to 170 degrees, and 170 to 180 degrees, respectively. Using the convolutional neural network 500, the processor 130 classifies the preprocessed image Img2 into one of the 18 viewing-angle ranges to identify the viewing angle of the first-person perspective frame F2. By classifying the preprocessed image Img2 into one of a plurality of viewing-angle ranges, the embodiment of the invention saves computation and improves processing efficiency while maintaining the recognition success rate of the neural network model and achieving sufficient recognition precision.
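By way of illustration, the following is a minimal PyTorch sketch of a convolutional classifier with 18 output categories corresponding to the 10-degree viewing-angle ranges of this example; the layer sizes, input resolution, and helper names are illustrative assumptions and not the exact network of the embodiment.

```python
# Minimal PyTorch sketch of the classification stage of FIG. 5:
# convolution + pooling for feature extraction, then fully connected layers
# that map the features to 18 viewing-angle ranges (10 degrees each).
# Layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

class ViewAngleCNN(nn.Module):
    def __init__(self, num_ranges: int = 18):
        super().__init__()
        self.features = nn.Sequential(               # front stage: conv + pooling
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(7),
        )
        self.classifier = nn.Sequential(             # rear stage: fully connected + output
            nn.Flatten(), nn.Linear(64 * 7 * 7, 128), nn.ReLU(),
            nn.Linear(128, num_ranges),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))     # raw scores for AR1..AR18

def predict_angle_range(model: nn.Module, img: torch.Tensor) -> tuple:
    """Return the (low, high) bounds in degrees of the most probable range."""
    probs = torch.softmax(model(img.unsqueeze(0)), dim=1)   # P1..P18
    idx = int(probs.argmax(dim=1))
    return idx * 10, (idx + 1) * 10
```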
Finally, in step S204, the processor 130 performs a function according to the viewing angle of the first-person perspective frame. The function may include providing a sound and light effect corresponding to the viewing angle or recording the viewing angle as a game operation history. In detail, the processor 130 may control a speaker or a lighting device to provide a corresponding lighting effect or sound effect according to the viewing angle of the first-person perspective frame. Alternatively, the processor 130 may record the viewing angle of the user while playing a first-person-perspective game as a game operation history for later reference. In this way, the functionality and entertainment value of the electronic device 10 are improved.
For example, FIG. 6A to FIG. 6C are schematic diagrams illustrating the method for identifying a viewing angle of a frame according to an embodiment of the invention. Referring to FIG. 6A to FIG. 6C, the electronic device 10 may include a lighting device 150. Referring to FIG. 6A, the light signal 151 of the lighting device 150 is turned on when the viewing angle of the first-person perspective frame F3 is recognized as falling within 120 to 140 degrees. Referring to FIG. 6B, the light signal 152 of the lighting device 150 is turned on when the viewing angle of the first-person perspective frame F4 is recognized as falling within 80 to 100 degrees. Referring to FIG. 6C, the light signal 153 of the lighting device 150 is turned on when the viewing angle of the first-person perspective frame F5 is recognized as falling within 40 to 60 degrees.
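A minimal sketch of step S204 for the lighting example of FIG. 6A to FIG. 6C is given below; the table of angle ranges and the set_light callback are assumptions standing in for whatever interface the lighting device 150 actually exposes.

```python
# Sketch of step S204 for FIG. 6A-6C: map the recognized viewing-angle range
# to one of the light signals 151-153. The set_light callback stands in for
# the real lighting-device interface, which is not specified here.
from typing import Callable

LIGHT_BY_RANGE = {          # (low, high) in degrees -> light signal id
    (120, 140): 151,
    (80, 100): 152,
    (40, 60): 153,
}

def perform_lighting_function(angle_low: int, angle_high: int,
                              set_light: Callable[[int, bool], None]) -> None:
    for (low, high), light_id in LIGHT_BY_RANGE.items():
        in_range = low <= angle_low and angle_high <= high
        set_light(light_id, in_range)    # turn on the matching signal, turn off the others
```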
It is worth noting that when the electronic device 10 is a head-mounted display device, the processor 130 may further perform an automatic horizontal viewing-angle correction according to the viewing angle of the first-person perspective frame. In detail, the processor 130 may calculate the difference between the viewing angle of the first-person perspective frame and a preset desired viewing angle (90 degrees in the case of horizontal correction) to obtain a viewing-angle offset, and correct the viewing-angle positioning parameter of the head-mounted display device according to this offset. The processor 130 may then provide the viewing-angle-corrected image to the user.
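The correction itself reduces to a subtraction, as sketched below; the name of the positioning parameter (pitch_offset) and the way it is updated are illustrative assumptions, since the embodiment only states that the positioning parameter is corrected according to the viewing-angle offset.

```python
# Sketch of the horizontal-correction step for a head-mounted display:
# compute the offset between the recognized viewing angle and the desired
# 90-degree horizontal angle, then fold it into the positioning parameter.
# The parameter name "pitch_offset" is an illustrative assumption.
DESIRED_HORIZONTAL_ANGLE = 90.0

def correct_view_positioning(recognized_angle: float, positioning: dict) -> dict:
    offset = recognized_angle - DESIRED_HORIZONTAL_ANGLE      # viewing-angle offset
    corrected = dict(positioning)
    corrected["pitch_offset"] = positioning.get("pitch_offset", 0.0) - offset
    return corrected
```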
FIG. 7 is a flowchart illustrating training of the neural network model according to an embodiment of the present invention. Referring to FIG. 7, the flow of the present embodiment is applicable to the electronic device 10 of FIG. 1, and the detailed flow of training the neural network model is described below with reference to the elements of the electronic device 10. It should be noted that the processor 130 that trains the neural network model and the processor that actually identifies the viewing angle of the first-person perspective frame may be the processor of a single electronic device or the processors of a plurality of electronic devices; the invention is not limited in this respect.
In step S701, while executing an application program, the processor 130 obtains a plurality of training frames of the application program. For example, the processor 130 may generate mouse control events by itself so that a game program produces a plurality of training frames, each labeled with an appropriate training viewing angle. In step S702, the processor 130 removes the specific object from the training frames to generate a plurality of preprocessed training images. Here, the manner in which the processor 130 removes the specific object from the training frames is the same as the manner in which the processor 130 removes the specific object from the first-person perspective frame in step S202. In other words, the processor 130 may likewise crop the training frames into a plurality of sub-training-frames and take at least one of the sub-training-frames of each training frame to generate the plurality of preprocessed training images.
In step S703, the processor 130 labels each preprocessed training image as one of the viewing-angle ranges according to the training viewing angle of the corresponding training frame and the plurality of viewing-angle ranges, so as to obtain a classification label for each preprocessed training image. For example, if a training frame is labeled with a training viewing angle of 90 degrees, the classification label of the preprocessed training image of that training frame is the viewing-angle range of 80 to 100 degrees. Here, the processor 130 performs a labeling action on the preprocessed training image of each training frame to generate the classification label of each preprocessed training image. In addition, the training viewing angle of a training frame may be provided by the application program that generated the training frame, or may be annotated by the developer. For example, the processor 130 may execute a mouse-event simulation tool. The processor 130 may simulate mouse-movement events with the mouse-event simulation tool and define the training viewing angles according to a fixed movement unit. For example, the mouse-event simulation tool may first simulate a mouse-movement event with a very large downward displacement and label the training viewing angle of the training frame generated at that moment as 0 degrees. Then, the mouse-event simulation tool may simulate a plurality of mouse-movement events that each move upward by the fixed unit, and increase the training viewing angle of the training frame generated in response to each such event by an angle interval (for example, 1 degree).
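A minimal sketch of this data-collection and labeling procedure is given below, assuming the pyautogui library for simulating relative mouse movements and the grab_first_person_frame helper from the earlier capture sketch; the pixel step size, the 1-degree angle interval, and the function name are illustrative assumptions.

```python
# Sketch of steps S701/S703: drive the game camera with simulated mouse moves
# and label each captured training frame with a training viewing angle.
# Assumes pyautogui for relative mouse movement and grab_first_person_frame()
# from the earlier capture sketch; step sizes are illustrative assumptions.
import time
import pyautogui

def collect_labeled_frames(steps: int = 180, pixels_per_step: int = 20,
                           degrees_per_step: float = 1.0):
    pyautogui.moveRel(0, 5000)                    # large downward move -> pin the view at 0 degrees
    time.sleep(0.2)
    labeled = []
    angle = 0.0
    for _ in range(steps):
        labeled.append((grab_first_person_frame(), angle))
        pyautogui.moveRel(0, -pixels_per_step)    # move the view upward by one fixed unit
        angle += degrees_per_step                 # training angle grows by 1 degree per step
        time.sleep(0.05)
    return labeled
```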
After the processor 130 generates the training data set including the plurality of preprocessed training images and their classification labels, in step S704 the processor 130 trains the neural network model according to the preprocessed training images and the classification labels of the preprocessed training images. Specifically, the processor 130 inputs the preprocessed training images into the neural network model. By comparing the classification results of the neural network model with the classification labels, the processor 130 learns a set of rules (i.e., the parameters of the neural network model) for classifying a preprocessed training image into one of the plurality of viewing-angle ranges, and finally obtains the neural network model used to identify the viewing angle.
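A minimal training-loop sketch for step S704 is given below in PyTorch, reusing the ViewAngleCNN classifier sketched after FIG. 5; the optimizer, learning rate, batch size, and epoch count are illustrative assumptions.

```python
# Sketch of step S704: fit the classifier on (preprocessed image, range label)
# pairs by comparing its predictions with the classification labels.
# Optimizer choice, learning rate, and epoch count are illustrative assumptions.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def train_view_angle_model(images: torch.Tensor, labels: torch.Tensor,
                           num_ranges: int = 18, epochs: int = 10) -> nn.Module:
    model = ViewAngleCNN(num_ranges)              # classifier from the FIG. 5 sketch above
    loader = DataLoader(TensorDataset(images, labels), batch_size=32, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for batch_imgs, batch_labels in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(batch_imgs), batch_labels)   # compare with the labels
            loss.backward()                        # adjust the model parameters
            optimizer.step()
    return model
```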
In summary, in the embodiments of the invention, after the first-person perspective frame is preprocessed to remove the specific object, the preprocessed image is input into the neural network model to identify the viewing angle of the first-person perspective frame. Removing the specific object from the first-person perspective frame improves the accuracy with which the neural network model recognizes the viewing angle of the frame. In addition, after the viewing angle of the first-person perspective frame has been identified, a specific function can be executed according to that viewing angle, which increases the functionality of the electronic device. Furthermore, by classifying the preprocessed image into one of a plurality of viewing-angle ranges, the embodiments of the invention maintain the recognition success rate of the neural network model and, while achieving sufficient recognition precision, save computation and improve processing efficiency.
Although the invention has been described with reference to the above embodiments, the invention is not limited thereto. Those skilled in the art may make modifications and variations without departing from the spirit and scope of the present invention.

Claims (12)

1. A method for identifying a viewing angle of a frame, adapted to an electronic device, the method comprising:
acquiring a first-person perspective frame displayed by a display;
removing a specific object from the first-person perspective frame to generate a preprocessed image;
inputting the preprocessed image into a neural network model to identify a viewing angle of the first-person perspective frame; and
performing a function according to the viewing angle of the first-person perspective frame,
wherein the step of removing the specific object from the first-person perspective frame to generate the preprocessed image comprises:
cropping the first-person perspective frame into a plurality of sub-frames; and
taking at least one of the plurality of sub-frames that does not include the specific object to generate the preprocessed image, wherein the specific object does not change correspondingly in response to a change of the viewing angle of the first-person perspective frame.
2. The method for identifying a viewing angle of a frame according to claim 1, wherein the plurality of sub-frames includes a left sub-frame, a middle sub-frame, and a right sub-frame, and the step of taking at least one of the plurality of sub-frames that does not include the specific object to generate the preprocessed image comprises:
stitching the left sub-frame and the right sub-frame into the preprocessed image.
3. The method for identifying a viewing angle of a frame according to claim 1, wherein the function comprises providing a sound and light effect corresponding to the viewing angle or recording the viewing angle as a game operation history.
4. The method for identifying a viewing angle of a frame according to claim 1, wherein the step of inputting the preprocessed image into the neural network model to identify the viewing angle of the first-person perspective frame comprises:
classifying the preprocessed image into one of a plurality of viewing-angle ranges using the neural network model.
5. The method for identifying a viewing angle of a frame according to claim 1, wherein the viewing angle of the first-person perspective frame is a vertical tilt angle.
6. The method for identifying a viewing angle of a frame according to claim 1, the method further comprising:
acquiring a plurality of training frames of an application program while the application program is executed;
removing the specific object from the plurality of training frames to generate a plurality of preprocessed training images;
labeling each of the plurality of preprocessed training images as one of a plurality of viewing-angle ranges according to a plurality of training viewing angles respectively corresponding to the plurality of training frames; and
training the neural network model according to the plurality of preprocessed training images and classification labels of the plurality of preprocessed training images.
7. An electronic device, comprising:
a display;
a storage device; and
a processor, coupled to the display and the storage device, configured to:
acquire a first-person perspective frame displayed by the display;
remove a specific object from the first-person perspective frame to generate a preprocessed image;
input the preprocessed image into a neural network model to identify a viewing angle of the first-person perspective frame; and
perform a function according to the viewing angle of the first-person perspective frame,
wherein the processor is further configured to:
crop the first-person perspective frame into a plurality of sub-frames; and
take at least one of the plurality of sub-frames that does not include the specific object to generate the preprocessed image, wherein the specific object does not change correspondingly in response to a change of the viewing angle of the first-person perspective frame.
8. The electronic device according to claim 7, wherein the plurality of sub-frames includes a left sub-frame, a middle sub-frame, and a right sub-frame, and the processor is further configured to:
stitch the left sub-frame and the right sub-frame into the preprocessed image.
9. The electronic device according to claim 7, wherein the function comprises providing a sound and light effect corresponding to the viewing angle or recording the viewing angle as a game operation history.
10. The electronic device according to claim 7, wherein the processor is further configured to:
classify the preprocessed image into one of a plurality of viewing-angle ranges using the neural network model.
11. The electronic device according to claim 7, wherein the viewing angle of the first-person perspective frame is a vertical tilt angle.
12. The electronic device according to claim 7, wherein the processor is further configured to:
acquire a plurality of training frames of an application program while the application program is executed;
remove the specific object from the plurality of training frames to generate a plurality of preprocessed training images;
label each of the plurality of preprocessed training images as one of a plurality of viewing-angle ranges according to a plurality of training viewing angles respectively corresponding to the plurality of training frames; and
train the neural network model according to the plurality of preprocessed training images and classification labels of the plurality of preprocessed training images.
CN201910782553.XA 2019-08-23 2019-08-23 Electronic device and picture visual angle recognition method thereof Active CN112416114B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910782553.XA CN112416114B (en) 2019-08-23 2019-08-23 Electronic device and picture visual angle recognition method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910782553.XA CN112416114B (en) 2019-08-23 2019-08-23 Electronic device and picture visual angle recognition method thereof

Publications (2)

Publication Number Publication Date
CN112416114A CN112416114A (en) 2021-02-26
CN112416114B true CN112416114B (en) 2023-08-04

Family

ID=74780320

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910782553.XA Active CN112416114B (en) 2019-08-23 2019-08-23 Electronic device and picture visual angle recognition method thereof

Country Status (1)

Country Link
CN (1) CN112416114B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102984529A (en) * 2011-09-05 2013-03-20 宏碁股份有限公司 A goggle-type stereoscopic 3D display and a display method
CN108416288A (en) * 2018-03-04 2018-08-17 南京理工大学 The first visual angle interactive action recognition methods based on overall situation and partial situation's network integration
CN109034012A (en) * 2018-07-09 2018-12-18 四川大学 First person gesture identification method based on dynamic image and video sequence
CN109359538A (en) * 2018-09-14 2019-02-19 广州杰赛科技股份有限公司 Training method, gesture identification method, device and the equipment of convolutional neural networks
CN110060230A (en) * 2019-01-18 2019-07-26 商汤集团有限公司 Three-dimensional scenic analysis method, device, medium and equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190138786A1 (en) * 2017-06-06 2019-05-09 Sightline Innovation Inc. System and method for identification and classification of objects

Also Published As

Publication number Publication date
CN112416114A (en) 2021-02-26

Similar Documents

Publication Publication Date Title
CN106355153B (en) A kind of virtual objects display methods, device and system based on augmented reality
CN109729426B (en) Method and device for generating video cover image
US20180088663A1 (en) Method and system for gesture-based interactions
US7355607B2 (en) Automatic 3D modeling system and method
CN110716645A (en) Augmented reality data presentation method and device, electronic equipment and storage medium
US20150097767A1 (en) System for virtual experience book and method thereof
CN106060572A (en) Video playing method and device
WO2016122973A1 (en) Real time texture mapping
JP2022505998A (en) Augmented reality data presentation methods, devices, electronic devices and storage media
CN106683193B (en) Design method and design device of three-dimensional model
JP6224327B2 (en) Information processing system, information processing apparatus, information processing method, and information processing program
US11373373B2 (en) Method and system for translating air writing to an augmented reality device
JP2011159329A (en) Automatic 3d modeling system and method
KR20180013892A (en) Reactive animation for virtual reality
US20200082576A1 (en) Method, Device, and System for Delivering Recommendations
US10295403B2 (en) Display a virtual object within an augmented reality influenced by a real-world environmental parameter
TWI715148B (en) Electronic apparatus and method for recognizing view angle of displayed screen thereof
US20230177755A1 (en) Predicting facial expressions using character motion states
CN111651058A (en) Historical scene control display method and device, electronic equipment and storage medium
CN114222076A (en) Face changing video generation method, device, equipment and storage medium
EP4279157A1 (en) Space and content matching for augmented and mixed reality
JP7438690B2 (en) Information processing device, image recognition method, and learning model generation method
CN112416114B (en) Electronic device and picture visual angle recognition method thereof
CN111651054A (en) Sound effect control method and device, electronic equipment and storage medium
JP2023503170A (en) Systems and methods for enhancing player interaction using augmented reality

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant