CN115082959A - Display device and image processing method - Google Patents


Info

Publication number
CN115082959A
Authority
CN
China
Prior art keywords
image
frame
human body
key point
buffer queue
Prior art date
Legal status
Pending
Application number
CN202210689537.8A
Other languages
Chinese (zh)
Inventor
马乐
刘兆磊
张杰
Current Assignee
Hisense Visual Technology Co Ltd
Original Assignee
Hisense Visual Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Hisense Visual Technology Co Ltd filed Critical Hisense Visual Technology Co Ltd
Priority to CN202210689537.8A
Publication of CN115082959A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing
    • G06T1/60Memory management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content

Abstract

The application provides a display device and an image processing method. The method calculates the ratio of the time consumed to detect one frame of image to the time interval between obtaining two adjacent frames of image, yielding a number of queues; creates buffer queues according to that number; and stores the image frames into the buffer queues in the order in which the frames are obtained. Then, for any buffer queue, human body key point identification is performed on the image frames in that queue to obtain human body key point images, and the human body key point images and image frames are rendered in the order of the buffer queues, which is the order in which the image frames were stored. This avoids the low display fluency caused by picture stuttering during rendering, improves the matching degree between the human body key point image and the image frame, improves the accuracy of the human body key point image used to render the image frame, and avoids deviation between the rendering result and the image frame.

Description

Display device and image processing method
Technical Field
The present application relates to the field of computer technologies, and in particular, to a display device and an image processing method.
Background
Frames Per Second (FPS) denotes the number of frames played per second; the more frames played per second, the smoother the displayed picture. In one processing scenario, image processing includes rendering each frame of image. The rendering of each frame generally follows one of the following two modes:
Synchronous rendering: each frame of image is acquired synchronously. For the i-th frame, image detection is performed according to a preset detection algorithm to obtain the i-th frame human body key point image; the i-th frame is then rendered based on that key point image. Only after the i-th frame has been rendered is the (i+1)-th frame acquired, and so on for each frame.
Asynchronous rendering: each frame of image is acquired asynchronously. For the i-th frame, image detection is performed according to a preset detection algorithm to obtain the i-th frame human body key point image; the current frame image (the frame currently being displayed) is then rendered based on that key point image. Unlike synchronous rendering, after the i-th frame is obtained, the (i+1)-th frame is acquired after the interval between obtaining two adjacent frames, and each frame is displayed at that interval; and so on for each frame.
In synchronous rendering, if image detection takes a long time, the interval between acquiring the i-th frame and the (i+1)-th frame becomes large, i.e., the interval between two adjacent frames grows; the number of frames played per second drops, the rendered picture stutters, and picture fluency is low. In asynchronous rendering, if image detection takes a long time, detection of the human body key point image lags behind, so the (i+1)-th frame is rendered based on the i-th frame's human body key point image; the matching degree between key point image and image frame is then low, i.e., the accuracy of the key point image used to render the current frame is low, so the rendering result deviates from the image frame.
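The timing trade-off described above can be sketched numerically. The 40 ms capture interval and 100 ms detection time below are illustrative assumptions, not values from the application:

```python
# Illustrative timing model (assumed numbers): the camera delivers a frame
# every 40 ms (25 FPS) and key-point detection takes 100 ms per frame.
capture_interval_ms = 40    # interval between two adjacent camera frames
detect_time_ms = 100        # time to detect one frame's human key points

# Synchronous rendering: the next frame is fetched only after the current
# frame is detected and rendered, so the effective frame period is bounded
# below by the detection time.
sync_frame_period_ms = max(capture_interval_ms, detect_time_ms)
sync_fps = 1000 / sync_frame_period_ms
print(f"synchronous: {sync_fps:.0f} FPS")  # well under 30 FPS -> stutter

# Asynchronous rendering: frames are displayed at the capture interval, but
# detection lags, so frame i+1 may be rendered with frame i's key points.
async_fps = 1000 / capture_interval_ms
lag_frames = detect_time_ms // capture_interval_ms
print(f"asynchronous: {async_fps:.0f} FPS, key points lag ~{lag_frames} frame(s)")
```

With these numbers, synchronous rendering drops to 10 FPS while asynchronous rendering keeps 25 FPS but renders each frame with key points that are about two frames stale, which is exactly the pair of problems the application addresses.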
Disclosure of Invention
The application provides a display device and an image processing method, which avoid picture stutter during rendering, improve the smoothness of the rendered display, increase the matching degree between the human body key point image and the image frame, improve the accuracy of the human body key point image used to render the image frame, and avoid deviation between the rendering result and the image frame.
In a first aspect, the present application provides a display device comprising:
a display configured to display an image frame picture;
a camera configured to acquire image frames at preset intervals; the preset interval time represents the interval time for the camera to acquire two adjacent frames of images;
a controller configured to:
calculating the ratio of the time consumed to detect one frame of image to the time interval between obtaining two adjacent frames of image, to obtain a number of queues;
creating buffer queues according to the number of queues, the buffer queues being used to store image frames;
storing the image frames into the buffer queues in the order in which the image frames are obtained;
for any buffer queue, performing human body key point identification on the image frames in that buffer queue to obtain human body key point images;
rendering the human body key point images and the image frames in the buffer queues in the order of the buffer queues, the order of the buffer queues being the order in which the image frames were stored.
In a second aspect, the present application further provides an image processing method, which is applied to a display device;
the method comprises the following steps:
calculating the ratio of the time consumed to detect one frame of image to the time interval between obtaining two adjacent frames of image, to obtain a number of queues;
creating buffer queues according to the number of queues, the buffer queues being used to store image frames;
storing the image frames into the buffer queues in the order in which the image frames are obtained;
for any buffer queue, performing human body key point identification on the image frames in that buffer queue to obtain human body key point images;
rendering the human body key point images and the image frames in the buffer queues in the order of the buffer queues; the order of the buffer queues is the order in which the image frames were stored.
According to the above technical solution, before rendering the image frames, the number of queues is determined from the ratio of the time consumed to detect one frame of image to the time interval between obtaining two adjacent frames, and buffer queues are created based on that number; if the number of queues is 3, 3 buffer queues are created.
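The queue-count calculation can be sketched as follows. The function name and the ceiling-division rounding are assumptions; the text only says the count is the ratio of the two times:

```python
import math

def queue_count(detect_time_ms: float, frame_interval_ms: float) -> int:
    """Number of buffer queues = ratio of per-frame detection time to the
    interval between two adjacent frames, rounded up so that detection of a
    queue's frame can finish before that queue's turn comes around again."""
    return max(1, math.ceil(detect_time_ms / frame_interval_ms))

# Matching the example in the text: a ratio of 3 yields 3 buffer queues.
print(queue_count(120, 40))  # -> 3
```

Rounding up (rather than truncating) guarantees that the number of in-flight detections always covers the detection latency, even when the ratio is not an integer.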
The obtained image frames are then stored into the buffer queues in sequence. After the image frames are stored, the display device performs human body key point identification on the image frames stored in each buffer queue to obtain human body key point images; that is, the display device identifies the key points of the frames in the buffer queues asynchronously, which improves the efficiency and real-time performance of image frame detection. Because detection runs asynchronously across the buffer queues, the interval at which adjacent buffer queues produce human body key point images is equivalent to the interval between obtaining two adjacent frames.
During rendering, the image frames in the buffer queues and their corresponding human body key point images are rendered sequentially in queue order (for example, buffer queue n1 is rendered first, then buffer queue n2, where n1 and n2 are adjacent buffer queues). Since the interval at which adjacent queues produce key point images matches the interval between obtaining two adjacent frames, the picture stutter during rendering, and the low display fluency it causes, is avoided. And because the image frames in a buffer queue correspond one-to-one with their human body key point images, the matching degree between key point image and image frame increases, the accuracy of the key point image used to render the image frame improves, and deviation between the rendering result and the image frame is avoided.
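A minimal single-process sketch of the scheme described above, under stated assumptions: `detect_keypoints` and `render` are hypothetical stand-ins for the detection algorithm and the renderer, and the round-robin storage policy is inferred from the text. Frames are stored into the queues in acquisition order, each queue is detected by its own worker (asynchronous detection), and rendering walks the queues in the same storage order:

```python
import queue
import threading

NUM_QUEUES = 3  # from the queue-count calculation

def detect_keypoints(frame):
    # placeholder for e.g. an OpenPose-style human key point detector
    return f"keypoints({frame})"

def render(frame, keypoints):
    # placeholder renderer: pairs a frame with its own key point image
    return f"rendered {frame} with {keypoints}"

buffers = [queue.Queue() for _ in range(NUM_QUEUES)]   # image frames
results = [queue.Queue() for _ in range(NUM_QUEUES)]   # (frame, keypoints)

def detector(i):
    # each buffer queue has its own detection worker
    while True:
        frame = buffers[i].get()
        if frame is None:  # sentinel: no more frames
            break
        results[i].put((frame, detect_keypoints(frame)))

workers = [threading.Thread(target=detector, args=(i,)) for i in range(NUM_QUEUES)]
for w in workers:
    w.start()

# Store frames round-robin, in the order they are obtained.
frames = [f"frame{i}" for i in range(6)]
for i, frame in enumerate(frames):
    buffers[i % NUM_QUEUES].put(frame)
for b in buffers:
    b.put(None)

# Render in queue order, i.e. the order the frames were stored in.
rendered = []
for i in range(len(frames)):
    frame, kp = results[i % NUM_QUEUES].get()
    rendered.append(render(frame, kp))
for w in workers:
    w.join()
print(rendered[0])
```

Because rendering reads the queues in the same round-robin order used for storage, each frame is always paired with its own key point image, even though detection for the three queues overlaps in time.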
Drawings
To explain the technical solution of the present application more clearly, the drawings needed in the embodiments are briefly described below; those skilled in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a schematic view of an application scenario of a display device in an embodiment of the present application;
fig. 2 is a block diagram of a hardware configuration of a display device in an embodiment of the present application;
fig. 3 is a block diagram of a software configuration of a display device in an embodiment of the present application;
FIG. 4a is a schematic diagram of an i-th frame image according to an embodiment of the present application;
FIG. 4b is a schematic diagram of an i-th frame human body key point image according to an embodiment of the present application;
FIG. 4c is a schematic diagram of i-th frame image rendering according to an embodiment of the present application;
FIG. 5a is a schematic diagram of an (i+1)-th frame image according to an embodiment of the present application;
FIG. 5b is a schematic diagram of an (i+1)-th frame human body key point image according to an embodiment of the present application;
FIG. 5c is a schematic diagram of (i+1)-th frame image rendering according to an embodiment of the present application;
FIG. 6 is a diagram illustrating a screen rendering according to an embodiment of the present disclosure;
FIG. 7 is a diagram illustrating an image processing method according to an embodiment of the present application;
FIG. 8 is a diagram illustrating an image processing method according to an embodiment of the present application;
fig. 9 is a schematic diagram of a screen rendering according to an embodiment of the present application.
Detailed Description
To make the purpose and embodiments of the present application clearer, the following will clearly and completely describe the exemplary embodiments of the present application with reference to the attached drawings in the exemplary embodiments of the present application, and it is obvious that the described exemplary embodiments are only a part of the embodiments of the present application, and not all of the embodiments.
It should be noted that the brief descriptions of the terms in the present application are only for the convenience of understanding the embodiments described below, and are not intended to limit the embodiments of the present application. These terms should be understood in their ordinary and customary meaning unless otherwise indicated.
The terms "first," "second," "third," and the like in the description and claims of this application and in the above-described drawings are used for distinguishing between similar or analogous objects or entities and not necessarily for describing a particular sequential or chronological order, unless otherwise indicated. It is to be understood that the terms so used are interchangeable under appropriate circumstances.
The terms "comprises" and "comprising," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a product or apparatus that comprises a list of elements is not necessarily limited to all elements expressly listed, but may include other elements not expressly listed or inherent to such product or apparatus.
Fig. 1 is a schematic diagram of an application scenario according to some embodiments of the present application, which is intended to illustrate a scenario in which a plurality of display devices, including but not limited to devices having data transceiving and processing functions and image display functions and/or sound output functions, are present, and a server capable of communicating with the display devices. In the scenario shown in fig. 1, a mobile device 100, a display device 200 and a server 300 are included.
In some embodiments, the display device 200 (e.g., the smart television and the mobile device 100) may display an image captured by its own camera, and may also display an image transmitted by the server 300.
Based on the internet of everything technology, communication connection can be established between multiple display devices in the above scenario, for example, communication is performed between the display device 200 and the mobile device 100, so that an image acquired by the mobile device 100 is projected onto the smart television 200 for display.
The communication protocols for realizing the above interconnection of everything may include local area network protocols, wide area network protocols, and network-independent short-range wireless communication protocols. Local area network protocols include, but are not limited to, the HSP communication protocol; wide area network protocols include, but are not limited to, the Artificial Intelligence Internet of Things (AIoT) protocol; and short-range wireless communication protocols include, but are not limited to, the Bluetooth transmission protocol and the infrared transmission protocol.
Based on the difference of the aforementioned communication protocol types, the communication protocol channels of the display device can be divided into local area network protocol channels based on a local area network, wide area network protocol channels based on a wide area network, and other protocol channels. Other protocol channels include bluetooth protocol channels, infrared protocol channels, etc. The display device in the above scenario may support one or more of the aforementioned protocol channels.
The display apparatus 200 may establish a communication connection with the server 300 for information interaction, for example, to receive various contents and interaction information. The display device may be communicatively connected via a Local Area Network (LAN), a Wireless Local Area Network (WLAN), or other networks. The server 300 may be one cluster or a plurality of clusters, and may include one or more types of servers.
It should be noted that the scenario shown in fig. 1 may also include other display devices, including but not limited to touch-control integrated devices, projection devices, tablet computers, computers, and notebook computers. The number of devices of the same type is not limited herein.
Fig. 2 illustrates a hardware configuration block diagram of a display device according to an exemplary embodiment.
In some embodiments, the display apparatus 200 includes at least one of a tuner demodulator 210, a communicator 220, a detector 230, an external device interface 240, a controller 250, a display 260, an audio output interface 270, a memory, a power supply, a user interface.
In some embodiments the controller comprises a processor, a video processor, an audio processor, a graphics processor, a RAM, a ROM, a first interface to an nth interface for input/output.
In some embodiments, the display 260 includes a display screen component for presenting pictures and a driving component for driving image display; it receives image signals output by the controller and displays video content, image content, menu manipulation interfaces, and user manipulation UI interfaces.
In some embodiments, the display 260 may be a liquid crystal display, an OLED display, or a projection display, and may also be a projection device with a projection screen.
In some embodiments, communicator 220 is a component for communicating with external devices or servers 300 according to various communication protocol types. For example: the communicator may include at least one of a Wifi module, a bluetooth module, a wired ethernet module, and other network communication protocol chips or near field communication protocol chips, and an infrared receiver. The display apparatus may establish transmission and reception of control signals and data signals with an external control apparatus or server 300 through the communicator 220.
In some embodiments, the user interface may be configured to receive control signals for controlling a device (e.g., an infrared remote control, etc.).
In some embodiments, the detector 230 is used to collect signals of the external environment or interaction with the outside. For example, detector 230 includes a light receiver, a sensor for collecting ambient light intensity; alternatively, the detector 230 includes an image collector, such as a camera, which may be used to collect external environment scenes, attributes of the user, or user interaction gestures, or the detector 230 includes a sound collector, such as a microphone, which is used to receive external sounds.
In some embodiments, the external device interface 240 may include, but is not limited to, the following: high Definition Multimedia Interface (HDMI), analog or data high definition component input interface (component), composite video input interface (CVBS), USB input interface (USB), RGB port, and the like. The interface may be a composite input/output interface formed by the plurality of interfaces.
In some embodiments, the tuner demodulator 210 receives broadcast television signals by wired or wireless means, and demodulates audio/video signals and data signals (such as EPG data) from a plurality of wireless or wired broadcast television signals.
In some embodiments, the controller 250 and the tuner demodulator 210 may be located in different separate devices; that is, the tuner demodulator 210 may also be located in a device external to the main device where the controller 250 is located, such as an external set-top box.
In some embodiments, the controller 250 controls the operation of the display device and responds to user operations through various software control programs stored in memory. The controller 250 controls the overall operation of the display device. For example: in response to receiving a user command for selecting a UI object to be displayed on the display 260, the controller 250 may perform an operation related to the object selected by the user command.
In some embodiments, the object may be any one of selectable objects, such as a hyperlink, an icon, or other actionable control. The operations related to the selected object are: displaying an operation connected to a hyperlink page, document, image, or the like, or performing an operation of a program corresponding to the icon.
In some embodiments, the controller comprises at least one of a Central Processing Unit (CPU), a video processor, an audio processor, a Graphics Processing Unit (GPU), Random Access Memory (RAM), Read-Only Memory (ROM), first to n-th interfaces for input/output, a communication bus (Bus), and the like.
The CPU processor executes operating system and application program instructions stored in the memory, and executes various applications, data, and content according to interactive instructions received from external input, so as to finally display and play various audio-visual content. The CPU processor may include a plurality of processors, e.g., a main processor and one or more sub-processors.
In some embodiments, a graphics processor for generating various graphics objects, such as: icons, operation menus, user input instruction display graphics, and the like. The graphic processor comprises an arithmetic unit, which performs operation by receiving various interactive instructions input by a user and displays various objects according to display attributes; the system also comprises a renderer for rendering various objects obtained based on the arithmetic unit, wherein the rendered objects are used for being displayed on a display.
In some embodiments, the video processor is configured to receive an external video signal and, according to the standard codec protocol of the input signal, perform video processing such as decompression, decoding, scaling, noise reduction, frame rate conversion, resolution conversion, and image synthesis, so as to obtain a signal that can be displayed or played directly.
In some embodiments, the video processor includes a demultiplexing module, a video decoding module, an image synthesis module, a frame rate conversion module, a display formatting module, and the like. The demultiplexing module demultiplexes the input audio/video data stream. The video decoding module processes the demultiplexed video signal, including decoding and scaling. The image synthesis module superimposes and mixes the GUI signal input or generated by the user, via the graphics generator, with the scaled video image to generate an image signal for display. The frame rate conversion module converts the frame rate of the input video. The display formatting module converts the frame-rate-converted video output signal into a signal conforming to the display format, such as an output RGB data signal.
In some embodiments, the audio processor is configured to receive an external audio signal, decompress and decode the received audio signal according to a standard codec protocol of the input signal, and perform noise reduction, digital-to-analog conversion, and amplification processing to obtain an audio signal that can be played in the speaker.
In some embodiments, a user may enter user commands on a Graphical User Interface (GUI) displayed on display 260, and the user input interface receives the user input commands through the Graphical User Interface (GUI). Alternatively, the user may input the user command by inputting a specific sound or gesture, and the user input interface receives the user input command by recognizing the sound or gesture through the sensor.
In some embodiments, a "user interface" is a media interface for interaction and information exchange between an application or operating system and a user that enables conversion between an internal form of information and a form that is acceptable to the user. A commonly used presentation form of the User Interface is a Graphical User Interface (GUI), which refers to a User Interface related to computer operations and displayed in a graphical manner. It may be an interface element such as an icon, a window, a control, etc. displayed in the display screen of the electronic device, where the control may include a visual interface element such as an icon, a button, a menu, a tab, a text box, a dialog box, a status bar, a navigation bar, a Widget, etc.
Referring to fig. 3, in some embodiments, the system is divided into four layers, which are an Application (Applications) layer (abbreviated as "Application layer"), an Application Framework (Application Framework) layer (abbreviated as "Framework layer"), an Android runtime (Android runtime) and system library layer (abbreviated as "system runtime library layer"), and a kernel layer, respectively, from top to bottom.
In some embodiments, at least one application program runs in the application program layer, and the application programs may be windows (windows) programs carried by an operating system, system setting programs, clock programs or the like; or an application developed by a third party developer. In particular implementations, the application packages in the application layer are not limited to the above examples.
The framework layer provides an Application Programming Interface (API) and a programming framework for applications. The application framework layer includes a number of predefined functions and acts as a processing center that decides the actions of the applications in the application layer. Through the API interface, an application can access system resources and obtain system services during execution.
As shown in fig. 3, in the embodiment of the present application, the application framework layer includes a manager (Managers), a Content Provider (Content Provider), and the like, where the manager includes at least one of the following modules: an Activity Manager (Activity Manager) is used for interacting with all activities running in the system; the Location Manager (Location Manager) is used for providing the system service or application with the access of the system Location service; a Package Manager (Package Manager) for retrieving various information related to an application Package currently installed on the device; a Notification Manager (Notification Manager) for controlling display and clearing of Notification messages; a Window Manager (Window Manager) is used to manage icons, windows, toolbars, wallpapers, and desktop components on a user interface.
In some embodiments, the activity manager manages the lifecycle of the various applications as well as general navigation fallback functions, such as controlling the exit, opening, and fallback of applications. The window manager manages all window programs, for example obtaining the display screen size, determining whether there is a status bar, locking the screen, capturing the screen, and controlling changes to the display window (for example, shrinking the window, shaking it, or distorting and deforming it).
In some embodiments, the system runtime library layer provides support for the upper framework layer; when the framework layer is in use, the Android operating system runs the C/C++ libraries included in the system runtime library layer to implement the functions required by the framework layer.
In some embodiments, the kernel layer is a layer between hardware and software. As shown in fig. 3, the kernel layer comprises at least one of the following drivers: audio driver, display driver, Bluetooth driver, camera driver, Wi-Fi driver, USB driver, HDMI driver, sensor drivers (such as fingerprint sensor, temperature sensor, and pressure sensor), power driver, and the like.
In some embodiments, the user interface is an interface that can receive control input (such as physical keys on the display device body).
In some embodiments, the display device 200 may acquire the image frames through an image collector; for example, the mobile device 100 collects image frames through a camera thereof; the display device 200 may also receive the image frames transmitted by the server 300 by establishing a communication connection with the server 300.
In some embodiments, the display device 200 displays video pictures through the display. The display principle of video is based on the persistence-of-vision effect of human eyes: video is formed from images played continuously frame by frame, and the rate of this frame-by-frame playback is generally measured by the number of frames transmitted per second (FPS).
FPS is a term from the imaging field that refers to the number of frames transmitted per second; colloquially, the number of pictures per second in an animation or video. FPS measures the amount of information used to store and display motion video. The greater the number of frames per second, the smoother the displayed motion appears. Typically, an FPS value of at least 30 is needed to avoid perceptible stutter.
In some embodiments, a movie is played at a rate of 24 frames per second, i.e., 24 frames are projected onto the screen in one second. In "FPS", "F" stands for Frame, "P" for Per, and "S" for Second, i.e., frames per second; a film, for example, has an FPS value of 24.
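As a quick illustration of the relationship between FPS and the frame interval, here is a minimal Python sketch (the helper name is hypothetical, not part of this disclosure):

```python
def frame_interval_ms(fps: float) -> float:
    """Interval between two adjacent frames: 1 second (1000 ms) divided by FPS."""
    return 1000.0 / fps

# A 100 FPS capture yields a 10 ms interval; a 24 FPS film, about 41.7 ms.
print(frame_interval_ms(100))  # 10.0
print(frame_interval_ms(24))
```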
In some embodiments, for any frame of image, when detecting scenes such as gestures, limbs, faces, and the like, the image frame needs to be detected by a preset detection algorithm to obtain a human body key point image, and then the image frame is rendered based on the human body key point image.
For example, in photographing, video recording and live broadcast scenes, a user adds decorations (such as glasses, hats or hair) to a human face in an image. Because the human body changes in real time while the images are acquired, each frame of image is different; therefore, each frame of image needs to be detected to locate the human body, so that the image rendering result is ensured to correspond to the image frame.
FIG. 4a is a schematic diagram of an ith frame image exemplarily shown in some embodiments of the present application; FIG. 4b is a schematic diagram of an ith frame human body key point image exemplarily shown in some embodiments of the present application. The display device 200 acquires the ith frame image shown in fig. 4a through its camera, then detects the ith frame image with a preset detection algorithm (such as the human posture detection algorithm OpenPose) to obtain the ith frame human body key point image shown in fig. 4b; the display apparatus 200 then renders the ith frame image based on the ith frame human body key point image. Detecting the ith frame image with the preset detection algorithm is equivalent to performing human body key point identification on the image frame.
Fig. 4a shows an image frame captured by the display device 200, such as a human image or a landscape image; fig. 4b is the human body key point image calculated by the display device 200 from the captured image frame according to a preset detection algorithm, and it represents information such as the position and posture of the human body in the image frame. Fig. 4b shows the positions of the head, hands, elbows, knees, feet, waist and the like of the human body; the posture is standing on both legs with the arms spread and pointing downward.
By way of example based on fig. 4a and 4b, in a live broadcast scene, a "magic wand" decoration is added to a hand of the human body in the live broadcast picture. Assuming the ith frame image shown in fig. 4a appears in the live broadcast picture, the ith frame image is detected to obtain the human body key point image shown in fig. 4b; the hand position of the human body is then located from that key point image.
FIG. 4c is a schematic view of an ith frame image frame rendering as exemplary shown in some embodiments of the present application; after locating the human hand position, a "magic wand" decoration (shown as a dashed line in fig. 4 c) is added to the i-th frame image as shown in fig. 4a based on the human hand position.
In some embodiments, the rendering mode generally includes synchronous rendering and asynchronous rendering; the synchronous rendering mode is as follows:
based on fig. 4a, 4b and 4c, fig. 5a is a schematic diagram of an i +1 th frame image exemplarily shown in some embodiments of the present application; FIG. 5b is a schematic diagram of an i +1 th frame human key point image exemplarily shown in some embodiments of the present application; the ith frame image and the (i + 1) th frame image are adjacent frame images, and the ith frame image is earlier than the (i + 1) th frame image. That is, the next frame of the image frame shown in fig. 4a is the image frame shown in fig. 5 a.
After the ith frame image is rendered, the (i+1)th frame image shown in fig. 5a is acquired, human body key point identification is performed on it to obtain the (i+1)th frame human body key point image shown in fig. 5b, and the (i+1)th frame image is then rendered based on that key point image; each frame of image is rendered in the same way.
The asynchronous rendering mode is as follows:
the ith frame image shown in fig. 4a is acquired, and human body key point identification is performed on it with a preset detection algorithm to obtain the ith frame human body key point image shown in fig. 4b; the ith frame image is then rendered based on that key point image. While the ith frame image is being rendered, the (i+1)th frame image shown in fig. 5a is acquired asynchronously at the interval time between acquiring two adjacent frames, and the (i+1)th frame image is rendered, at that same interval, based on the most recently obtained human body key point image; each subsequent frame of image is rendered in the same way.
In the synchronous rendering mode, if the preset detection algorithm takes a long time to execute, the time interval between acquiring the (i+1)th frame image and the ith frame image becomes large; i.e., the number of pictures played per second becomes small, and the interval between displaying two adjacent frames of images grows. That is, the rate at which image frames are continuously played is low, which can make the video picture unsmooth and jerky.
In the asynchronous rendering mode, if the preset detection algorithm takes a long time to execute, the ith frame human body key point image shown in fig. 4b is obtained with a delay; when the (i+1)th frame image is rendered, it is rendered based on the ith frame human body key point image. FIG. 5c is a schematic view of an (i+1)th frame image frame rendering as exemplarily shown in some embodiments of the present application; after the hand position is located from the ith frame human body key point image, a "magic wand" decoration (shown as a dashed line in fig. 5c) is added to the (i+1)th frame image shown in fig. 5a based on that hand position. As can be seen from fig. 5c, since the right-hand portion of the (i+1)th frame image has changed relative to the ith frame image, the position of the right-side "magic wand" decoration in the rendered picture shown in fig. 5c does not match the position of the right hand.
When the picture is rendered in this way, the matching degree between the human body key point image and the image frame is low, that is, the accuracy of the human body key point image used to render the current frame is low, so the image rendering result deviates from the image frame.
Based on the above fig. 4a, 4b, 4c, 5a, 5b and 5c, fig. 6 exemplarily shows a schematic diagram of a screen rendering; as shown in fig. 6, when rendering the (i + 1) th frame image, the right arm direction of the person in the (i + 1) th frame image is upward, and the right arm direction of the i-th frame human body key point image is downward; assuming that the image rendering is performed on the right arm of the person, when the (i + 1) th frame of image is rendered, rendering is performed based on the (i) th frame of human body key point image, so that the rendering result shown in fig. 5c is displayed below the right arm in the (i + 1) th frame of image and is not displayed on the right arm in the (i + 1) th frame of image; therefore, when picture rendering is carried out, the accuracy rate of the human body key point image used for picture frame rendering is low, and the picture rendering result and the picture frame have deviation.
In summary, it is desirable to increase the matching degree between the human body key point image and the image frame, improve the accuracy of the human body key point image used for image frame rendering, and avoid deviation between the image rendering result and the image frame. To this end, some embodiments provide an image processing method, shown schematically in fig. 7, which may be applied to a display apparatus 200. The method comprises the following steps:
the display device 200 calculates the ratio of the time consumed for detecting one frame of image to the time interval for acquiring two adjacent frames of images to obtain the number of queues.
In some embodiments, if it is determined that the time consumed for detecting one frame of image is longer than the time interval between acquiring two adjacent frames of images, the ratio between the per-frame detection time and the acquisition interval is calculated by integer division. The acquisition interval between two adjacent frames of images is the ratio of 1 second to a preset FPS value. For example, if the preset FPS value is 100, the interval time between two adjacent frames is Tc = 1000 ms / 100 = 10 ms.
When performing the integer division operation, the ratio result generally includes at least two components: an integer part and a remainder part. In some embodiments, if the ratio result includes both an integer part and a non-zero remainder part, the ratio result is rounded up to obtain the number of queues; rounding up means adding 1 to the integer part of the ratio result and discarding the remainder part.
For example, if the preset detection algorithm takes Ta = 50 ms to detect one frame of image, and the display device 200 acquires two adjacent frames of images through its own camera at an interval of Tc = 20 ms, then the ratio result is Ta/Tc = 50/20 = 2.5; i.e., the integer part of the ratio result is 2 and the remainder part is 10. Accordingly, the number of queues is n = 2 + 1 = 3.
In some embodiments, if the ratio result includes only an integer part, the ratio result itself is taken as the number of queues. For example, if the preset detection algorithm takes Ta = 40 ms to detect one frame of image and the display device 200 acquires two adjacent frames of images through its own camera at an interval of Tc = 20 ms, the ratio result is Ta/Tc = 40/20 = 2, and the number of queues is n = 2.
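The queue-number rule in these paragraphs amounts to a ceiling division, clamped to 1 when detection is no slower than acquisition. A minimal sketch, assuming Ta and Tc are given in milliseconds (function name is illustrative only):

```python
import math

def queue_count(detect_ms: float, interval_ms: float) -> int:
    """Number of buffer queues: ceil(Ta / Tc); 1 when Ta <= Tc (synchronous case)."""
    if detect_ms <= interval_ms:
        return 1
    return math.ceil(detect_ms / interval_ms)

print(queue_count(50, 20))   # 2.5 rounded up -> 3
print(queue_count(40, 20))   # exact integer ratio -> 2
print(queue_count(130, 40))  # 3.25 rounded up -> 4
```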
In some embodiments, if it is determined that the time consumed for detecting one frame of image is less than or equal to the time interval between acquiring two adjacent frames of images, i.e., the ratio result does not exceed 1 and the number of queues is 1, the image frames are acquired and processed directly; this is equivalent to performing human body key point identification on each frame of image in a synchronous detection mode.
After the display device 200 calculates the number of queues, a buffer queue is created according to the number of queues; taking an example based on the above embodiment, when the number of queues n is 3, 3 buffer queues are created, n1, n2, and n3, respectively.
In some embodiments, the display device 200 acquires the image frames frame by frame, i.e., there is an acquisition order in which the image frames are acquired; if the ith image frame is acquired at the ith time, the ith +1 image frame is acquired at the (i + 1) th time, wherein the ith time is earlier than the (i + 1) th time.
Therefore, when the image frames are stored in the buffer queues, they can be stored sequentially based on their acquisition order. Taking the above embodiment as an example, there are 3 buffer queues: n1, n2 and n3. Assuming the 1st image frame, the 2nd image frame, ..., and the ith image frame are acquired in order: after the 1st image frame is acquired, it is stored in buffer queue n1; after the 2nd image frame is acquired, it is stored in buffer queue n2; after the 3rd image frame is acquired, it is stored in buffer queue n3; after the 4th image frame is acquired, it is stored in buffer queue n1; and so on. After the ith image frame is acquired, it is stored in buffer queue nx, where x is determined by the remainder of i divided by the number of queues.
For example, if i = 16 and n = 3, then x = 16 mod 3 = 1; that is, after the 16th image frame is acquired, it is stored in buffer queue n1.
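The remainder rule above can be written as a one-line mapping. A sketch with 1-based frame and queue indices, so that a frame index that is an exact multiple of n lands in the last queue, matching the storage order described above:

```python
def target_queue(i: int, n: int) -> int:
    """Buffer queue index (1..n) for the i-th acquired frame (1-based)."""
    return ((i - 1) % n) + 1

print(target_queue(16, 3))  # 1 -> the 16th frame goes to buffer queue n1
print(target_queue(3, 3))   # 3 -> the 3rd frame goes to buffer queue n3
print(target_queue(4, 3))   # 1 -> the 4th frame wraps back to buffer queue n1
```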
After the image frames are stored sequentially in the buffer queues, the display device 200 detects the human body in the image frame of any buffer queue and generates a human body key point image based on the detected human body; the human body key point image represents the human body posture in the image. As shown in fig. 4b and fig. 5b, different human body key point images correspond to different human body postures. The display apparatus 200 performs human body key point identification on the image frames of the buffer queues asynchronously; for example, while the display device 200 performs human body key point identification on the 1st image frame in buffer queue n1, it asynchronously performs human body key point identification on the 2nd image frame in buffer queue n2. This improves the detection efficiency of the image frames.
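A minimal sketch of this asynchronous identification using a thread pool; `detect_keypoints` is a placeholder standing in for the preset detection algorithm (e.g. OpenPose), not an actual API:

```python
from concurrent.futures import ThreadPoolExecutor

def detect_keypoints(frame):
    # Placeholder for the preset detection algorithm (e.g. OpenPose);
    # here it just tags the frame so the data flow is observable.
    return f"keypoints({frame})"

def detect_async(frames, n_queues):
    """Submit keypoint detection for all frames concurrently, then store each
    (frame, keypoints) pair in its round-robin buffer queue."""
    buffers = [[] for _ in range(n_queues)]
    with ThreadPoolExecutor(max_workers=n_queues) as pool:
        futures = [(i, frame, pool.submit(detect_keypoints, frame))
                   for i, frame in enumerate(frames, start=1)]
        for i, frame, fut in futures:
            buffers[(i - 1) % n_queues].append((frame, fut.result()))
    return buffers

buffers = detect_async(["f1", "f2", "f3", "f4"], 3)
print(buffers[0])  # [('f1', 'keypoints(f1)'), ('f4', 'keypoints(f4)')]
```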
After the human body key point image is generated, the human body key point image is stored in a buffer queue. Then, the display device 200 sequentially renders the human body key point images and the image frames in the buffer queue according to the sequence of the buffer queue.
The sequence of the buffer queue is the storage sequence of the image frames; for example, the buffer queue includes n1, n2 and n3, and when the display device 200 acquires the 1 st image frame, the 1 st image frame is stored in the buffer queue n 1; when the display device 200 acquires the 2 nd image frame, storing the 2 nd image frame to a buffer queue n 2; when the display device 200 acquires the 3 rd image frame, storing the 3 rd image frame to the buffer queue n 3; when the display device 200 acquires the 4 th image frame, storing the 4 th image frame to a buffer queue n 1; and so on. Therefore, the sequence of the buffer queues is buffer queue n1, buffer queue n2 and buffer queue n 3.
Before picture rendering, the storage state of each buffer queue is traversed. In some embodiments, if all buffer queues are in the stored state, the human body key point images and image frames in the buffer queues are rendered sequentially, starting from the first buffer queue, according to the storage order of the buffer queues.
Based on the above embodiment, for example, the sequence of the buffer queues is n1, n2, and n3, and the first buffer queue is n 1; after determining that the image frames and the human body key point images corresponding to the image frames are stored in n1, n2 and n3, circularly rendering the image frames and the human body key point images of the image frames in the buffer queue from the buffer queue n1 in the order of n1, n2 and n 3.
For example, the 1st image frame and 1st human body key point image in buffer queue n1 are rendered first; then the 2nd image frame and 2nd human body key point image in buffer queue n2; then the 3rd image frame and 3rd human body key point image in buffer queue n3. After the 3rd image frame and 3rd human body key point image are rendered, the 4th image frame and 4th human body key point image are pulled from buffer queue n1 and rendered; each subsequent frame of image is rendered in the same way.
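The cyclic pull-and-render order in this example can be sketched as follows (hypothetical names; each buffer queue holds (frame, keypoints) pairs in storage order):

```python
from collections import deque

def render_round_robin(buffers):
    """Pull (frame, keypoints) pairs from the buffer queues in the fixed
    order n1, n2, ..., looping until every queue is empty; pairs come out
    in the original acquisition order."""
    queues = [deque(b) for b in buffers]
    rendered = []
    while any(queues):
        for q in queues:
            if q:
                rendered.append(q.popleft())  # "render", then drop from the queue
    return rendered

order = render_round_robin([[("f1", "a1"), ("f4", "a4")],
                            [("f2", "a2")],
                            [("f3", "a3")]])
print([frame for frame, _ in order])  # ['f1', 'f2', 'f3', 'f4']
```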
In some embodiments, if the first buffer queue is in the stored state, the human body key point images and image frames in the buffer queues are rendered sequentially, starting from the first buffer queue, according to the order of the buffer queues.
Since the time interval at which the buffer queue stores the image frames is the time interval at which the display device 200 acquires the two adjacent frames of images, rendering is performed based on the time interval at which the display device 200 acquires the two adjacent frames of images when performing screen rendering. Therefore, when the first cache queue is determined to store the human body key point images and the image frames, the human body key point images and the image frames can be sequentially pulled from each cache queue according to the sequence of the cache queues and based on the interval time of obtaining two adjacent frames of images; and rendering the human body key point image and the image frame after pulling the human body key point image and the image frame.
Taking the above embodiment as an example, the 1st image frame is acquired at time Tc1, and the time when buffer queue n1 stores the 1st image frame and 1st human body key point image is Ta1; the 2nd image frame is acquired at time Tc2, and the time when buffer queue n2 stores the 2nd image frame and 2nd human body key point image is Ta2. Here Ta1 = Tc1 + Ta (where Ta is the time consumed to detect one frame of image), Ta2 = Tc2 + Ta, and Tc2 - Tc1 = Tc (the interval time between acquiring two adjacent frames of images); from this, Ta2 - Ta1 = Tc2 - Tc1 = Tc.
That is, after the 1 st image frame is rendered, the 2 nd image frame and the 2 nd human key point image are already stored in the buffer queue n2, and the 2 nd image frame can be rendered; by analogy, each image frame is rendered, and the fluency of rendering frame by frame is ensured.
In some embodiments, after any image frame and human body key point image in the buffer queue are rendered, setting labels for the rendered image frame and human body key point image; the label is used for representing that the image frame and the human body key point image are rendered and are not repeatedly rendered.
By way of example based on the above embodiment, after rendering of the 1 st image frame and the 1 st human body keypoint image in the buffer queue n1 is completed, setting tags for the 1 st image frame and the 1 st human body keypoint image; when the image frames and the human body key point images in the buffer queue n1 are rendered again, rendering is performed according to the sequence of the buffer image frames in the buffer queue n 1; for example, after the 3 rd image frame and the 3 rd human body key point image are rendered, the 4 th image frame and the 4 th human body key point image in the buffer queue n1 are rendered.
That is, a plurality of image frames and their human body key point images may be stored in any one buffer queue. Taking the above embodiment as an example, buffer queue n1 stores the 1st image frame, the 1st human body key point image, the 4th image frame, the 4th human body key point image, ..., the (3m+1)th image frame and the (3m+1)th human body key point image; buffer queue n2 stores the 2nd image frame, the 2nd human body key point image, the 5th image frame, the 5th human body key point image, ..., the (3m+2)th image frame and the (3m+2)th human body key point image; buffer queue n3 stores the 3rd image frame, the 3rd human body key point image, the 6th image frame, the 6th human body key point image, ..., the (3m+3)th image frame and the (3m+3)th human body key point image; where m is a natural number.
In some embodiments, after the image frame and human body key point image in any cache queue are rendered, the rendered image frame and human body key point image are released from the cache; that is, any buffer queue stores at most one image frame and its corresponding human body key point image at a time.
Taking the above embodiment as an example, after the 1st image frame and the 1st human body key point image in buffer queue n1 are rendered, they are released from buffer queue n1. After the display device 200 acquires the 4th image frame, the 4th image frame is stored in buffer queue n1. This ensures that only one image frame is stored in each buffer queue at a time, avoids out-of-order rendering of image frames, increases the matching degree between the human body key point image and the image frame, and improves the accuracy of the human body key point image used for picture rendering.
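The release-after-render behaviour amounts to each buffer queue acting as a single-slot cell; a minimal sketch (class name is illustrative only):

```python
class SingleSlotBuffer:
    """Buffer queue holding at most one (frame, keypoints) pair at a time;
    the slot is freed as soon as the pair is handed over for rendering."""
    def __init__(self):
        self.slot = None

    def store(self, frame, keypoints):
        if self.slot is not None:
            raise RuntimeError("previous frame not yet rendered")
        self.slot = (frame, keypoints)

    def pull_for_render(self):
        pair, self.slot = self.slot, None  # release the buffer on pull
        return pair

q1 = SingleSlotBuffer()
q1.store("f1", "a1")
print(q1.pull_for_render())  # ('f1', 'a1')
print(q1.slot)               # None -> the slot is free for frame f4
```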
In some embodiments, the rendering time is less than or equal to the time interval between the acquisition of two adjacent frames of images; the rendering time duration represents the time duration for rendering any one frame of image frame and the human body key point image of the image frame.
To better illustrate the technical solution of the present application, fig. 8 schematically shows an image processing method; as shown in fig. 8, the details are as follows:
It takes Ta = 130 ms to detect one frame of image, and the interval time between acquiring two adjacent frames of images is Tc = 40 ms; for example, as shown in fig. 8, image frame f1 and image frame f2 are acquired 40 ms apart.
The number of queues n is calculated from the time Ta consumed to detect one frame of image and the interval time Tc between acquiring two adjacent frames of images. Since the ratio of Ta to Tc is 130/40 = 3.25, i.e., the integer part of the ratio result is 3, the number of queues is n = 3 + 1 = 4. That is, as shown in fig. 8, the buffer queues are, in order, buffer queue n1, buffer queue n2, buffer queue n3 and buffer queue n4. Buffer queue n1 is the first buffer queue, and image frame f1 is the first frame image.
As shown in fig. 8, the display apparatus 200 acquires image frames frame by frame, that is, image frame f1, image frame f2, image frame f3, ... are acquired in order, with an acquisition interval Tc between any two adjacent frames. If image frame f3 is acquired at time t3 and image frame f4 at time t4, then t4 - t3 = Tc.
After the display device 200 acquires the image frame f1, storing the image frame f1 into a buffer queue n1, executing a preset detection algorithm, and performing human key point identification on the image frame f1 to obtain a human key point image a 1; the human keypoint image a1 is then stored into the buffer queue n 1. That is, the buffer queue n1 at this time stores the human body key point image a1 corresponding to the image frame f1 and the image frame f 1.
Because the time Ta consumed to detect one frame of image is greater than the acquisition interval Tc between two adjacent frames (Ta is 3.25 times Tc), the display device 200 will, before human body key point image a1 is obtained, have sequentially acquired image frame f2, image frame f3 and image frame f4 at the acquisition interval Tc.
After the display device 200 acquires the image frame f2, storing the image frame f2 into a buffer queue n2, executing a preset detection algorithm, and performing human key point identification on the image frame f2 to obtain a human key point image a 2; the human keypoint image a2 is then stored into the buffer queue n 2.
By analogy, after the display device 200 acquires the image frame f3, storing the image frame f3 into the buffer queue n3, executing a preset detection algorithm, and performing human key point identification on the image frame f3 to obtain a human key point image a 3; the human keypoint image a3 is then stored into the buffer queue n 3.
After the display device 200 acquires the image frame f4, storing the image frame f4 into a buffer queue n4, executing a preset detection algorithm, and performing human body key point identification on the image frame f4 to obtain a human body key point image a 4; the human keypoint image a4 is then stored into the buffer queue n 4.
That is, the display device 200 executes the preset detection algorithm asynchronously, thereby improving the efficiency of human body key point identification on the image frames.
As shown in fig. 8, the human body key point images of any two adjacent frames are obtained at a time interval Tc; if human body key point image a3 is recognized at time r3 and human body key point image a4 at time r4, then r4 - r3 = Tc.
After the image frame f1 is acquired, monitoring the storage state of the buffer queue n1 in real time, if the monitoring buffer queue n1 stores the human body key point image, circularly pulling the image frame and the human body key point image to a rendering queue from the buffer queue n1 according to the sequence of n1, n2, n3 and n4, and rendering the image frame and the human body key point image in the rendering queue in sequence.
For example, when it is determined that human body key point image a1 is stored in buffer queue n1, human body key point image a1 and image frame f1 are pulled from buffer queue n1 and sent to the rendering queue, where they are recorded as an image h1 to be rendered; the image h1 to be rendered includes human body key point image a1 and image frame f1. Rendering of the image h1 to be rendered (i.e., human body key point image a1 and image frame f1) then begins.
Because the human body key point images of any two adjacent frames are obtained at an interval of Tc, the images to be rendered are also stored in the rendering queue at an interval of Tc. Rendering based on this interval Tc therefore avoids the stutter and low fluency that the excessive per-frame detection time Ta would otherwise cause, and improves the fluency of picture rendering.
Because any image to be rendered comprises an image frame and the human body key point image of that image frame, the image frame and human body key point image of each image to be rendered correspond to each other. Fig. 9 exemplarily shows a schematic diagram of image rendering: as shown in fig. 9, the human body key point image is identified from the image frame itself, and picture rendering is performed using the image frame together with its own human body key point image. This avoids the mismatch between the rendering result and the image frame that excessive detection time Ta would otherwise cause, and improves the accuracy of picture rendering.
In some embodiments, after the image frame and human body key point image in any buffer queue have been rendered, they are released from the buffer. For example, image frame f1 and human body key point image a1 in buffer queue n1 are sent to the rendering queue, and after rendering is completed, image frame f1 and human body key point image a1 are released from buffer queue n1; after this release, buffer queue n1 no longer stores any image frame information.
Since image frame f5 has not yet been acquired when rendering of image frame f1 and human body key point image a1 completes, buffer queue n1 is at that moment an empty queue. Therefore, any cache queue stores the information of at most one image frame, which ensures that the image frame corresponds to its human body key point image (i.e., the key point image was obtained from that very frame), improves the accuracy of the human body key point image used for image frame rendering, and avoids deviation between the image rendering result and the image frame.
By analogy, after image frame f4 is stored in buffer queue n4, one round of storage is complete, and a new round begins with image frame f5. That is, a new round of storage starts at the (m*n+1)th image frame, where n is the number of buffer queues and m is a natural number.
In some embodiments, the time taken by the preset detection algorithm to perform human body key point identification differs between image frames; that is, the time consumed for detecting one frame of image varies.
If the time consumed for detecting one frame of image varies, the historical maximum of that detection time can be computed periodically, and the number of queues calculated from that maximum; this preserves the fluency and accuracy of picture rendering while improving the flexibility and real-time performance of image processing.
For example, the cycle time is set to 1 minute, and the current time is assumed to be 3:00. The maximum time taken to detect one frame of image between 2:59 and 3:00 is calculated; if it is 50 ms, then 50 ms is taken as the per-frame detection time for 3:00 to 3:01, and the number of queues for 3:00 to 3:01 is calculated from it.
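A sketch of this periodic recomputation (the function name and sample format are assumptions): detection-time samples are kept as (timestamp, milliseconds) pairs, and the maximum within the last cycle becomes the Ta used for the next cycle's queue count.

```python
def periodic_max_detect_ms(samples, now, period_s=60.0):
    """Maximum detection time (ms) among samples taken within the last
    period; samples are (timestamp_seconds, detect_ms) pairs."""
    recent = [ms for t, ms in samples if now - t <= period_s]
    return max(recent) if recent else None

samples = [(50, 30), (70, 50), (100, 40)]
print(periodic_max_detect_ms(samples, now=100))  # 50 -> Ta for the next cycle
```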
According to the technical scheme, before the image frame is rendered, the number of queues is determined according to the ratio of the time consumed for detecting one frame of image to the time interval for obtaining two adjacent frames of images, and a buffer queue is established based on the number of the queues; then, sequentially storing the obtained image frames to a buffer queue; the image frames stored in the cache queue are asynchronously identified by human key points; the efficiency and the real-time performance of image frame detection are improved.
Because the image frames stored in the buffer queues are asynchronously identified by the human key points, the time interval of obtaining the human key point images by the adjacent buffer queues is equivalent to the time interval of obtaining the two adjacent frames of images. And when rendering is carried out, rendering is carried out on the image frames in the buffer queue and the human body key point images corresponding to the image frames in sequence based on the sequence of the buffer queue.
The time interval of the human key point images obtained by the adjacent cache queues is equivalent to the time interval of the two adjacent frames of images, so that the problem of low image fluency caused by image blockage is avoided during image rendering, the matching degree of the human key point images and the image frames is increased, the accuracy of the human key point images used for image frame rendering is improved, and the deviation of image rendering results and the image frames is avoided.
The embodiments provided in the present application are only a few examples of the general concept of the present application, and do not limit the scope of the present application. Any other embodiments extended according to the scheme of the present application without inventive efforts will be within the scope of protection of the present application for a person skilled in the art.

Claims (10)

1. A display device, comprising:
a display configured to display an image frame picture;
a camera configured to acquire image frames at a preset interval, the preset interval representing the interval at which the camera acquires two adjacent frames of images;
a controller configured to:
calculate the ratio of the time taken to detect one frame of image to the interval between acquiring two adjacent frames of images, to obtain a number of queues;
create buffer queues according to the number of queues, the buffer queues being used for storing image frames;
store the image frames into the buffer queues in sequence, in the order in which the image frames are acquired;
for any buffer queue, perform human body key point recognition on the image frames in that buffer queue to obtain human body key point images;
render the human body key point images and the image frames in the buffer queues in turn according to the order of the buffer queues, the order of the buffer queues being the storage order of the image frames.
2. The display device of claim 1, wherein, before creating the buffer queues according to the number of queues, the controller is further configured to:
if the time taken to detect one frame of image is less than or equal to the interval between acquiring two adjacent frames of images, acquire the image frame directly;
if the time taken to detect one frame of image is greater than the interval between acquiring two adjacent frames of images, calculate the number of queues.
3. The display device of claim 2, wherein, in calculating the ratio of the time taken to detect one frame of image to the interval between acquiring two adjacent frames of images to obtain the number of queues, the controller is specifically configured to:
perform integer division of the time taken to detect one frame of image by the interval between acquiring two adjacent frames of images to obtain a ratio result, the interval between acquiring two adjacent frames of images being obtained from a preset FPS value;
if the division leaves a remainder, round the ratio result up to obtain the number of queues.
4. The display device according to claim 1, wherein, in storing the image frames into the buffer queues in sequence in the order in which the image frames are acquired, the controller is specifically configured to:
store the i-th image frame into the x-th buffer queue, where x is the remainder of i divided by the number of queues.
5. The display device according to claim 1, wherein, in performing human body key point recognition on the image frames in the buffer queue to obtain human body key point images, the controller is specifically configured to:
detect a human body image in the image frame;
generate a human body key point image based on the human body image, the human body key point image representing the human body posture in the human body image;
after the human body key point image is generated, store the human body key point image into the buffer queue.
6. The display device according to claim 1, wherein, in rendering the human body key point images and the image frames in the buffer queues in turn according to the order of the buffer queues, the controller is specifically configured to:
traverse the storage state of each buffer queue;
when it is determined that the first buffer queue stores both a human body key point image and an image frame, render the human body key point images and the image frames in the buffer queues in turn, starting from the first buffer queue, according to the order of the buffer queues.
7. The display device according to claim 6, wherein, in rendering the human body key point images and the image frames in the buffer queues in turn according to the order of the buffer queues, the controller is specifically configured to:
pull the human body key point images and the image frames from the buffer queues in turn, according to the order of the buffer queues, at the interval between acquiring two adjacent frames of images;
for any buffer queue, after the human body key point image and the image frame in that buffer queue have been pulled, render that human body key point image and that image frame.
8. The display device of claim 7, wherein, after rendering the human body key point images and the image frames in the buffer queue, the controller is further configured to:
release the buffer of that buffer queue.
9. The display device according to claim 1, wherein a rendering duration is less than or equal to the interval between acquiring two adjacent frames of images, the rendering duration representing the time taken to render any one image frame together with the human body key point image of that image frame.
10. An image processing method, characterized in that the method is applied to a display device, the method comprising:
calculating the ratio of the time taken to detect one frame of image to the interval between acquiring two adjacent frames of images, to obtain a number of queues;
creating buffer queues according to the number of queues, the buffer queues being used for storing image frames;
storing the image frames into the buffer queues in sequence, in the order in which the image frames are acquired;
for any buffer queue, performing human body key point recognition on the image frames in that buffer queue to obtain human body key point images;
rendering the human body key point images and the image frames in the buffer queues in turn according to the order of the buffer queues, the order of the buffer queues being the storage order of the image frames.
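The render-start condition of claims 6 and 7 can be illustrated with a small sketch: the storage state of each buffer queue is traversed, and rendering begins only once the first buffer queue holds both an image frame and its human body key point image. All names below are hypothetical and the state is mocked for illustration.

```python
def ready(queue_state: dict) -> bool:
    """A buffer queue is renderable once it stores both the image
    frame and the human body key point image derived from it."""
    return (queue_state.get("frame") is not None
            and queue_state.get("keypoints") is not None)

# Mocked storage state: detection on the second queue is still running.
queues = [
    {"frame": "frame-0", "keypoints": "kp-0"},
    {"frame": "frame-1", "keypoints": None},
]

# Start rendering only when the first queue is complete, then render
# queues in order, stopping at the first one that is not yet ready.
rendered = []
if ready(queues[0]):
    for q in queues:
        if not ready(q):
            break
        rendered.append(q["frame"])
print(rendered)  # -> ['frame-0']
```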
CN202210689537.8A 2022-06-16 2022-06-16 Display device and image processing method Pending CN115082959A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210689537.8A CN115082959A (en) 2022-06-16 2022-06-16 Display device and image processing method


Publications (1)

Publication Number Publication Date
CN115082959A true CN115082959A (en) 2022-09-20

Family

ID=83253019

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210689537.8A Pending CN115082959A (en) 2022-06-16 2022-06-16 Display device and image processing method

Country Status (1)

Country Link
CN (1) CN115082959A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116389898A (en) * 2023-02-27 2023-07-04 荣耀终端有限公司 Image processing method, device and storage medium
CN116389898B (en) * 2023-02-27 2024-03-19 荣耀终端有限公司 Image processing method, device and storage medium

Similar Documents

Publication Publication Date Title
CN114302190A (en) Display device and image quality adjusting method
CN111970549B (en) Menu display method and display device
CN112866773B (en) Display equipment and camera tracking method in multi-person scene
CN111556350B (en) Intelligent terminal and man-machine interaction method
CN113747078B (en) Display device and focal length control method
CN113630569B (en) Display apparatus and control method of display apparatus
CN115082959A (en) Display device and image processing method
CN111984167B (en) Quick naming method and display device
CN111464869B (en) Motion position detection method, screen brightness adjustment method and intelligent device
CN113051435A (en) Server and media asset dotting method
CN112584213A (en) Display device and display method of image recognition result
CN112601117A (en) Display device and content presentation method
CN116017006A (en) Display device and method for establishing communication connection with power amplifier device
CN112328553A (en) Thumbnail capturing method and display device
CN112235621B (en) Display method and display equipment for visual area
CN111939561B (en) Display device and interaction method
WO2021218473A1 (en) Display method and display device
CN111931692A (en) Display device and image recognition method
CN112199560A (en) Setting item searching method and display device
CN111988649A (en) Control separation amplification method and display device
CN113587812B (en) Display equipment, measuring method and device
CN113825001B (en) Panoramic picture browsing method and display device
CN114095769B (en) Live broadcast low-delay processing method of application-level player and display device
CN113438553B (en) Display device awakening method and display device
CN113436564B (en) EPOS display method and display equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination