CN109685802B - Low-delay video segmentation real-time preview method - Google Patents


Info

Publication number
CN109685802B
Authority
CN
China
Prior art keywords
frames
image segmentation
video stream
values
segmentation result
Prior art date
Legal status
Active
Application number
CN201811527499.6A
Other languages
Chinese (zh)
Other versions
CN109685802A (en)
Inventor
巩晓雅
邬静云
刘国良
Current Assignee
Luzhou Hemiao Communication Technology Co ltd
Original Assignee
Luzhou Hemiao Communication Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Luzhou Hemiao Communication Technology Co ltd filed Critical Luzhou Hemiao Communication Technology Co ltd
Priority to CN201811527499.6A
Publication of CN109685802A
Application granted
Publication of CN109685802B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a low-delay video segmentation real-time preview method comprising the following steps: processing key frames of the video stream with a deep-learning-based network structure to obtain a second image segmentation result; processing the transition frames between key frames of the video stream with a gray projection algorithm to obtain the second image segmentation result of the transition frames; and displaying the first image segmentation result in real time using a low-delay display strategy. Processing the key frames with the deep learning network structure segments the images accurately; processing the transition frames with the gray projection algorithm exploits the similarity between video frames to rapidly propagate the previous frame's segmentation result, keeping each frame's segmentation time short and the video fluent; and combining the accurate key-frame results of the deep learning network with a low-delay strategy over the video sequence leaves the video free of stutter and lag.

Description

Low-delay video segmentation real-time preview method
Technical Field
The application relates to the technical field of video segmentation, in particular to a low-delay video segmentation real-time preview method.
Background
Image segmentation is an important component of computer vision with wide application in real life, such as tissue detection in medical images, disaster assessment, face beautification, and intelligent mapping. Video image segmentation refers to separating the foreground and background of an object in each frame of a video to obtain a binary image; because the smoothness of the video must be preserved, it places high demands on real-time performance. In recent years deep learning has developed rapidly and, in terms of precision, greatly surpasses traditional methods, so deep-learning-based image segmentation has gradually become a research hotspot. With the development of technology and the improved computing power of devices, many video segmentation applications are being deployed to mobile devices, especially smart phones. However, deep-learning-based methods are time-consuming and the computing power of mobile devices is low; how to apply deep-learning-based image segmentation to video on a mobile device while displaying each frame's segmentation result in real time is therefore a very challenging research topic.
Disclosure of Invention
Aiming at the problems that existing mobile devices have low computing power and that deep-learning-based image segmentation applied to video on such devices must still display each frame's segmentation result in real time, the application provides a low-delay video segmentation real-time preview method to solve the problems in this research topic.
According to a first aspect of the present application, there is provided a low-delay video segmentation real-time preview method, comprising the steps of:
processing key frames of the video stream on the device with a deep-learning-based network structure to obtain a second image segmentation result;
calculating translation vectors with a gray projection algorithm for the transition frames between key frames of the video stream on the device, to obtain the second image segmentation result of the transition frames;
displaying a first image segmentation result in real time using a low-delay display strategy, where the first image segmentation result refers to the content obtained by processing the second image segmentation result for real-time display on the screen;
the key frames and transition frames are determined according to the computing capability of the device itself.
Further, the steps of processing the key frames of the video stream on the device with the deep-learning-based network structure to obtain the second image segmentation result are as follows:
extracting an original image from a video stream;
performing convolution operation on the original image to obtain low-level features of the original image;
performing dense atrous convolution on the low-level features to obtain high-level features;
and decoding the low-level features and the high-level features to obtain corresponding second image segmentation results.
Further, the step of performing translation vector calculation on transition frames between key frames of the video stream on the device by using a gray projection algorithm to obtain a second image segmentation result of the transition frames includes:
performing gray mapping on the color images in the video stream using the G channel.
Further, the step of performing translation vector calculation on transition frames between key frames of the video stream on the device by using a gray projection algorithm to obtain a second image segmentation result of the transition frames includes:
searching all positions for the minimum Euclidean distance between the row/column gray projection curves of two frames using a binary-search-like method.
Further, searching all positions for the minimum Euclidean distance between the row/column gray projection curves of two frames using the binary-search-like method means:
step one, selecting three values among the N valid values and taking the smallest of the three as the first center point;
step two, centering on the first center point from step one, halving the search radius, selecting two values from the remaining values, comparing them with the first center point, and taking the smallest of the three as the second center point;
step three, centering on the second center point from step two, halving the search radius again, selecting two values from the remaining values, comparing them with the second center point, and taking the smallest of the three as the third center point;
and so on, until no more than three values remain; these are compared with the center point from the previous step, and the smallest is selected as the center point, which is the minimum sought.
Further, the use of the low-delay display strategy to display the first image segmentation result in real time means that:
the method comprises the steps that a strategy of displaying a look-ahead is used for a key frame, a process of waiting for the forward propagation of a key frame neural network is not suspended by a preview process, a rough result obtained through the propagation of a previous frame is used firstly, a complex operation process is transferred to a background to operate, an accurate result is obtained, and then the rough result is replaced with the accurate result in a time sequence; the preview process refers to real-time display content of a screen in the video stream transmission process; the process of the key frame neural network forward propagation refers to the process of processing key frames by using the network structure based on deep learning; the rough result is a second image segmentation result of the key frame, which is obtained by carrying out translation vector calculation on the previous frame of the key frame and the key frame by adopting a gray projection algorithm; the complex operation process refers to a process of forward propagation of a key frame neural network; the time sequence is a frame sequence formed by arranging each key frame and each transition frame according to the time sequence of the video stream when the video stream is transmitted; the accurate result refers to a second image segmentation result obtained after the key frame is processed by using the network structure based on the deep learning.
Compared with the prior art, the application has the following beneficial effects:
1. processing the key frames of the video stream on the device with the deep learning network structure segments the images accurately;
2. processing the transition frames between key frames of the video stream on the device with the gray projection algorithm exploits the similarity between video frames to rapidly propagate the previous frame's segmentation result, keeping each frame's segmentation time short and the video fluent;
3. combining the accurate results obtained by processing the key frames with the deep learning network structure with a low-delay strategy over the time sequence leaves the video free of stutter and lag.
Additional aspects and advantages of the application will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the application.
Drawings
The foregoing and/or additional aspects and advantages of the application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a flow chart of a low-latency video segmentation live preview method in an embodiment of the present application;
FIG. 2 is a flow chart of processing key frames of a video stream on a device to obtain a second image segmentation result using a deep learning based network architecture in an embodiment of the present application;
FIG. 3 is a schematic diagram of a low-latency display strategy for displaying a first image segmentation result in real time according to an embodiment of the present application;
FIG. 4 is a partial block diagram of a smart phone, for the case where the device is a smart phone, according to an embodiment of the present application.
Detailed Description
In order to enable those skilled in the art to better understand the present application, the following description will further explain the technical solutions in the embodiments of the present application by referring to the figures in the embodiments of the present application.
In some of the flows described in the specification, claims, and figures above, a plurality of operations appear in a particular order, but it should be understood that these operations may be executed out of that order or in parallel. Ordinal labels such as 11 and 12 merely distinguish operations and do not by themselves represent any execution order. In addition, the flows may include more or fewer operations, which may be executed sequentially or in parallel. Note that the terms "first" and "second" herein distinguish different messages, devices, modules, etc.; they do not represent a sequence, nor do they require that the "first" and "second" be of different types.
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to fall within the scope of the application.
It will be understood by those skilled in the art that all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs unless defined otherwise. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
Example 1
As shown in fig. 1, a low-delay video segmentation real-time preview method according to an embodiment of the present application is provided, including the steps of:
the key frames and the transition frames in this embodiment are determined according to the computing capability of the device itself.
S1, processing key frames of the video stream on the device with a deep-learning-based network structure to obtain a second image segmentation result;
Since deep learning was proposed, deep networks have continuously developed in wider and deeper directions and, paired with massive data, their precision keeps improving, with remarkable results. The fully convolutional network (Fully Convolutional Networks, FCN) removed the fully connected layers of traditional neural networks and achieved pixel-level classification, a major breakthrough of deep learning in the field of image segmentation. On the basis of FCN, many researchers have proposed larger deep networks, such as UNet, SegNet, and the DeepLab series, with significant success. However, large deep networks rest on massive data and huge computational overhead; running them on a mobile device consumes a great deal of time and cannot meet the "faster and better" requirement of existing mobile applications. The processing of the video stream by the deep-learning-based network structure in this embodiment is therefore as follows:
As shown in FIG. 2, the steps for processing the key frames of the video stream on the device with the deep-learning-based network structure to obtain the second image segmentation result are:
s11, extracting an original image from a video stream;
s12, carrying out convolution operation on the original image to obtain low-level features of the original image;
s13, performing dense hole convolution operation on the low-level features to obtain high-level features;
the advantage of dense hole convolution is that when the image needs global information, the computation amount is not increased, and the receptive field is increased, so that each convolution output contains a larger range of information.
S14 decodes the low-level features and the high-level features to obtain corresponding second image segmentation results, which refer to the class of each pixel of the image, in particular, in the front-background segmentation, a binary image.
The decoding process is a process of performing deconvolution operation on the low-level features and the high-level features based on the deep-learning network structure to obtain corresponding second image segmentation results.
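For illustration only, a minimal PyTorch-style sketch of a network with this shape is given below: plain convolutions extract the low-level features (S12), a dense atrous block produces the high-level features (S13), and a decoder fuses both into per-pixel classes (S14). The class name FastSegNet, the channel widths, and the dilation rates are assumptions made for the sketch, not the patented network.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FastSegNet(nn.Module):
    # Hypothetical encoder/atrous/decoder sketch of steps S11-S14;
    # layer counts and widths are illustrative assumptions.
    def __init__(self, num_classes=2):
        super().__init__()
        # S12: plain convolutions extract low-level features
        self.low = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # S13: dense atrous (dilated) convolutions enlarge the receptive
        # field without increasing the per-output computation
        self.atrous = nn.ModuleList([
            nn.Conv2d(64, 64, 3, padding=d, dilation=d) for d in (1, 2, 4, 8)
        ])
        # S14: the decoder fuses low- and high-level features into classes
        self.decode = nn.Sequential(
            nn.Conv2d(64 + 64 * 4, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, num_classes, 1),
        )

    def forward(self, x):
        low = self.low(x)                                   # low-level features
        high = torch.cat([F.relu(c(low)) for c in self.atrous], dim=1)
        logits = self.decode(torch.cat([low, high], dim=1))
        # upsample to input resolution; argmax over classes then yields the
        # binary foreground/background image (the second result)
        return F.interpolate(logits, size=x.shape[2:], mode='bilinear',
                             align_corners=False)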
S2, calculating translation vectors with a gray projection algorithm for the transition frames between key frames of the video stream on the device, to obtain the second image segmentation result of the transition frames.
Analyzing a video frame by frame shows that, when the video runs smoothly, two adjacent frames are highly correlated; for mobile devices that do not demand very high accuracy, a pure translation is enough to express the motion between two frames. The gray projection algorithm computes the translation vector between two images, so it is adopted here to propagate the image segmentation result between adjacent frames.
The principle of the gray projection algorithm is briefly as follows: for each frame in the video sequence, map its gray values into two independent one-dimensional waveforms, yielding row and column projection curves for the frame; perform a correlation computation between the projection curves of two adjacent frames t and t+1; the translation vector between frames t and t+1 is found where the correlation is maximal. The correlation is computed with the Euclidean distance.
The Euclidean distance, also known as the Euclidean metric, is a commonly used distance definition: the true distance between two points in m-dimensional space. In two-dimensional space it is the length of the straight-line segment between two points; the smaller the Euclidean distance, the larger the correlation.
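As a concrete illustration of this principle, a minimal NumPy sketch follows: each frame is projected to row and column curves, and the shift minimizing the Euclidean distance between the overlapping parts of the curves of the two frames is taken as the translation. The function names, the search range max_shift, and the mean-based projection are assumptions made for the sketch; the two speed improvements described below (G-channel mapping and the binary-search-like search) can be substituted in.

import numpy as np

def gray_projection_shift(prev_gray, curr_gray, max_shift=16):
    # Estimate the (row, column) translation between two gray frames from
    # their row/column gray projection curves (Euclidean distance criterion).
    def best_shift_1d(ref, cur):
        n = len(ref)
        best_s, best_d = 0, np.inf
        for s in range(-max_shift, max_shift + 1):   # global search version
            a = ref[max(0, s):n + min(0, s)]
            b = cur[max(0, -s):n - max(0, s)]
            d = np.sqrt(np.sum((a - b) ** 2))        # Euclidean distance
            if d < best_d:
                best_d, best_s = d, s
        return best_s

    row_shift = best_shift_1d(prev_gray.mean(axis=1), curr_gray.mean(axis=1))
    col_shift = best_shift_1d(prev_gray.mean(axis=0), curr_gray.mean(axis=0))
    return row_shift, col_shift

The transition frame's mask can then be propagated roughly by shifting the previous frame's mask, e.g. np.roll(prev_mask, (row_shift, col_shift), axis=(0, 1)).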
In order to further improve the speed of the gray projection algorithm, the following two improvements are made:
first, gray mapping is carried out on color images in a video stream in a G channel; scientific researches show that the maximum photosensitivity of human eyes is located at 555nm, namely near green light, and color images in video streams can express enough row and column gray scale distribution of images in a G channel, and the time required by image graying is saved.
Second, a binary-search-like method is used to search all positions for the minimum Euclidean distance between the row/column gray projection curves of two frames. The gray projection algorithm normally uses a global search, which is computationally expensive; but the row/column correlation curve between two frames has a single-peak characteristic, so the minimum can instead be located with a binary-search-like method, greatly reducing the computation.
Searching all positions for the minimum Euclidean distance between the row/column gray projection curves of two frames with the binary-search-like method works as follows:
uniformly select three points in the valid search range, take the point with the minimum value as the center point, halve the search radius, and repeat the process until convergence; the point so obtained is the minimum. In other words:
step one, selecting three values among the N valid values and taking the smallest of the three as the first center point;
step two, centering on the first center point from step one, halving the search radius, selecting two values from the remaining values, comparing them with the first center point, and taking the smallest of the three as the second center point;
step three, centering on the second center point from step two, halving the search radius again, selecting two values from the remaining values, comparing them with the second center point, and taking the smallest of the three as the third center point;
and so on, until no more than three values remain, then selecting the smallest as the center point, which is the minimum sought.
For example: three values were chosen uniformly among the available 9 values, which were 98, 68, 49, 21, 15, 16, 19, 45, 88 in turn. Firstly, uniformly selecting three values, namely selecting two end point values and a middle value, namely selecting 98, 15 and 88, and taking the minimum value of the three points as 15; then shortening the searching radius to 1/2 of the previous step, continuously and uniformly selecting three numbers from the rest numerical values by taking 15 as the center, namely selecting 49, 15 and 19, and taking the minimum value of the three points as 15; finally, shortening the searching radius to 1/2 of the previous step, continuously and uniformly selecting three numbers from the remaining numbers, wherein only 3 numbers, namely 21, 15 and 16, are left in the remaining numbers, and the minimum value in the three points is 15; the minimum value obtained by the search is 15.
S3, displaying a first image segmentation result in real time using a low-delay display strategy, where the first image segmentation result refers to the content obtained by processing the second image segmentation result for real-time display on the screen.
Processing the second image segmentation result into real-time screen content means applying video special-effect editing to the second image segmentation result, such as road highlighting or background blurring; the particular special effect is configured in advance according to the application scene.
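As one illustration of turning a second result into a first result, a hedged OpenCV sketch of background blurring driven by the binary mask follows; the function name and kernel size are assumptions, and road highlighting would replace the blur step with a color overlay.

import cv2
import numpy as np

def render_background_blur(frame_bgr, mask):
    # first result = displayed content: foreground kept sharp, background blurred
    blurred = cv2.GaussianBlur(frame_bgr, (21, 21), 0)
    fg = (mask > 0)[..., None]          # HxWx1 boolean foreground mask
    return np.where(fg, frame_bgr, blurred)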
In many real-time applications, low latency is very important. Propagating image segmentation results between frames with the fast gray projection algorithm exploits the similarity between video frames to propagate the previous frame's result quickly, so each frame's segmentation time is short; key frames, however, still need a complex neural network computation to obtain their segmentation results, which causes stutter and lag.
To solve this problem, the present embodiment displays the first image segmentation result in real time with a low-delay display strategy, as follows: a display-ahead strategy is used for key frames; the preview process does not pause to wait for the forward propagation of the key-frame neural network; a rough result obtained by propagation from the previous frame is displayed first while the complex computation is moved to the background; once the accurate result is obtained, the rough result is replaced with the accurate result in the time sequence. The preview process refers to the real-time display content of the screen during video stream transmission; the forward propagation of the key-frame neural network refers to processing the key frame with the deep-learning-based network structure; the rough result is a second image segmentation result of the key frame obtained by calculating, with the gray projection algorithm, the translation vector between the key frame and its previous frame; the complex computation refers to the forward propagation of the key-frame neural network; the time sequence is the frame sequence formed by arranging the key frames and transition frames in the temporal order of the video stream; and the accurate result is the second image segmentation result obtained after processing the key frame with the deep-learning-based network structure.
The above process can also be expressed as follows: the low-delay display strategy includes the forward propagation of the key-frame neural network, but during video display this forward propagation is first moved to a background computation that produces the accurate second image segmentation result; meanwhile, the translation vector between the key frame and its previous frame is computed to obtain a rough second image segmentation result (the rough result) of the key frame, which is processed into a rough first image segmentation result and displayed. When the background computation completes, the rough second image segmentation result is replaced in the time sequence by the accurate second image segmentation result, as shown in FIG. 3, where the mask is the second image segmentation result produced by the segmentation network.
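A minimal threading sketch of this display-ahead strategy follows; segment_net and propagate stand for the deep network and the gray-projection propagation described above, and all names are assumptions made for the sketch rather than the patented implementation.

import threading

class LowDelayPreview:
    # Key frames are first shown with a rough mask propagated from the
    # previous frame; the neural network runs in a background thread and
    # its accurate mask replaces the rough one in the time sequence.
    def __init__(self, segment_net, propagate):
        self.segment_net = segment_net      # slow, accurate key-frame model
        self.propagate = propagate          # fast gray-projection propagation
        self.masks = {}                     # frame index -> latest mask
        self.lock = threading.Lock()

    def _refine_key_frame(self, idx, frame):
        accurate = self.segment_net(frame)  # complex computation, in background
        with self.lock:
            self.masks[idx] = accurate      # replace rough with accurate

    def on_frame(self, idx, frame, prev_frame, is_key):
        # every frame gets an immediate rough mask, so the preview never
        # stalls waiting for the network's forward propagation
        with self.lock:
            prev_mask = self.masks.get(idx - 1)
        rough = self.propagate(prev_frame, frame, prev_mask)
        with self.lock:
            self.masks[idx] = rough
        if is_key:
            threading.Thread(target=self._refine_key_frame,
                             args=(idx, frame), daemon=True).start()
        return rough                        # displayed immediately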
The device in this embodiment may be a smart phone, a tablet PC, a PDA (Personal Digital Assistant), or a similar terminal; taking the smart phone as an example:
referring to fig. 4, a block diagram of a part of the structure of a smart phone includes a processor 401, a memory 402, an operating system 403, a bluetooth module 404, a display module 405, an audio processing module 406, a video processing module 407, a sensor module 408, a communication module 409, a wireless network module 410, a power module 411, a key module 412, an interface module 413, an input/output module 414, an RF circuit module 415, and a positioning module 416.
The processor 401 is a control center of the mobile phone, connects each module of the whole mobile phone through an interface and a line, and performs data processing by running or executing a built-in operating system 403, a software program and/or a module stored in the memory 402 and calling data stored in the memory 402, thereby performing various corresponding functions, and thus performing overall control of the mobile phone. Optionally, the processor 401 may include one or more processing units; preferably, the processor 401 may integrate an application processor and a modem processor, wherein the application processor is mainly applied in terms of an operating system 403, a user interface, and application programs, etc., and the modem processor is mainly applied in terms of wireless communication.
The memory 402 mainly includes a storage program area that can store an operating system 403 and application programs (such as a sound playing function, an image/video playing function, etc.) required for at least one function, and a storage data area; the storage data area may store data (such as audio data, phonebook, etc.) created according to the use of the handset, etc. In addition, memory 402 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device.
The operating system 403 is the kernel and the keystone of the system, and is also a program for managing system hardware and system software resources, such as managing and configuring memory, determining the priority of supply and demand of system resources, controlling input and output devices, connecting networks, managing file systems, and the like. For example, operating system 403 provides an operator interface for a user to interact with the system. The classic operating systems include an Android operating system and an iOS operating system.
The bluetooth module 404 is a PCBA board with integrated bluetooth function, specifically, a chip basic circuit set with integrated bluetooth function is used for short-distance wireless communication, and is divided into a bluetooth data module and a bluetooth voice module according to the functions.
The display module 405 includes a display 4051, typically a liquid crystal display, used to show text, pictures, animations, and video. The display 4051 has a touch function: when a touch operation on or near it is detected, it is passed to the processor 401 to determine the type of touch event, and the processor 401 then provides a corresponding visual output on the display 4051 based on that type.
The audio processing module 406 includes a microphone 4061 and an audio processor 4062; generally, after collecting the sound signals, the microphone 4061 converts the collected sound signals into electrical signals, and the audio processor 4062 receives the electrical signals and converts the electrical signals into audio data; when audio data needs to be played, the audio processor 4062 converts the received audio data into an electrical signal, and then transmits the electrical signal to the microphone 4061, and the electrical signal is converted into a sound signal by the microphone 4061 and output.
The video processing module 407 includes a camera 4071 and a graphics processor 4072; the camera 4071 captures images and videos, and the graphics processor 4072 processes the stored images or videos, for example for noise cancellation, distortion correction, sharpness enhancement, and the background blurring mentioned in the present application.
The sensor module 408 includes a variety of sensors, such as light sensors, motion sensors, gyroscopes, barometers, hygrometers, thermometers, and infrared sensors. Specifically, the light sensors may include an ambient light sensor and a proximity sensor: the ambient light sensor adjusts the brightness of the display according to the ambient light, and the proximity sensor turns off the display and/or the backlight when the phone is moved to the ear. The acceleration sensor, one of the motion sensors, detects the magnitude of acceleration in all directions (generally three axes) and, when stationary, the magnitude and direction of gravity; it can be used in applications that recognize the phone's attitude, such as landscape/portrait switching and magnetometer gesture calibration.
The communication module 409 is used for processing and transmitting all message types such as information and voice, for example, receiving and sending information, making a call, and making a voice call through communication software.
A wireless network module 410 comprising a WiFi unit; the user accesses the internet, for example, e-mails, browses web pages, accesses streaming media, etc., through the wireless network module 410 of the cellular phone.
The power module 411 includes a battery 4111 and a power management system 4112, wherein the power management system 4112 is logically connected to the processor 401 to perform functions such as charging, discharging, and power consumption management of the battery 4111.
The key module 412 includes at least a power key and a volume up-down key; the power button controls the state of the power module 411 of the mobile phone; the volume up-down key is generally used for adjusting the volume of mobile phone audio/video and other media, and can also be used for adjusting the brightness and the darkness of the mobile phone; furthermore, the combination of the power key and the volume increasing and decreasing key can also be used for screen capturing, switching on/off, restarting, system restoration and the like of the mobile phone.
The interface module 413 includes a card connection unit, an earphone interface, a data interface, and/or a power interface. The card connection unit accepts a data card and a SIM card; the data card expands the phone's storage space, and once a SIM card is inserted the user can call and contact users of other terminals holding SIM cards, or connect to the network through the SIM card's data plan. The earphone, data, and power interfaces take different forms on different phones: on some they are integrated, on others independent, and on others partly combined and partly independent.
The input/output module 414 is configured to receive input digital or character information, and obtain information input by a user through the operating system 403 at an interface of the display screen.
The RF circuit module 415 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier, a duplexer, and the like. In addition, the RF circuit module 415 may also communicate with other devices through the wireless network module 410.
The positioning module 416 is used for positioning the current geographic position of the mobile phone to realize navigation or location-based services; the positioning module 416 is generally based on the location information of the GPS system (Global Positioning System) in the united states or the beidou system in china to position the mobile phone.
Those skilled in the art will appreciate that the structure shown in fig. 4 is not meant to be limiting, and may include more or less components than shown, or may combine certain components, or may employ a different arrangement of components, which is not described in detail herein.
In the embodiments provided by the present application, it should be understood that the described method may be implemented in other ways. For example, the method embodiments described above are merely illustrative: the division of the method is merely a logical function division, other divisions are possible in actual implementation, and some or all units may be selected according to actual needs to achieve the purpose of the embodiment.
The foregoing is only a partial embodiment of the present application, and it should be noted that it will be apparent to those skilled in the art that modifications and adaptations can be made without departing from the principles of the present application, and such modifications and adaptations are intended to be comprehended within the scope of the present application.

Claims (5)

1. A low-delay video segmentation real-time preview method, comprising the steps of:
processing key frames of the video stream on the device with a deep-learning-based network structure to obtain a second image segmentation result;
calculating translation vectors with a gray projection algorithm for the transition frames between key frames of the video stream on the device, to obtain the second image segmentation result of the transition frames;
displaying a first image segmentation result in real time using a low-delay display strategy, where the first image segmentation result refers to the content obtained by processing the second image segmentation result for real-time display on the screen;
the key frames and transition frames are determined according to the computing capability of the device;
the adoption of the low-delay display strategy to display the first image segmentation result in real time means that:
the method comprises the steps that a strategy of displaying a look-ahead is used for a key frame, a process of waiting for the forward propagation of a key frame neural network is not suspended by a preview process, a rough result obtained through the propagation of a previous frame is used firstly, a complex operation process is transferred to a background to operate, an accurate result is obtained, and then the rough result is replaced with the accurate result in a time sequence; the preview process refers to real-time display content of a screen in the video stream transmission process; the process of the key frame neural network forward propagation refers to the process of processing key frames by using the network structure based on deep learning; the rough result is a second image segmentation result of the key frame, which is obtained by carrying out translation vector calculation on the previous frame of the key frame and the key frame by adopting a gray projection algorithm; the complex operation process refers to a process of forward propagation of a key frame neural network; the time sequence is a frame sequence formed by arranging each key frame and each transition frame according to the time sequence of the video stream when the video stream is transmitted; the accurate result refers to a second image segmentation result obtained after the key frame is processed by using the network structure based on the deep learning.
2. The method of claim 1, wherein the step of processing key frames of the video stream on the device by the deep learning based network structure to obtain the second image segmentation result is as follows:
extracting an original image from a video stream;
performing convolution operation on the original image to obtain low-level features of the original image;
performing dense atrous convolution on the low-level features to obtain high-level features;
and decoding the low-level features and the high-level features to obtain corresponding second image segmentation results.
3. The method of claim 1, wherein the step of performing translation vector calculation on transition frames between key frames of the video stream on the device using the gray projection algorithm to obtain the second image segmentation result of the transition frames comprises:
performing gray mapping on the color images in the video stream using the G channel.
4. A method according to claim 3, wherein the step of performing translation vector calculation on transition frames between key frames of the video stream on the device using a gray projection algorithm to obtain a second image segmentation result of the transition frames comprises:
searching all positions for the minimum Euclidean distance between the row/column gray projection curves of two frames using a binary-search-like method.
5. The method of claim 4, wherein searching all positions for the minimum Euclidean distance between the row/column gray projection curves of two frames using the binary-search-like method means:
step one, selecting three values among the N valid values and taking the smallest of the three as the first center point;
step two, centering on the first center point from step one, halving the search radius, selecting two values from the remaining values, comparing them with the first center point, and taking the smallest of the three as the second center point;
step three, centering on the second center point from step two, halving the search radius again, selecting two values from the remaining values, comparing them with the second center point, and taking the smallest of the three as the third center point;
and so on, until no more than three values remain, then selecting the smallest as the center point, which is the minimum sought.
CN201811527499.6A 2018-12-13 2018-12-13 Low-delay video segmentation real-time preview method Active CN109685802B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811527499.6A CN109685802B (en) 2018-12-13 2018-12-13 Low-delay video segmentation real-time preview method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811527499.6A CN109685802B (en) 2018-12-13 2018-12-13 Low-delay video segmentation real-time preview method

Publications (2)

Publication Number Publication Date
CN109685802A CN109685802A (en) 2019-04-26
CN109685802B true CN109685802B (en) 2023-09-15

Family

ID=66186592

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811527499.6A Active CN109685802B (en) 2018-12-13 2018-12-13 Low-delay video segmentation real-time preview method

Country Status (1)

Country Link
CN (1) CN109685802B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110322445B (en) * 2019-06-12 2021-06-22 浙江大学 Semantic segmentation method based on maximum prediction and inter-label correlation loss function
CN110490858B (en) * 2019-08-21 2022-12-13 西安工程大学 Fabric defective pixel level classification method based on deep learning

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101287143A (en) * 2008-05-16 2008-10-15 清华大学 Method for converting flat video to tridimensional video based on real-time dialog between human and machine
CN103533255A (en) * 2013-10-28 2014-01-22 东南大学 Motion displacement curve simplification based automatic segmentation method for video scenes
CN108062753A (en) * 2017-12-29 2018-05-22 重庆理工大学 The adaptive brain tumor semantic segmentation method in unsupervised domain based on depth confrontation study
CN108198202A (en) * 2018-01-23 2018-06-22 北京易智能科技有限公司 A kind of video content detection method based on light stream and neural network
CN108242062A (en) * 2017-12-27 2018-07-03 北京纵目安驰智能科技有限公司 Method for tracking target, system, terminal and medium based on depth characteristic stream

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140210944A1 (en) * 2013-01-30 2014-07-31 Samsung Electronics Co., Ltd. Method and apparatus for converting 2d video to 3d video
US10303984B2 (en) * 2016-05-17 2019-05-28 Intel Corporation Visual search and retrieval using semantic information
EP3500911B1 (en) * 2016-08-22 2023-09-27 Magic Leap, Inc. Augmented reality display device with deep learning sensors

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101287143A (en) * 2008-05-16 2008-10-15 清华大学 Method for converting flat video to tridimensional video based on real-time dialog between human and machine
CN103533255A (en) * 2013-10-28 2014-01-22 东南大学 Motion displacement curve simplification based automatic segmentation method for video scenes
CN108242062A (en) * 2017-12-27 2018-07-03 北京纵目安驰智能科技有限公司 Method for tracking target, system, terminal and medium based on depth characteristic stream
CN108062753A (en) * 2017-12-29 2018-05-22 重庆理工大学 The adaptive brain tumor semantic segmentation method in unsupervised domain based on depth confrontation study
CN108198202A (en) * 2018-01-23 2018-06-22 北京易智能科技有限公司 A kind of video content detection method based on light stream and neural network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Research on segmentation algorithms for moving objects under unspecified background conditions; Wang Jianping et al.; Computer Applications and Software; 2010-03-31; Vol. 27, No. 3; pp. 256-259 *
Object segmentation combining gray projection with the Fisher criterion; Hu Zhengping; Computer Engineering and Design; 2005-09-30; Vol. 26, No. 9; pp. 2439-2442 *

Also Published As

Publication number Publication date
CN109685802A (en) 2019-04-26


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20230816

Address after: 646000 Buildings 13 and 15, No. 1, Section 6, Jiugu Avenue, Jiangyang District, Luzhou, Sichuan

Applicant after: Luzhou hemiao Communication Technology Co.,Ltd.

Address before: 563000 B Building 2, Zunyi Software Park, economic development zone, Xinpu New District, Zunyi, Guizhou.

Applicant before: GUIZHOU MARS EXPLORATION TECHNOLOGY Co.,Ltd.

GR01 Patent grant