CN117615172A - Video stream identification method, device, computer equipment and storage medium - Google Patents

Publication number
CN117615172A
Authority
CN
China
Prior art keywords
video stream
video
identification
recognition
type
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311636457.7A
Other languages
Chinese (zh)
Inventor
郝亮
陈志强
徐瑛琦
王建业
刘培培
郭之越
王晓亚
王妮
高翠玲
黄泽阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiongan Weisaibo Intelligent Technology Co ltd
Original Assignee
Xiongan Weisaibo Intelligent Technology Co ltd
Application filed by Xiongan Weisaibo Intelligent Technology Co ltd
Priority to CN202311636457.7A
Publication of CN117615172A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N21/23418 Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21 Server components or server architectures
    • H04N21/218 Source of audio or video content, e.g. local disk arrays
    • H04N21/21805 Source of audio or video content, e.g. local disk arrays enabling multiple viewpoints, e.g. using a plurality of cameras
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21 Server components or server architectures
    • H04N21/218 Source of audio or video content, e.g. local disk arrays
    • H04N21/2187 Live feed
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44008 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/18 Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
    • H04N7/181 Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast for receiving images from a plurality of remote sources

Abstract

The embodiment of the invention discloses a video stream processing method, a video stream processing device, computer equipment and a storage medium. The method comprises the following steps: acquiring a video stream transmitted through a video channel; acquiring at least one identification type corresponding to the video channel; performing target recognition on the video stream according to each recognition type to obtain a recognition result corresponding to each recognition type; and adding each identification result into the video stream to obtain a new video stream. The embodiment of the invention can improve the flexibility of video stream target identification.

Description

Video stream identification method, device, computer equipment and storage medium
Technical Field
The embodiment of the invention relates to the field of video processing, in particular to a video stream identification method, a video stream identification device, computer equipment and a storage medium.
Background
With the continuous progress of technology and people's growing technological awareness, artificial intelligence is gradually changing how people work and live. In the field of security monitoring in particular, practical deployments of artificial intelligence are becoming increasingly common.
At present, in the existing video identification process, target identification is performed on a video stream so that target objects in the video stream can be identified.
However, in this approach the same identification operation is performed on the video streams of all acquisition devices, so identification flexibility is poor.
Disclosure of Invention
The embodiment of the invention provides a video stream identification method, a video stream identification device, computer equipment and a storage medium, which can improve the flexibility of video stream target identification.
In a first aspect, an embodiment of the present invention provides a video stream identification method, which is applied to a video service module, including:
acquiring a video stream transmitted through a video channel;
acquiring at least one identification type corresponding to the video channel;
performing target recognition on the video stream according to each recognition type to obtain a recognition result corresponding to each recognition type;
and adding each identification result into the video stream to obtain a new video stream.
In a second aspect, an embodiment of the present invention further provides a video stream identification apparatus configured in a video service module, including:
the video stream acquisition module is used for acquiring a video stream transmitted through a video channel;
the identification type acquisition module is used for acquiring at least one identification type corresponding to the video channel;
the target recognition module is used for carrying out target recognition on the video stream according to each recognition type to obtain a recognition result corresponding to each recognition type;
and the target adding module is used for adding each identification result into the video stream to obtain a new video stream.
In a third aspect, an embodiment of the present invention further provides a computer device, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the processor executes the program to implement a video stream identification method according to any one of the embodiments of the present invention.
In a fourth aspect, embodiments of the present invention further provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a video stream identification method according to any of the embodiments of the present invention.
According to the embodiment of the invention, a video stream transmitted through a video channel is acquired, an identification result is obtained for each of the at least one identification type corresponding to the video channel, and the identification results are added to the video stream to obtain a new video stream. This solves the problem in the prior art that only the same identification operation can be performed for different video channels. Differentiated target identification can be performed per video channel, which improves the flexibility and diversity of target identification; targets can also be identified in a targeted manner, which improves the accuracy of target identification.
Drawings
Fig. 1 is a flowchart of a video stream identification method in a first embodiment of the present invention;
Fig. 2 is a flowchart of a video stream identification method in a second embodiment of the present invention;
Fig. 3 is a flowchart of a video stream identification method in a third embodiment of the present invention;
Fig. 4 is a scene diagram of a video stream identification method in a third embodiment of the present invention;
Fig. 5 is a scene diagram of a video stream identification method in a third embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a video stream identification device in a fourth embodiment of the present invention;
Fig. 7 is a schematic structural diagram of a computer device in a fifth embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting thereof. It should be further noted that, for convenience of description, only some, but not all of the structures related to the present invention are shown in the drawings.
Example 1
Fig. 1 is a flowchart of a video stream identification method according to a first embodiment of the present invention. The method is applicable to performing target identification on a video stream transmitted through a video channel, and may be performed by the video stream identification apparatus provided by the embodiment of the present invention. The apparatus may be implemented in software and/or hardware and may generally be integrated into a video service module. The video service module may in turn be integrated into a computer device, for example a terminal device or, more specifically, a client of a mobile terminal. As shown in fig. 1, the method in this embodiment specifically includes:
S101, acquiring a video stream transmitted through a video channel.
A video channel may refer to a channel transmitting a video stream, and the number of video channels may be at least one. Multiple video channels may transmit multiple video streams in parallel. Video streams transmitted by a plurality of video channels can be acquired, and one video stream transmitted by one video channel can be selected. A video stream may refer to a transmission of a video signal in the form of a continuous data stream, during which video data is transmitted.
S102, at least one identification type corresponding to the video channel is obtained.
An identification type may refer to the type of object to be identified. By way of example, identification types may include animals, people, equipment, and the like. As another example, identification types may include safe behavior and dangerous behavior. Each identification type may be further subdivided; for example, dangerous behavior may include smoking, not wearing a safety helmet, and the like. The types may be set according to the specific scene. Each video channel corresponds to its own identification types independently of the other channels, and the identification types corresponding to different video channels may be the same or different.
In an optional embodiment, the obtaining at least one identification type corresponding to the video channel includes: acquiring acquisition equipment corresponding to the video channel; acquiring a shooting range of the acquisition equipment; and acquiring at least one identification type corresponding to the shooting range.
The acquisition device captures video and transmits it through a video channel. The shooting range of the acquisition device is used to determine the application scene. Illustratively, if the shooting range of the acquisition device is a park, the identification types may be animals, people, and the like. Illustratively, if the shooting range is an industrial plant, the identification types may be people, equipment, and the like. The identification types corresponding to each shooting range may be preset. From the information of the video stream transmitted through the video channel, the acquisition device that captured and generated the video stream is determined and taken as the acquisition device corresponding to the video channel. The position of the acquisition device is then acquired, the shooting range corresponding to that position is taken as the shooting range of the acquisition device, and the identification types corresponding to the shooting range are determined from the preset correspondence between shooting ranges and identification types.
In a specific example, when the shooting range is inside the factory, the identification types include smoking and safety helmets; when the shooting range is outside the factory, the identification types include safety helmets.
By acquiring the acquisition device corresponding to the video channel and its shooting range, and determining the identification types of the video channel from the identification types corresponding to the shooting range, the types to be identified can be chosen according to the captured content. Target identification can then be performed accurately on what the acquisition device captures, which improves the accuracy and flexibility of target identification.
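Illustratively, the lookup described above can be sketched as two table lookups: from video channel to shooting range, then from shooting range to identification types. This is a minimal sketch, not the implementation of the embodiment; the channel identifiers, the range names, and the function name `identification_types_for_channel` are hypothetical, and only the table values follow the factory example above.

```python
# Hypothetical mapping from video channel to the shooting range of its
# acquisition device (in practice derived from the device's position).
CHANNEL_TO_RANGE = {
    "channel-1": "factory_interior",
    "channel-2": "factory_exterior",
}

# Preset correspondence between shooting range and identification types,
# following the factory example in the text.
RANGE_TO_TYPES = {
    "factory_interior": ["smoking", "safety_helmet"],
    "factory_exterior": ["safety_helmet"],
}


def identification_types_for_channel(channel_id: str) -> list[str]:
    """Return the identification types configured for a channel.

    Unknown channels or ranges yield an empty list, so no recognition runs.
    """
    shooting_range = CHANNEL_TO_RANGE.get(channel_id)
    return list(RANGE_TO_TYPES.get(shooting_range, []))
```

Returning a copy of the list keeps callers from accidentally modifying the preset table.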
S103, carrying out target recognition on the video stream according to each recognition type to obtain a recognition result corresponding to each recognition type.
Target recognition is used to identify the position of a target object in an image and the type of the target object. The recognition result may indicate whether a target object exists, the position of the target object, and the type of the target object. Each identification type has a corresponding recognition result. For example, if the identification types include animals and equipment, the recognition results may be: cat position, dog position, power unit position, cable position, and so on.
Target recognition is performed on the video stream according to the recognition algorithm corresponding to each recognition type. Different recognition types may correspond to different recognition algorithms or to the same recognition algorithm. Multiple recognition types may be recognized together or separately, and in series or in parallel, as needed.
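Illustratively, the choice between serial and parallel recognition can be sketched as follows. This is a sketch under stated assumptions: the recognizers are placeholder functions rather than real detection models, and the names `RECOGNIZERS`, `recognize_serial`, and `recognize_parallel` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# A recognizer takes one frame and returns a list of results; the stand-in
# recognizers below are placeholders, not real detection models.
Recognizer = Callable[[object], List[str]]

RECOGNIZERS: Dict[str, Recognizer] = {
    "animal": lambda frame: ["cat"],
    "equipment": lambda frame: ["cable"],
}


def recognize_serial(frame, types: List[str]) -> Dict[str, List[str]]:
    """Run the recognizer for each identification type one after another."""
    return {t: RECOGNIZERS[t](frame) for t in types}


def recognize_parallel(frame, types: List[str]) -> Dict[str, List[str]]:
    """Run the recognizers for all identification types concurrently."""
    if not types:
        return {}
    with ThreadPoolExecutor(max_workers=len(types)) as pool:
        futures = {t: pool.submit(RECOGNIZERS[t], frame) for t in types}
        return {t: f.result() for t, f in futures.items()}
```

Both functions return one result list per identification type, so the step that adds results to the video stream does not need to know which mode was used.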
S104, adding each identification result to the video stream to obtain a new video stream.
The identification result is added to the video stream to obtain a new video stream in which the identification result is displayed. For example, the identification result may be marked in the video stream, added to the attribute data of the video stream as additional data, or drawn directly in the video stream.
In an alternative embodiment, the acquiring the video stream transmitted through the video channel includes: acquiring a video stream transmitted by a factory monitoring device through a video channel; the number of the plant monitoring devices is at least one, the plant monitoring devices transmit video streams through at least one video channel, and the identification type comprises at least one type of operation risk behavior.
The acquisition device corresponding to the video channel is a plant monitoring device. The plant monitoring device is used to detect whether workers in the plant exhibit job risk behavior. The plant monitoring device may be a camera installed inside or outside the plant area. Typically, a plant is equipped with multiple cameras; that is, the number of plant monitoring devices is at least one. Each plant monitoring device transmits its video stream through at least one video channel, and one video channel transmits one video stream. Multiple plant monitoring devices transmit multiple video streams over multiple video channels.
Job risk behavior may refer to behavior that may create a hazard in a plant environment. Types of job risk behavior may include smoking, not wearing a safety helmet, fighting, not wearing a work badge, and the like. The types may be set according to the specific plant environment and are not specifically limited.
The video stream is captured and generated by the plant monitoring device and transmitted through the video channel, and job risk behavior in the plant environment is identified from it. An alarm can subsequently be raised according to the identified job risk behavior so that the risk is avoided in time and plant operation safety is improved.
In an alternative embodiment, after obtaining the new video stream, further comprising: and carrying out format conversion on the new video stream, and pushing out the converted video stream so as to play the converted video stream on a page.
The format of the new video stream may be converted so that the converted stream can be played on a page. The converted video stream is pushed to a server or device capable of providing page playback, and that device renders and plays the converted video stream on the page. The user can thus quickly browse the video stream that includes the identification results and follow up on the identified types in time, which improves the real-time performance of video stream playback.
For example, the video stream may be converted according to RTMP (Real-Time Messaging Protocol) and the converted video stream pushed out. The video stream may be pushed to an SRS (Simple Realtime Server) streaming media service for distribution, so that it can be played on a page of the management device. Distribution may use protocols such as HLS (HTTP Live Streaming), FLV (Flash Video), or WebRTC (Web Real-Time Communication).
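Illustratively, one common way to perform such a conversion is to pipe the annotated frames into the `ffmpeg` command-line tool, which encodes them and pushes an FLV-packaged stream over RTMP. The text does not name the tool used, so this is an assumption; the function name `build_rtmp_push_command` and the server address are hypothetical, and the sketch only builds the command rather than running it.

```python
from typing import List


def build_rtmp_push_command(width: int, height: int, fps: int, rtmp_url: str) -> List[str]:
    """Build an ffmpeg command that reads raw BGR frames from stdin,
    encodes them as H.264, and pushes them as FLV over RTMP."""
    return [
        "ffmpeg",
        "-f", "rawvideo",                 # input is unencoded frames
        "-pix_fmt", "bgr24",              # 3 bytes per pixel, blue-green-red order
        "-s", f"{width}x{height}",        # frame size of the raw input
        "-framerate", str(fps),           # input frame rate
        "-i", "-",                        # read frames from standard input
        "-c:v", "libx264",                # H.264 encoder
        "-pix_fmt", "yuv420p",            # widely supported output pixel format
        "-f", "flv",                      # RTMP carries FLV-packaged streams
        rtmp_url,
    ]


# Hypothetical SRS ingest address; replace with the real server and stream name.
command = build_rtmp_push_command(1280, 720, 25, "rtmp://srs.example.com/live/channel-1")
```

The resulting list could be passed to `subprocess.Popen(command, stdin=subprocess.PIPE)`, with each annotated frame written to the process's standard input; this assumes an `ffmpeg` build with `libx264` is installed.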
By converting the format of the new video stream and pushing it to a device capable of playback so that the converted video stream is played on a page, playing the video stream is simplified and the real-time availability of the identification results is improved.
According to the embodiment of the invention, a video stream transmitted through a video channel is acquired, an identification result is obtained for each of the at least one identification type corresponding to the video channel, and the identification results are added to the video stream to obtain a new video stream. This solves the problem in the prior art that only the same identification operation can be performed for different video channels. Differentiated target identification can be performed per video channel, which improves the flexibility and diversity of target identification; targets can also be identified in a targeted manner, which improves the accuracy of target identification.
Example two
Fig. 2 is a flowchart of a video stream identification method according to a second embodiment of the present invention, which is refined on the basis of the foregoing embodiment. Performing target recognition on the video stream according to each recognition type to obtain a recognition result corresponding to each recognition type is embodied as: performing target recognition on the video stream one recognition type at a time to obtain a recognition result corresponding to each recognition type. Adding each identification result to the video stream to obtain a new video stream is embodied as: adding each identification result to the video stream in the adding mode corresponding to its identification type to obtain a new video stream. The method of this embodiment specifically includes:
S201, a video stream transmitted through a video channel is acquired.
The video channel, the video stream, the identification type, the identification result, and the like of the embodiment of the present invention may refer to the description of the above embodiment.
S202, at least one identification type corresponding to the video channel is obtained.
And S203, carrying out target recognition on the video stream one by one according to each recognition type to obtain a recognition result corresponding to each recognition type.
Target recognition is performed on the video stream one recognition type at a time, that is, serially. The order in which the recognition types are processed may be set as required or may be random.
S204, adding each identification result to the video stream according to the corresponding adding mode of each identification type to obtain a new video stream.
The adding mode may be to display the recognition results of different recognition types in boxes of different colors. Illustratively, the target box of identification type A is red, the target box of identification type B is yellow, the target box of identification type C is green, and so on.
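Illustratively, a per-type adding mode can be sketched as a color table plus a box-drawing routine. This is a minimal sketch in which a frame is a plain nested list of RGB tuples rather than a decoded video frame; the names `TYPE_COLORS`, `draw_box`, and `add_results` are hypothetical, and the colors follow the example above.

```python
from typing import Dict, List, Tuple

Color = Tuple[int, int, int]          # (red, green, blue)
Box = Tuple[int, int, int, int]       # (left, top, right, bottom), inclusive

# Hypothetical adding modes: one box color per identification type.
TYPE_COLORS: Dict[str, Color] = {
    "A": (255, 0, 0),    # red
    "B": (255, 255, 0),  # yellow
    "C": (0, 255, 0),    # green
}


def draw_box(frame: List[List[Color]], box: Box, color: Color) -> None:
    """Draw a one-pixel rectangle outline on the frame in place."""
    left, top, right, bottom = box
    for x in range(left, right + 1):
        frame[top][x] = color
        frame[bottom][x] = color
    for y in range(top, bottom + 1):
        frame[y][left] = color
        frame[y][right] = color


def add_results(frame: List[List[Color]], results: Dict[str, List[Box]]) -> None:
    """Draw every result box using the color of its identification type."""
    for type_name, boxes in results.items():
        for box in boxes:
            draw_box(frame, box, TYPE_COLORS[type_name])
```

Only the outline is drawn, so the pixels inside each box stay unchanged and the target remains visible.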
In an alternative embodiment, said performing object recognition on said video stream includes: determining a region to be identified in an image of the video stream; cutting the video stream according to the position of the region to be identified in the image; and carrying out target recognition on the cut video stream.
The video stream comprises multiple images. A region to be identified is determined for each image, the image is cropped to that region, and target recognition is performed on the cropped region. Because recognition runs on a smaller image, the amount of redundant data processed is reduced and the accuracy of target recognition is improved. It should be noted that the recognition result may be added to the cropped video stream to obtain a new video stream, or the position of the recognition result in the original video stream may be determined and the recognition result added to the original video stream to obtain a new video stream.
The region to be identified may be set by the user, or may be determined from statistics on where targets of each identification type were located in previously recognized images.
Illustratively, if the identification type is a person, the region to be identified is the area in which people are active.
By determining the region to be identified in the images of the video stream and cropping the video stream to that region before target recognition, interfering regions are reduced, which improves recognition accuracy, and redundant regions are reduced, which improves recognition speed.
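Illustratively, cropping to the region to be identified and mapping a recognition result back to the original image can be sketched as follows. This is a sketch under simplifying assumptions: an image is a nested list of pixel values, boxes use exclusive right and bottom edges, and the function names `crop` and `to_original_coords` are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def crop(image: List[List[int]], region: Box) -> List[List[int]]:
    """Return the part of the image inside the region to be identified."""
    left, top, right, bottom = region
    return [row[left:right] for row in image[top:bottom]]


def to_original_coords(box_in_crop: Box, region: Box) -> Box:
    """Shift a box found in the cropped image back into the original image."""
    left, top, _, _ = region
    x1, y1, x2, y2 = box_in_crop
    return (x1 + left, y1 + top, x2 + left, y2 + top)
```

The second function corresponds to the option of adding the recognition result to the original video stream: a box found in the cropped image is offset by the top-left corner of the region.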
In an optional embodiment, the performing object recognition on the video stream according to each recognition type to obtain a recognition result corresponding to each recognition type includes: and calling plug-ins corresponding to the identification types, and carrying out target identification on the video stream to obtain identification results fed back by the plug-ins corresponding to the identification types.
The plug-in is used for realizing the function of identifying the identification type. The plug-ins are in one-to-one correspondence with the identification types. And carrying out target recognition on the video stream by adopting a plug-in unit corresponding to the recognition type, so as to obtain a recognition result corresponding to the recognition type.
A corresponding object recognition algorithm can be configured for each recognition type, and a corresponding plug-in is generated for each object recognition algorithm for calling and executing, so that the corresponding recognition type recognition is performed on the video stream.
By providing a plug-in for each identification type and calling it to perform target recognition on the video stream, target recognition of multiple identification types can be rolled out quickly across multiple platforms, the cost of migrating target recognition between devices is reduced, and the implementation of target recognition is simplified.
According to the embodiment of the invention, the recognition types are recognized serially on the video stream so that different recognition types are kept distinct, and the recognition result of each recognition type is added to the video stream in the adding mode corresponding to that type. Target recognition for each type is thus performed independently, which reduces interference between recognition processes and improves recognition accuracy; the recognition results can also be told apart, which improves how accurately they are displayed in the video stream.
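Illustratively, the one-to-one correspondence between plug-ins and identification types can be sketched as a registry keyed by type name. This is a minimal sketch, not the plug-in mechanism of the embodiment; the names `PLUGINS`, `register_plugin`, and `run_plugins` are hypothetical, and the two plug-ins return placeholder results instead of running a trained model.

```python
from typing import Callable, Dict, List

Plugin = Callable[[object], List[dict]]

# Registry with one plug-in per identification type.
PLUGINS: Dict[str, Plugin] = {}


def register_plugin(identification_type: str) -> Callable[[Plugin], Plugin]:
    """Decorator that registers a function as the plug-in for one type."""
    def decorator(func: Plugin) -> Plugin:
        PLUGINS[identification_type] = func
        return func
    return decorator


@register_plugin("smoking")
def detect_smoking(frame: object) -> List[dict]:
    # Placeholder: a real plug-in would run a trained detection model here.
    return [{"type": "smoking", "box": (10, 20, 50, 80)}]


@register_plugin("safety_helmet")
def detect_helmet(frame: object) -> List[dict]:
    # Placeholder result; an empty list means nothing was detected.
    return []


def run_plugins(frame: object, types: List[str]) -> Dict[str, List[dict]]:
    """Call the plug-in for each requested type and collect its feedback."""
    return {t: PLUGINS[t](frame) for t in types}
```

With this layout, supporting a new identification type only requires registering one more function; the calling code is unchanged.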
Example III
Fig. 3 is a flowchart of a video stream identification method in a third embodiment of the present invention. Fig. 4 and 5 are scene diagrams of a video stream identification method according to a third embodiment of the present invention. As shown in fig. 3-5, a system for implementing the video stream identification method includes: a management system, a video service module SDK, a yolo (You Only Look Once) model and its algorithm plug-ins, an SRS streaming media service, and video acquisition devices.
The management system can manage the video stream channels online, can manage online the target recognition algorithms supported by the yolo model, and can add in real time target recognition algorithms obtained by training the yolo model. The management system and the video service module SDK match data identifiers through json-format data and communicate over a socket protocol: the management system informs the video service module SDK of the correspondence between the channels to be identified and the target recognition algorithms, and the video service module SDK feeds the setting result back to the management system for display.
The yolo model and its algorithm plug-ins are the target recognition algorithms obtained through prior training. The management system can manage these target recognition algorithms online according to actual needs, and each target recognition algorithm is packaged as a plug-in that the video service module SDK can call. Each target recognition algorithm may be called by multiple video channels simultaneously, and each video channel may support multiple target recognition algorithms.
The video service module SDK can pull streams over RTSP (Real Time Streaming Protocol) from the video channels added by the management system and feed the result back to the management system. According to the configured correspondence between video stream channels and target recognition algorithms, the pulled video stream is passed to the plug-ins of the target recognition algorithms trained from the yolo model, which extract salient features that characterize the target from each image and perform image recognition. The recognition result is added to the pulled video stream immediately, and the resulting video stream is pushed out over RTMP.
A video acquisition device is a physical device that generates captured images or a video stream during normal operation, which the video service module SDK can pull over RTSP.
The SRS streaming media service is a real-time video server used to distribute, in HLS, FLV or WebRTC, the RTMP streams pushed by the video service, so that they can be played on a designated page of the management system.
The video service module SDK exchanges data with the management system through a socket communication protocol, and the two may or may not be deployed on the same server. The management system comprises a front-end page, a back-end service and a database: the front-end page is built with the vue framework, the back-end service is an interface service written in Java, and the database is a sqlite database.
The management system can manage the video channel and the name of the target recognition algorithm, set any combination of the video channel and the target recognition algorithm, and inform the video service module SDK of the combination in json form.
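Illustratively, the json notification that binds a video channel to its target recognition algorithms might look like the sketch below. The text does not specify the message layout, so every field name (`action`, `channel_id`, `rtsp_url`, `algorithms`), the newline framing, and the function names are assumptions made for illustration.

```python
import json
from typing import List


def build_binding_message(channel_id: str, rtsp_url: str, algorithms: List[str]) -> bytes:
    """Encode one channel-to-algorithm binding as a newline-terminated json message."""
    message = {
        "action": "set_channel_algorithms",  # hypothetical message type
        "channel_id": channel_id,
        "rtsp_url": rtsp_url,
        "algorithms": algorithms,            # agreed algorithm identifiers
    }
    # The trailing newline is an assumed way to separate messages on a socket.
    return (json.dumps(message) + "\n").encode("utf-8")


def parse_binding_message(raw: bytes) -> dict:
    """Decode a message produced by build_binding_message."""
    return json.loads(raw.decode("utf-8"))
```

On the management system side the bytes would be written to the socket; on the video service module SDK side the parsed dictionary tells it which plug-ins to call for the channel.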
The method of the embodiment specifically comprises the following steps:
S301, collecting a data set according to the target identification requirements, annotating the data set, extracting features, and training a yolo model.
The targets to be identified in the application scene are determined and a large data set containing pictures of those targets is collected. The targets are annotated with CVAT (Computer Vision Annotation Tool), and the yolo model is trained on the annotated data set. The recognition effect is then checked against the training results: if it meets the requirements, the model is packaged as a separate plug-in for the video service module SDK to call; if not, more data is collected for further annotation and training.
The embodiment of the invention adopts the yolo model as the target recognition algorithm. Producing a plug-in for a target recognition algorithm requires up-front data collection, classification labeling and feature extraction on the data set, and training and optimization on the data set; the resulting plug-in can then be conveniently called by the video service module SDK.
The yolo model is a target detection model, and target detection is an important basic task in computer vision, and is used for finding a specific object in a picture, and the target detection not only requires identifying the type of the object, but also marks the position of the object. Compared with the traditional neural network with the proposal frame, the yolo has greatly improved speed. The target recognition algorithm is used for detecting the position and the type of an object on a picture and comprises 5 pieces of information: the center position (x, y) of the object, the length and width (h, w) of the object, and finally the kind of the object. yolo prediction is based on the entire picture and outputs all detected target information, including category and location, at once. The yolo model algorithm is trained to identify one or more objects, each of which may be provided for use with a different video stream channel.
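As an illustration only, the five pieces of information output for one target could be held in a small record such as the following. This is a minimal Python sketch; the class name Detection, the field names and the corner conversion are assumptions made for illustration and are not specified by the embodiment.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One detected target: box centre, box size and object kind."""
    x: float    # centre x of the box, in pixels
    y: float    # centre y of the box, in pixels
    h: float    # box height, in pixels
    w: float    # box width, in pixels
    label: str  # kind of the object

    def corners(self):
        """Return (left, top, right, bottom), the form usually needed to draw a box."""
        return (self.x - self.w / 2, self.y - self.h / 2,
                self.x + self.w / 2, self.y + self.h / 2)


det = Detection(x=100.0, y=60.0, h=40.0, w=20.0, label="helmet")
print(det.corners())
```

Converting from the centre form to corner coordinates is a common step before a recognition result is drawn into a frame.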
The video service module SDK and the yolo model algorithm are deployed on the same server, and the specific deployment form is single-node service or service cluster; the video service module SDK pulls streams from the video acquisition equipment through an RTSP protocol, and the video service module SDK invokes an algorithm plug-in to be processed and then pushes streams through the RTMP protocol.
S302, the management system adds the name of the target recognition algorithm.
The management system comprises a front-end page, a Java back-end service and an SQLite database. The operating user interacts on the front-end page, the Java back-end service provides data interfaces and performs service processing, and the database stores video channel data and target recognition algorithm information. For each trained algorithm, the management system and the video service module SDK agree on a specific identifier by which the video service module SDK can call the corresponding algorithm plug-in, and the name of the target recognition algorithm is added to the management system.
S303, the management system adds video hardware acquisition channel data, namely video acquisition hardware data information, and the management system issues the channel data to a video service module SDK (Software Development Kit ) in a socket message form.
The channel data may include the video channel through which the video acquisition device transmits video, the correspondence between the video channel and the video acquisition device, and the like. The video service module SDK refers to a code package for parsing the model. In the embodiment of the invention it is a code program that assists software development and provides the following functions: pulling the video stream, calling the target recognition algorithm, and pushing out the video stream recognized by the target recognition algorithm. Deploying and running the video service module SDK on the hardware gateway effectively reduces the workload of overall program deployment.
Specifically, hardware information for a plurality of video acquisition devices can be added in the management system and stored in the database. After the yolo model algorithm training is completed, identification information for one or more algorithm names is added to the management system and stored in the database. After the management system combines a video channel with a target recognition algorithm and sends the combination information to the video service module SDK through the socket, the video service module SDK returns a setting success message, and the back-end service then stores the combination information in the database.
The video service module SDK and the video acquisition equipment are connected to the same router or network segment. The management system manages the information of the connected equipment and transmits it to the video service module SDK as json format data through socket communication. The issued data includes the device brand, device type, device IP, video streaming port, operation type, channel number list (channel unique ID, channel number, etc.), video encoding type, device login user name and password, etc.
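The ordering described above, in which the combination is stored only after the video service module SDK confirms it, can be sketched as follows. This is a hedged Python illustration using an in-memory SQLite database; the table name binding, the column names and the shape of the acknowledgement message are assumptions, not details given by the embodiment.

```python
import sqlite3


def save_binding_if_acked(conn, channel_id, algorithm_name, ack):
    """Persist a channel/algorithm combination only when the SDK confirmed it."""
    if not ack.get("success"):
        return False
    conn.execute(
        "INSERT OR REPLACE INTO binding(channel_id, algorithm_name) VALUES (?, ?)",
        (channel_id, algorithm_name),
    )
    conn.commit()
    return True


# Hypothetical schema: one row per (channel, algorithm) combination.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE binding(channel_id TEXT, algorithm_name TEXT, "
    "PRIMARY KEY (channel_id, algorithm_name))"
)
```

Writing to the database only after the acknowledgement keeps the stored combinations consistent with what the SDK is actually running.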
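For illustration, the issued device information could be serialized as shown below. This is a minimal Python sketch; all key names, the example values and the newline used to delimit messages on the socket are assumptions, since the embodiment only lists which fields are issued.

```python
import json


def build_device_message(device):
    """Encode device information as one newline-terminated JSON message."""
    payload = {
        "brand": device["brand"],
        "device_type": device["device_type"],
        "ip": device["ip"],
        "stream_port": device["stream_port"],
        "operation": device["operation"],   # hypothetical values: "add", "delete"
        "channels": device["channels"],     # list of {"id": ..., "number": ...}
        "codec": device["codec"],
        "username": device["username"],
        "password": device["password"],
    }
    return (json.dumps(payload) + "\n").encode("utf-8")


example = {
    "brand": "example-brand", "device_type": "IPC", "ip": "192.168.1.64",
    "stream_port": 554, "operation": "add",
    "channels": [{"id": "ch-1", "number": 1}],
    "codec": "H264", "username": "admin", "password": "secret",
}
message = build_device_message(example)
```

The resulting bytes could be written to a connected socket with sendall; the receiving side would split on the newline and parse each message with json.loads.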
S304, the video service module SDK pulls the live video stream from the video acquisition equipment by using the RTSP (Real Time Streaming Protocol) protocol, and the video service module SDK informs the management system of the pulling result.
The video service module SDK splices a stream-pulling address from the issued device information and pulls the stream from that RTSP address. The pulling result is fed back to the management system as json format data through socket communication, and the management system shows the setting result to the operating user.
RTSP is an application-layer network protocol designed for video and communication systems to control streaming media servers. The protocol is used to create and control media sessions between terminals. A client of the media server issues VCR-style commands, such as play, record and pause, to control in real time the media stream from the server to the client or from the client to the server. The video acquisition equipment supports serving its video stream through the RTSP protocol, and a third-party application pulls the video stream by splicing a stream-pulling address in the specified format so that the stream can be processed further.
Video acquisition devices include one or more of a network camera (Internet Protocol Camera, IPC), a network video recorder (Network Video Recorder, NVR), a digital video server (Digital Video Server, DVS), and a digital video recorder (Digital Video Recorder, DVR).
S305, the operating user matches a video channel with a target recognition algorithm in the management system, and the management system notifies the video service module SDK of the matching information in the form of a socket message.
The operating user combines a video stream channel with a target recognition algorithm in the management system to set their correspondence, and the management system transmits the combination of algorithm name and channel ID to the video service module SDK as json format data.
S306, according to the information notified by the management system, the video service module SDK calls by algorithm name the specified target recognition plug-in provided by the yolo model.
According to the algorithm name, the video service module SDK calls the specified algorithm plug-in of the yolo model and confirms whether the plug-in exists and can work normally.
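The address splicing step might look like the following. This is a hedged Python sketch: the path template /channel/{channel} is a hypothetical placeholder, because the actual path format is defined by each device vendor and is not given in the embodiment.

```python
from urllib.parse import quote


def build_rtsp_url(ip, port, username, password, channel_number,
                   path_template="/channel/{channel}"):
    """Splice an RTSP pull address from issued device information.

    Credentials are percent-encoded so that characters such as '@' or ':'
    in a password do not break the URL.
    """
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    path = path_template.format(channel=channel_number)
    return f"rtsp://{user}:{pwd}@{ip}:{port}{path}"


print(build_rtsp_url("192.168.1.64", 554, "admin", "p@ss", 1))
```

Keeping the path as a template lets the same function serve different device brands by selecting a template from the issued brand and device type.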
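To make the command-style control concrete, the text form of an RTSP request can be sketched as below. RTSP/1.0 requests consist of a request line, a CSeq sequence header, optional further headers and a blank line. The helper name rtsp_request and the example address are assumptions for illustration; a real client would normally rely on a media library rather than build requests by hand.

```python
def rtsp_request(method, url, cseq, headers=None):
    """Format one RTSP/1.0 request (for example OPTIONS, DESCRIBE, PLAY, PAUSE)."""
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # Header lines end with CRLF and the request ends with an empty line.
    return "\r\n".join(lines) + "\r\n\r\n"


print(rtsp_request("OPTIONS", "rtsp://192.168.1.64:554/channel/1", 1))
```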
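On the receiving side, the combination message could be applied to an in-memory mapping so that the new pairing takes effect without restarting the video stream. The following Python sketch is illustrative only; the message keys channel_id and algorithms and the reply format are assumptions.

```python
import json


def apply_binding(bindings, raw_message):
    """Update the channel-to-algorithms mapping from one json message.

    Returns the json reply that would be sent back to the management system.
    """
    msg = json.loads(raw_message)
    bindings[msg["channel_id"]] = list(msg["algorithms"])
    return json.dumps({"channel_id": msg["channel_id"], "success": True})


bindings = {}
reply = apply_binding(bindings, '{"channel_id": "ch-1", "algorithms": ["helmet", "smoke"]}')
```

Because the mapping is consulted for every frame, a later message for the same channel simply replaces the list of algorithms in use.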
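One possible way to look up a plug-in by algorithm name and to confirm that it exists and works is a simple registry, sketched below in Python. The registry, the decorator and the self-check on a probe frame are assumptions for illustration; the embodiment does not specify how plug-ins are loaded.

```python
PLUGINS = {}


def register(name):
    """Decorator that registers a recognition function under an algorithm name."""
    def wrap(func):
        PLUGINS[name] = func
        return func
    return wrap


@register("helmet")
def detect_helmet(frame):
    # Placeholder body: a real plug-in would run the trained model here.
    return []


def resolve_plugin(name, probe_frame=None):
    """Return (plugin, status); the plug-in is None if missing or failing."""
    plugin = PLUGINS.get(name)
    if plugin is None:
        return None, "plug-in not found"
    try:
        plugin(probe_frame)  # self-check: one call on a probe frame
    except Exception as exc:
        return None, f"plug-in failed self-check: {exc}"
    return plugin, "ok"
```

The status string could be returned to the management system so that the operating user sees why a combination was rejected.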
S307, in the SRS streaming media service configuration file, the running port number, the reception of RTMP protocol video streams, and the output video format (HLS, FLV, or WebRTC are supported) are configured.
According to the video format that the browser needs to pull and the video protocol that the video service module SDK pushes, the running port number, the reception of RTMP protocol video streams, the output video format (HLS, FLV or WebRTC are supported) and the like are configured in the SRS streaming media service configuration file.
The SRS streaming media service program receives the RTMP stream pushed by the video service module SDK and can distribute it over any of HLS, FLV or WebRTC, so that the video stream processed by the algorithm can be watched in real time on a page or in a player.
The SRS streaming media service program and the video service module SDK may be deployed on the same server or on different servers; that is, one SRS streaming media service can provide a push-stream environment for a plurality of video service module SDKs.
S308, the video service recognizes the pulled live video stream with the target recognition algorithm, adds the recognition result to the live video stream, and pushes the resulting stream to the SRS streaming media service over the RTMP protocol.
The video service module SDK passes the video stream of the specified video channel through the corresponding target recognition algorithm according to the video channel ID and feeds the result back to the management system. The video service module SDK then pushes the video stream processed by the algorithm plug-in to the SRS streaming media server over the RTMP protocol. The SRS streaming media service may run on the same server as the video service module SDK and the yolo model, or on a different server.
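As an illustration of how one pushed RTMP stream maps to several playback addresses, the following Python sketch derives them from the stream's application and stream names. It assumes SRS-style default mount paths and an HTTP port of 8080; both depend on the actual SRS configuration and should be checked against the deployed server.

```python
def playback_urls(host, app, stream, http_port=8080):
    """Derive playback addresses for a stream pushed to rtmp://host/app/stream."""
    return {
        "hls": f"http://{host}:{http_port}/{app}/{stream}.m3u8",
        "flv": f"http://{host}:{http_port}/{app}/{stream}.flv",
        "webrtc": f"webrtc://{host}/{app}/{stream}",
    }


urls = playback_urls("192.168.1.10", "live", "ch-1")
```

The management system page would pick the entry that matches the player it embeds.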
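The per-frame flow of pulling, recognizing, adding the result and pushing can be sketched as follows. This is a simplified Python illustration in which frames are opaque values, the puller is any iterable, and the pusher is any callable; real RTSP decoding and RTMP encoding are outside the sketch and all names are assumptions.

```python
def run_pipeline(frames, channel_id, bindings, plugins, push):
    """For each pulled frame, run every bound algorithm and push the annotated frame."""
    for frame in frames:
        results = []
        # The mapping is read per frame, so a changed combination applies at once.
        for name in bindings.get(channel_id, []):
            results.extend(plugins[name](frame))
        push({"image": frame, "overlays": results})


pushed = []
run_pipeline(
    frames=["frame-0", "frame-1"],
    channel_id="ch-1",
    bindings={"ch-1": ["helmet"]},
    plugins={"helmet": lambda frame: [{"label": "helmet", "frame": frame}]},
    push=pushed.append,
)
```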
S309, the streaming media service SRS receives the video stream pushed by the video service module SDK, and pushes the video stream out in HLS, FLV, or webRTC.
After SRS is downloaded with wget and installed, the SRS streaming media service configuration file is edited to configure the running port number, the maximum number of connections, the pushed live video format (HLS, FLV or WebRTC) and the like, and SRS is then started.
S310, the live broadcast page of the management system is opened to view the live video stream output by the target recognition algorithm, with the recognition result marked in the picture.
The pushed video stream can be played on a live broadcast page of the management system, and can also be played and checked on a player supporting a live broadcast video format.
The embodiment of the invention improves on the mode in which one video channel can only use a fixed target recognition algorithm. By deploying and running the video service module SDK, it overcomes the defect that, after an algorithm is added, it can only be applied by restarting or re-accessing the video stream, and it improves the efficiency of integrating video channels with target recognition algorithms. Pulling the video stream of the video acquisition equipment through the video service module SDK over the RTSP protocol greatly improves flexibility, and because video acquisition equipment on the market basically supports RTSP, the application range is wide. The SRS streaming media service can be multiplexed: RTMP video streams pushed by a plurality of video services can be received and distributed by one SRS, which saves server resources and reduces operation cost. The broad RTSP support of acquisition equipment is fully utilized to achieve rapid deployment and multiplexing of the video service module SDK. Autonomous matching removes the need to restart a video channel in order to apply a new algorithm and solves the problem that the two cannot be adapted quickly, so that a video channel can be freely matched with target recognition algorithms as required and the working efficiency of the video channel in algorithm application is improved. Because the pairing of video channel and algorithm is controllable, video stream channels and target recognition algorithms can be matched freely; this improves on the traditional mode, in which a video stream can only recognize one or several fixed targets, overcomes the defect that video streams and target recognition algorithms cannot be flexibly configured with each other, enables real-time verification of the target recognition result of a given channel with a designated algorithm, and provides a stable target recognition service and live video stream.
Example IV
Fig. 6 is a schematic diagram of a video stream recognition device according to a fourth embodiment of the present invention. The fourth embodiment of the present invention is a corresponding apparatus for implementing the video stream identification method provided in the foregoing embodiment of the present invention, where the apparatus may be implemented in software and/or hardware, and may be generally integrated into a computer device, for example, a terminal device, etc., and specifically, a mobile phone, a tablet computer, or a vehicle-mounted device, etc.
Accordingly, the apparatus of this embodiment may include:
a video stream acquisition module 610, configured to acquire a video stream transmitted through a video channel;
an identification type obtaining module 620, configured to obtain at least one identification type corresponding to the video channel;
the target recognition module 630 is configured to perform target recognition on the video stream according to each recognition type, so as to obtain a recognition result corresponding to each recognition type;
and the target adding module 640 is configured to add each identification result to the video stream to obtain a new video stream.
According to the embodiment of the invention, when the application program generates the access requirement of the network server, the video stream identification request is intercepted, the Internet protocol address matched with the video stream identification request is acquired locally, the analysis result is generated and fed back to the application program, the preconfigured Internet protocol address can be acquired, the situation that the Internet protocol address cannot be acquired accurately due to the limitation of the conventional video stream identification server is avoided, the problems that the Internet protocol address of a source station cannot be acquired in a content distribution network and the Internet protocol address of a server which is not registered in the video stream identification server cannot be acquired in the prior art are solved, the range of the acquirable Internet protocol address can be increased, the accurate acquisition of the Internet protocol address of the server by a client is ensured, and the monitoring range of the server is increased, so that the performance of the server is improved.
Further, the obtaining at least one identification type corresponding to the video channel includes: acquiring acquisition equipment corresponding to the video channel; acquiring a shooting range of the acquisition equipment; and acquiring at least one identification type corresponding to the shooting range.
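The lookup chain from video channel to acquisition equipment to shooting range to identification types can be sketched as below. This is a minimal Python illustration; the three mapping tables and their example contents are hypothetical.

```python
def recognition_types(channel_id, channel_to_device, device_to_range, range_to_types):
    """Resolve the identification types for a channel via its device's shooting range."""
    device = channel_to_device[channel_id]
    shooting_range = device_to_range[device]
    return list(range_to_types.get(shooting_range, []))


# Hypothetical example data.
channel_to_device = {"ch-1": "camera-A"}
device_to_range = {"camera-A": "workshop-entrance"}
range_to_types = {"workshop-entrance": ["helmet", "smoking"]}
types = recognition_types("ch-1", channel_to_device, device_to_range, range_to_types)
```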
Further, the performing object recognition on the video stream according to each recognition type to obtain a recognition result corresponding to each recognition type includes: performing target recognition on the video stream one by one according to each recognition type to obtain a recognition result corresponding to each recognition type; the step of adding each identification result to the video stream to obtain a new video stream comprises the following steps: and adding the identification results to the video stream according to the corresponding adding mode of the identification types to obtain a new video stream.
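Recognizing one type at a time and then adding each result in the mode that corresponds to its type can be sketched as follows. This Python sketch is illustrative; the mode names box and banner and the structure of an overlay entry are assumptions.

```python
def recognize_each(frame, rec_types, recognizers):
    """Run recognition one type at a time and collect results per type."""
    return {rec_type: recognizers[rec_type](frame) for rec_type in rec_types}


def add_results(results_by_type, add_modes, default_mode="box"):
    """Turn per-type results into overlay instructions using each type's adding mode."""
    overlays = []
    for rec_type, results in results_by_type.items():
        mode = add_modes.get(rec_type, default_mode)
        for result in results:
            overlays.append({"type": rec_type, "mode": mode, "result": result})
    return overlays


recognizers = {"helmet": lambda f: ["no-helmet@(90,40)"], "smoke": lambda f: ["smoke"]}
found = recognize_each("frame-0", ["helmet", "smoke"], recognizers)
overlays = add_results(found, {"smoke": "banner"})
```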
Further, the performing object recognition on the video stream includes: determining a region to be identified in an image of the video stream; cutting the video stream according to the position of the region to be identified in the image; and carrying out target recognition on the cut video stream.
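Cutting each image to the region to be identified, and mapping a box found in the cut image back to the full picture, can be sketched as follows. In this Python illustration an image is a list of pixel rows and a region is (left, top, right, bottom); both representations are assumptions made to keep the sketch self-contained.

```python
def crop(image, region):
    """Cut an image (a list of pixel rows) to the region (left, top, right, bottom)."""
    left, top, right, bottom = region
    return [row[left:right] for row in image[top:bottom]]


def to_full_frame(box, region):
    """Shift a box found in the cut image back into full-frame coordinates."""
    left, top, _, _ = region
    x1, y1, x2, y2 = box
    return (x1 + left, y1 + top, x2 + left, y2 + top)


image = [[r * 4 + c for c in range(4)] for r in range(4)]  # 4x4 test image
region = (1, 1, 3, 3)
cut = crop(image, region)
```

Recognizing only the cut image reduces the amount of data the algorithm has to process, while the coordinate shift keeps the marked result aligned with the original picture.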
Further, the performing object recognition on the video stream according to each recognition type to obtain a recognition result corresponding to each recognition type includes: and calling plug-ins corresponding to the identification types, and carrying out target identification on the video stream to obtain identification results fed back by the plug-ins corresponding to the identification types.
Further, after obtaining the new video stream, the method further includes: and carrying out format conversion on the new video stream, and pushing out the converted video stream so as to play the converted video stream on a page.
Further, the acquiring the video stream transmitted through the video channel includes: acquiring a video stream transmitted by a plant monitoring device through a video channel; the number of the plant monitoring devices is at least one, the plant monitoring devices transmit video streams through at least one video channel, and the identification type comprises at least one type of operation risk behavior.
The video stream identification device can execute the video stream identification method provided by the embodiment of the invention, and has the corresponding functional modules and beneficial effects of the executed video stream identification method.
Example V
Fig. 7 is a schematic structural diagram of a computer device according to a fifth embodiment of the present invention. Fig. 7 illustrates a block diagram of an exemplary computer device 12 suitable for use in implementing embodiments of the present invention. The computer device 12 shown in fig. 7 is only an example and should not be construed as limiting the functionality and scope of use of embodiments of the invention.
As shown in fig. 7, the computer device 12 is in the form of a general purpose computing device. Components of computer device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, a bus 18 that connects the various system components, including the system memory 28 and the processing units 16. Computer device 12 may be a device that is attached to a bus.
Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor, and a local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include industry standard architecture (Industry Standard Architecture, ISA) bus, micro channel architecture (Micro Channel Architecture, MCA) bus, enhanced ISA bus, video electronics standards association (Video Electronics Standards Association, VESA) local bus, and peripheral component interconnect (Peripheral Component Interconnect, PCI) bus.
Computer device 12 typically includes a variety of computer system readable media. Such media can be any available media that is accessible by computer device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. The computer device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from or write to non-removable, nonvolatile magnetic media (not shown in FIG. 7, commonly referred to as a "hard disk drive"). Although not shown in fig. 7, a disk drive for reading from and writing to a removable nonvolatile magnetic disk (e.g., a "floppy disk"), and an optical disk drive for reading from and writing to a removable nonvolatile optical disk (e.g., a compact disk Read Only Memory (CD-ROM), digital versatile disk (Digital Video Disc-Read Only Memory, DVD-ROM), or other optical media) may be provided. In such cases, each drive may be coupled to bus 18 through one or more data medium interfaces. The system memory 28 may include at least one program product having a set (e.g., at least one) of program modules configured to carry out the functions of the embodiments of the invention.
A program/utility 40 having a set (at least one) of program modules 42 may be stored in, for example, system memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment. Program modules 42 generally perform the functions and/or methods of the embodiments described herein.
The computer device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), one or more devices that enable a user to interact with the computer device 12, and/or any devices (e.g., network card, modem, etc.) that enable the computer device 12 to communicate with one or more other computing devices. Such communication may be via an Input/Output (I/O) interface 22. The computer device 12 may also communicate with one or more networks (e.g., local area network (Local Area Network, LAN), wide area network (Wide Area Network, WAN)) via the network adapter 20. As shown, the network adapter 20 communicates with other modules of the computer device 12 via the bus 18. It should be understood that, although not shown in FIG. 7, other hardware and/or software modules may be used in connection with the computer device 12, including, but not limited to, microcode, device drivers, redundant processing units, external disk drive arrays, redundant arrays of inexpensive disks (Redundant Arrays of Inexpensive Disks, RAID) systems, tape drives, data backup storage systems, and the like.
The processing unit 16 executes various functional applications and data processing by running programs stored in the system memory 28, for example, implementing a video stream recognition method provided by any of the embodiments of the present invention.
Example VI
A sixth embodiment of the present invention provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a video stream identification method as provided in all the inventive embodiments of the present application:
that is, the program, when executed by the processor, implements: when an application program generates a network service access requirement, intercepting a video stream identification request sent by the application program; extracting a domain name to be resolved from the video stream identification request; inquiring the domain name to be resolved in a pre-stored corresponding relation between the domain name and an Internet protocol address; and generating an analysis result matched with the video stream identification request according to the queried internet protocol address matched with the domain name to be analyzed, and feeding back to the application program.
The computer storage media of embodiments of the invention may take the form of any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a RAM, a Read-Only Memory (ROM), an erasable programmable Read-Only Memory (Erasable Programmable Read Only Memory, EPROM), a flash Memory, an optical fiber, a portable CD-ROM, an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, radio frequency (RadioFrequency, RF), etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including object oriented programming languages such as Java, Smalltalk and C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a LAN or WAN, or may be connected to an external computer (for example, through the Internet using an Internet service provider).
Note that the above is only a preferred embodiment of the present invention and the technical principle applied. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, while the invention has been described in connection with the above embodiments, the invention is not limited to the embodiments, but may be embodied in many other equivalent forms without departing from the spirit or scope of the invention, which is set forth in the following claims.

Claims (10)

1. A video stream identification method, applied to a video service module, comprising:
acquiring a video stream transmitted through a video channel;
acquiring at least one identification type corresponding to the video channel;
performing target recognition on the video stream according to each recognition type to obtain a recognition result corresponding to each recognition type;
and adding each identification result into the video stream to obtain a new video stream.
2. The method according to claim 1, wherein the obtaining at least one identification type corresponding to the video channel comprises:
acquiring acquisition equipment corresponding to the video channel;
acquiring a shooting range of the acquisition equipment;
and acquiring at least one identification type corresponding to the shooting range.
3. The method according to claim 1, wherein the performing object recognition on the video stream according to each recognition type to obtain a recognition result corresponding to each recognition type includes:
performing target recognition on the video stream one by one according to each recognition type to obtain a recognition result corresponding to each recognition type;
the step of adding each identification result to the video stream to obtain a new video stream comprises the following steps:
and adding the identification results to the video stream according to the corresponding adding mode of the identification types to obtain a new video stream.
4. A method according to claim 3, wherein said object recognition of said video stream comprises:
determining a region to be identified in an image of the video stream;
cutting the video stream according to the position of the region to be identified in the image;
and carrying out target recognition on the cut video stream.
5. The method according to claim 1, wherein the performing object recognition on the video stream according to each recognition type to obtain a recognition result corresponding to each recognition type includes:
calling plug-ins corresponding to the identification types, and carrying out target identification on the video stream to obtain identification results fed back by the plug-ins corresponding to the identification types.
6. The method of claim 1, further comprising, after obtaining the new video stream:
and carrying out format conversion on the new video stream, and pushing out the converted video stream so as to play the converted video stream on a page.
7. The method of claim 1, wherein the acquiring the video stream transmitted over the video channel comprises:
acquiring a video stream transmitted by a plant monitoring device through a video channel; the number of the plant monitoring devices is at least one, the plant monitoring devices transmit video streams through at least one video channel, and the identification type comprises at least one type of operation risk behavior.
8. A video stream recognition device, configured in a video service module, comprising:
the video stream acquisition module is used for acquiring a video stream transmitted through a video channel;
the identification type acquisition module is used for acquiring at least one identification type corresponding to the video channel;
the target recognition module is used for carrying out target recognition on the video stream according to each recognition type to obtain a recognition result corresponding to each recognition type;
and the target adding module is used for adding the identification results into the video stream to obtain a new video stream.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the video stream identification method of any of claims 1-7 when the program is executed by the processor.
10. A computer readable storage medium having stored thereon a computer program, which when executed by a processor implements the video stream identification method according to any of claims 1-7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311636457.7A CN117615172A (en) 2023-12-01 2023-12-01 Video stream identification method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117615172A true CN117615172A (en) 2024-02-27

Family

ID=89959366

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311636457.7A Pending CN117615172A (en) 2023-12-01 2023-12-01 Video stream identification method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117615172A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination