CN117456430A - Video identification method, electronic equipment and storage medium - Google Patents


Info

Publication number
CN117456430A
CN117456430A
Authority
CN
China
Prior art keywords
target object
image
information
video
image frames
Prior art date
Legal status
Pending
Application number
CN202311798755.6A
Other languages
Chinese (zh)
Inventor
杨灯鸳
李炫质
黄耀辉
杨家艺
Current Assignee
Guangzhou Huihao Computer Technology Development Co ltd
Original Assignee
Guangzhou Huihao Computer Technology Development Co ltd
Priority date: 2023-12-26
Filing date: 2023-12-26
Publication date: 2024-01-26
Application filed by Guangzhou Huihao Computer Technology Development Co., Ltd.
Priority to CN202311798755.6A
Publication of CN117456430A
Legal status: Pending


Classifications

    • G06V 20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06N 3/084: Backpropagation, e.g. using gradient descent
    • G06T 7/20: Image analysis; Analysis of motion
    • G06V 10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V 20/54: Surveillance or monitoring of activities, e.g. for recognising suspicious objects, of traffic, e.g. cars on the road, trains or boats
    • G06V 20/60: Scenes; Scene-specific elements; Type of objects
    • G06T 2207/10016: Image acquisition modality; Video; Image sequence
    • G06T 2207/20081: Training; Learning
    • G06T 2207/20084: Artificial neural networks [ANN]
    • G06T 2207/30196: Human being; Person
    • G06T 2207/30241: Trajectory
    • G06T 2207/30248: Vehicle exterior or interior
    • G06V 2201/08: Detecting or categorising vehicles

Abstract

The application provides a video identification method, an electronic device and a storage medium, and relates to the field of computers. The method comprises the following steps: receiving video data of an expressway ramp entrance and exit collected by a video acquisition device arranged at the expressway ramp entrance and exit; performing hardware decoding on the video data to obtain a plurality of image frames; identifying the plurality of image frames through a pre-trained image identification model to generate an identification result; when at least one image frame in the plurality of image frames includes a target object, calculating the track of the target object by using a target tracking algorithm; and associating the track of the target object, the identification of the target object, the time information of the target object and the geographic information of the target object to generate first association information, and sending the first association information to an alarm device arranged on the expressway side. By implementing the technical scheme provided by the application, the potential safety hazards of the expressway can be reduced, thereby reducing the occurrence rate of expressway traffic accidents.

Description

Video identification method, electronic equipment and storage medium
Technical Field
The present application relates to the field of computers, and in particular, to a video recognition method, an electronic device, and a storage medium.
Background
In the prior art, video monitoring technology is generally adopted, and the expressway entrance is monitored by manual observation and identification so as to guard against dangerous factors on the expressway, such as pedestrians entering the expressway or vehicles driving in the wrong direction at the expressway entrance. However, manual monitoring increases labor cost; in addition, because the attention of an observer is limited, blind areas of judgment easily arise when a manual identification mode is adopted.
Therefore, the prior-art practice of preventing dangerous factors on the expressway by manual identification and judgment still leaves a large potential safety hazard in road traffic.
Disclosure of Invention
Embodiments of the present application provide a video identification method, an electronic device and a storage medium, which can reduce the potential safety hazards of the expressway and thereby reduce the occurrence rate of expressway traffic accidents.
In a first aspect, a video identification method provided in an embodiment of the present application includes: acquiring video data of an expressway ramp entrance and exit from a video acquisition device arranged at the expressway ramp entrance and exit; performing hardware decoding on the video data to obtain a plurality of image frames; identifying the plurality of image frames through a pre-trained image identification model and generating an identification result, wherein the identification result indicates whether a target object is included in the plurality of image frames, and the target object comprises at least one of the following: pedestrians, non-motor vehicles, animals, special work clothes and anomalies; when the plurality of image frames include the target object, calculating the track of the target object by using a target tracking algorithm; and associating the track of the target object, the identification of the target object, the time information of the target object and the geographic information of the target object to generate first association information, and sending the first association information to an alarm device arranged on the expressway side.
According to the video identification method for the expressway provided by the embodiment of the application, video data covering a larger area can be obtained through the plurality of video acquisition devices arranged at the expressway ramp entrance and exit, which enlarges the monitoring range of the expressway ramp. The video data are hardware-decoded into image frames, which improves video processing efficiency and reduces CPU occupancy. Pedestrians, non-motor vehicles, animals and abnormal objects in the plurality of image frames are identified through the pre-trained image identification model; compared with the prior art, a plurality of types of target object can be monitored, which improves the efficiency of identifying potential safety hazards. The track of the target object is obtained by a target tracking algorithm, and the first association information is generated and sent to the alarm device, so that motor vehicles, non-motor vehicles, pedestrians and the like on the expressway can be warned and the potential safety hazards on the expressway are reduced. In addition, the video identification method provided by the embodiment of the application uses a plurality of video acquisition devices and the image identification model to identify the acquired images of the expressway, detects whether dangerous factors exist on the expressway, and, when they appear, promptly sends the information to the roadside alarm device for broadcasting, so that workers no longer need to watch and interpret the monitoring images at all times, which reduces labor cost. Furthermore, because the video acquisition devices can cover a plurality of areas of the expressway ramp entrance and exit, the blind areas left by manual monitoring are reduced, which further reduces the potential safety hazards of road traffic. In summary, the video identification method provided by the embodiment of the application can reduce the potential safety hazards of the expressway and reduce the occurrence rate of expressway traffic accidents.
In one possible implementation, a first sample data set is obtained, the first sample data set including a plurality of image samples, each image sample of the plurality of image samples including a target object and first annotation information indicating a category to which the target object belongs; and inputting the sample data set into an initial neural network to obtain an image recognition model.
By adopting the technical scheme, the image recognition model capable of recognizing the target object can be obtained.
In one possible implementation manner, the image frame in which a first target object among the target objects is first identified in the plurality of image frames, in time order, is taken as a target image frame; and a preset number of consecutive image frames including the target image frame are processed by using a target tracking algorithm to generate the track of the first target object.
By adopting the technical scheme, the track of the target object can be obtained by utilizing the target tracking algorithm, so that vehicles which normally run on the expressway can be warned, and the occurrence of expressway traffic accidents is reduced.
In one possible implementation manner, second association information is generated based on the image frame corresponding to the target object and the first association information; transmitting the second association information to the display device; the first association information is voice information, and the second association information is image information.
By adopting the above technical scheme, the staff can see, through the display device, the target object that appears on the expressway ramp, respond on the expressway and persuade the target object to leave, thereby further reducing the potential safety hazards of the expressway.
In one possible implementation, a second sample data set is generated based on the second association information and the first sample data set; the second sample data set comprises second labeling information, and the second labeling information is used for indicating the category of the target object which is not recognized by the image recognition model in the target image frame.
In one possible implementation, the second sample data set is input to the image recognition model, and the pre-trained image recognition model is updated.
By adopting the above technical scheme, the range of target object categories that the image recognition model can identify is expanded, which further reduces the potential safety hazards of the expressway.
In one possible implementation, the first association information is sent to a terminal device equipped by the staff.
By adopting the above technical scheme, the staff can be prompted to respond on the expressway, which reduces the potential safety hazards of the expressway and lowers the expressway accident rate.
In a second aspect, an embodiment of the present application provides a video recognition device, including: a video acquisition unit, configured to acquire video information of an expressway ramp entrance and exit and send the video information to a video processing unit and a display unit respectively; the video processing unit, configured to perform hardware decoding on the video information to obtain a plurality of image frames, and identify the plurality of image frames through a pre-trained image identification model to generate an identification result, wherein the identification result indicates whether a target object is included in the plurality of image frames, and the target object comprises at least one of the following: pedestrians, non-motor vehicles, animals, and anomalies; a generation unit, configured to associate the track of the target object, the identification of the target object, the time information of the target object and the geographic information of the target object to generate first association information; an alarm unit, configured to broadcast, to the expressway, the geographic information, time information, track and identification of the found target object based on the first association information; and a display unit, configured to display the video data.
In a third aspect, embodiments of the present application provide an electronic device including a processor, a memory, and an interface; a memory for storing instructions; an interface for communicating with other devices; a processor for executing instructions stored in a memory to cause an electronic device to perform the method as described in the first aspect.
In a fourth aspect, embodiments of the present application provide a readable storage medium comprising computer instructions which, when run on a computer, cause the computer to perform the method according to the first aspect.
It can be understood that the technical solutions of the second to fourth aspects of the present application are consistent with the technical solutions of the first aspect of the present application, and the beneficial effects obtained by each aspect and the corresponding possible embodiments are similar, and are not repeated.
In summary, one or more technical solutions provided in the embodiments of the present application at least have the following technical effects or advantages:
Video data covering a larger area are acquired through the plurality of video acquisition devices arranged at the expressway ramp entrance and exit, so that the monitoring range of the expressway ramp is enlarged.
Obtaining the image frames through hardware decoding reduces the CPU occupancy rate and improves the video processing efficiency.
Pedestrians, non-motor vehicles, animals and the like that appear on the expressway and constitute potential safety hazards are identified as target objects through the pre-trained image identification model, and the motor vehicles normally running on the expressway are warned through the alarm device, so that the potential safety hazards of the expressway can be reduced and the occurrence rate of expressway accidents is lowered.
Drawings
Fig. 1 is a schematic view of an application scenario of a video recognition method according to an embodiment of the present application;
FIG. 2 is a flowchart of a video recognition method provided in an embodiment of the present application;
FIG. 3 is a flowchart of training an image recognition model provided in an embodiment of the present application;
FIG. 4 is a target object annotation diagram provided by an embodiment of the present application;
fig. 5 is a schematic structural diagram of a video recognition device according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make those skilled in the art better understand the technical solutions in the present specification, the technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is obvious that the described embodiments are only some embodiments of the present application, but not all embodiments.
In the description of embodiments of the present application, words such as "such as" or "for example" are used to indicate examples, illustrations or descriptions. Any embodiment or design described herein as "such as" or "for example" should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "such as" or "for example" is intended to present related concepts in a concrete fashion.
In the description of the embodiments of the present application, the term "plurality" means two or more. Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating an indicated technical feature. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
In the prior art, video monitoring technology is generally adopted to monitor whether dangerous factors exist at the expressway entrance, but most methods monitor only a single specific type of target object, such as pedestrians or vehicles. In practical application, however, multiple kinds of events, such as pedestrians, non-motor vehicles, animals and dangerous-chemical vehicles, need to be monitored simultaneously and an alarm issued in time when they occur; monitoring a single target therefore leaves a large potential safety hazard in road traffic.
According to the video identification method for the expressway provided by the embodiment of the application, video data covering a larger area can be obtained through the plurality of video acquisition devices arranged at the expressway ramp entrance and exit, which enlarges the monitoring range of the expressway ramp. The video data are hardware-decoded into image frames, which improves video processing efficiency and reduces CPU occupancy. Pedestrians, non-motor vehicles, animals and abnormal objects in the plurality of image frames are identified through the pre-trained image identification model; compared with the prior art, a plurality of types of target object can be monitored, which improves the efficiency of identifying potential safety hazards. The track of the target object is obtained by a target tracking algorithm, and the first association information is generated and sent to the alarm device, so that motor vehicles, non-motor vehicles, pedestrians and the like on the expressway can be warned and the potential safety hazards on the expressway are reduced. In addition, the video identification method provided by the embodiment of the application uses a plurality of video acquisition devices and the image identification model to identify the acquired images of the expressway, detects whether dangerous factors exist on the expressway, and, when they appear, promptly sends the information to the roadside alarm device for broadcasting, so that workers no longer need to watch and interpret the monitoring images at all times, which reduces labor cost. Furthermore, because the video acquisition devices can cover a plurality of areas of the expressway ramp entrance and exit, the blind areas left by manual monitoring are reduced, which further reduces the potential safety hazards of road traffic. In summary, the video identification method provided by the embodiment of the application can reduce the potential safety hazards of the expressway and reduce the occurrence rate of expressway traffic accidents.
Referring to fig. 1, fig. 1 is a schematic view of an application scenario of a video recognition method applied to an expressway according to an embodiment of the present application. In the application scenario shown in fig. 1, the system comprises two video acquisition devices, namely a video acquisition device 101 arranged on one side inside the ramp and a video acquisition device 103 arranged on one side of the ramp entrance, and further comprises an alarm device 102 arranged on one side of the ramp, a video processing device 104 arranged on one side of the ramp entrance, and a display device 105 arranged in a highway workbench 106, wherein the video acquisition device 103 is arranged above the video processing device 104. In addition, the application scenario may also be provided with a terminal device, which is not shown in fig. 1. The video acquisition device 101 and the alarm device 102 are connected with the video processing device 104 in a wired manner through buried underground wires, the video acquisition device 103 is connected with the video processing device 104 through USB, and the video processing device 104 is connected with the display device 105 in a wireless manner.
Based on the application scenario shown in fig. 1, in the present embodiment, the video capturing device 101 and the video capturing device 103 are configured to capture video data of the ramp and the ramp entrance, and send the captured video data to the video processing device 104; the video processing device 104 processes and identifies the video data, generates an identification result, and sends the identification result to the alarm device 102 and the display device 105. As shown in fig. 1, a plurality of video acquisition devices can be arranged to cover both the ramp and the ramp entrance, so that the monitoring range is enlarged and the accuracy of identifying target objects is improved.
Based on the application scenario shown in fig. 1, please continue to refer to fig. 2, a more detailed description is given of a video recognition method provided in an embodiment of the present application. Fig. 2 is a flowchart of a video recognition method according to an embodiment of the present application. The flow 200 of the video recognition method includes:
step 201, acquiring video data of a highway ramp entrance from a video acquisition device arranged at the highway ramp entrance.
In this embodiment, as shown in fig. 1, the video capturing device 101 and the video capturing device 103 capture video data in the ramp and video data at the ramp entrance respectively, and transmit the captured video data to the video processing device 104 in a wired manner. Furthermore, only one video capturing device, or a plurality of video capturing devices, may be provided in the ramp and at the ramp entrance, which is not particularly limited in the embodiments of the present application.
Step 202, performing hardware decoding on the video data to obtain a plurality of image frames.
In this embodiment, please continue to refer to fig. 1, the video data acquired by the video acquisition device 101 and the video acquisition device 103 are transmitted to the video processing device 104, and the video processing device 104 decodes the video data into a plurality of image frames arranged in time order by performing hardware decoding on the GPU. Compared with the prior art, this reduces the power consumption of video processing, reduces the CPU occupancy rate and improves the efficiency of video decoding. The hardware decoder used by the video processing device 104 may be an Intel, AMD-ATI or NVIDIA decoder, or the like. Further, the image processing apparatus 104 may also re-encode the plurality of image frames obtained by hardware decoding back into video data by using hardware encoding.
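As an illustration of this step only, the following sketch uses OpenCV's FFmpeg backend to request hardware-accelerated decoding and collect the frames in time order; the hardware-acceleration flags shown assume a recent OpenCV build (roughly 4.5.4 or later) and a suitable GPU driver, and the stream URL is a placeholder, so this is a minimal sketch under those assumptions rather than the patent's own implementation:

```python
import cv2

def decode_to_frames(stream_url: str, max_frames: int = 0):
    """Decode a video stream into a time-ordered list of image frames.

    Hardware decoding is requested through OpenCV's FFmpeg backend; if the
    platform (Intel / AMD-ATI / NVIDIA) exposes no hardware decoder, OpenCV
    falls back to software decoding on the CPU.
    """
    cap = cv2.VideoCapture(
        stream_url,
        cv2.CAP_FFMPEG,
        [cv2.CAP_PROP_HW_ACCELERATION, cv2.VIDEO_ACCELERATION_ANY],
    )
    frames = []
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(frame)  # BGR array, kept in decoding (time) order
        if max_frames and len(frames) >= max_frames:
            break
    cap.release()
    return frames
```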
Step 203, identifying the plurality of image frames through a pre-trained image identification model, and generating an identification result, wherein the identification result indicates whether a target object is included in the plurality of image frames, and the target object comprises at least one of the following: pedestrians, non-motor vehicles, animals, special work clothes, and anomalies.
In the embodiment of the present application, the plurality of image frames obtained in step 202 are input to the image processing apparatus 104, the plurality of image frames are identified by an image identification model trained in advance in the image processing apparatus 104, and an identification result is generated.
In the embodiment of the present application, the image recognition module of the image processing apparatus 104 recognizes the plurality of obtained image frames; the embodiment of the present application is further described below by taking a black motorcycle as the target object.
In one possible implementation, a first sample data set is obtained, the first sample data set including a plurality of image samples, each image sample of the plurality of image samples including a target object and first annotation information indicating a category to which the target object belongs; and inputting the sample data set into an initial neural network to obtain an image recognition model.
In the embodiment of the present application, the image recognition model in the image processing apparatus 104 needs to be obtained through training an initial neural network, and specifically referring to fig. 3, fig. 3 is a flowchart of training the image recognition model, where the training process 300 of the image recognition model includes:
step 301, a first sample data set is acquired.
In the embodiment of the application, a first sample data set is acquired through the internet or the like, wherein the first sample data set comprises a plurality of image samples, and each image sample in the plurality of image samples comprises a target object. Each image sample may contain target objects of one or more categories, which is not specifically limited in the embodiments of the present application.
Step 302, labeling a plurality of image samples in the first sample data set to generate first labeling information.
In the embodiment of the application, the category of the target object corresponding to each image sample in the plurality of image samples of the first sample data set is marked, and a detection frame identifying the position of the target object in each image sample is marked. Referring specifically to fig. 4, fig. 4 is a target object annotation diagram provided in an embodiment of the present application.
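The patent does not prescribe a storage format for the first labeling information; purely for illustration, one sample could be recorded as a category label plus a pixel-coordinate detection frame, as in the hypothetical structure below (the file name, category names and coordinates are invented examples, not values from the patent):

```python
# One image sample of the first sample data set, expressed as a category label
# plus a detection frame in pixel coordinates (left, top, right, bottom).
# The file name, category names and coordinates are invented for illustration.
first_annotation_example = {
    "image": "ramp_camera_000123.jpg",
    "objects": [
        {"category": "non_motor_vehicle",      # e.g. a black motorcycle
         "bbox_xyxy": [412, 238, 505, 361]},
        {"category": "pedestrian",
         "bbox_xyxy": [120, 200, 160, 310]},
    ],
}
```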
Step 303, inputting the first sample data set and the first labeling information into an initial neural network for training to obtain an image recognition model.
In this embodiment of the present application, inputting the first sample data set and the first labeling information to the initial neural network specifically includes inputting each image sample in the first sample data set and its first labeling information to the initial neural network and generating an output result, the output result indicating the target object category in the image sample and a detection frame indicating the target object position. A loss function is constructed based on the deviation between the output result and the first annotation information; the loss function may include, for example, but is not limited to, a mean absolute error (MAE) loss function or a mean square error (MSE) loss function. The loss function involves the weight coefficients of each layer of the initial neural network. Based on the constructed loss function, the weight coefficient values of each layer of the initial neural network are iteratively adjusted by using a back propagation algorithm and a gradient descent algorithm until the error between the output result and the first labeling information is smaller than or equal to a preset threshold value or the number of iterations reaches a preset maximum, and the weight coefficient values of each layer of the initial neural network are then stored. At this point, training of the initial neural network is complete. Gradient descent algorithms may include, but are not limited to, SGD, Adam and the like. When back propagation is performed based on the preset loss function, the gradient of the loss function with respect to the weight coefficients of each layer of the initial neural network can be calculated by using the chain rule.
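A minimal sketch of the training procedure described above, written with PyTorch under the assumption that the image samples and their first labeling information have already been collated into tensors; the MSE loss, Adam optimizer, error threshold and iteration budget stand in for the loss function, gradient descent algorithm and stopping criteria named in this step and are not the patent's fixed choices:

```python
import torch
from torch import nn

def train_recognition_model(model: nn.Module,
                            images: torch.Tensor,    # (N, C, H, W) image samples
                            targets: torch.Tensor,   # encoded first labeling information
                            epochs: int = 100,
                            error_threshold: float = 1e-3) -> nn.Module:
    """Fit the initial neural network to the first sample data set.

    Backpropagation computes the gradient of the loss with respect to the
    weight coefficients of every layer (chain rule), and the optimizer
    applies gradient-descent updates until the error falls below the preset
    threshold or the iteration budget is exhausted.
    """
    criterion = nn.MSELoss()                       # could also be nn.L1Loss (MAE)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

    for _ in range(epochs):
        optimizer.zero_grad()
        output = model(images)                     # predicted category / box encoding
        loss = criterion(output, targets)
        loss.backward()                            # backpropagation via the chain rule
        optimizer.step()                           # gradient-descent weight update
        if loss.item() <= error_threshold:         # stop once the error is small enough
            break
    return model
```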
In step 204, when at least one image frame of the plurality of image frames includes the target object, a track of the target object is calculated by using a target tracking algorithm.
In the embodiment of the present application, when it is recognized that the plurality of image frames all include the target object, the moving speed of the target object is determined from the time at which the target object is found and the distance it moves, and the position of the target object can then be estimated from its moving direction and moving speed. For example, when a black motorcycle is identified in each of the plurality of image frames, the moving speed of the black motorcycle is obtained from the time at which it is identified and the distance it moves within that period, and its position can then be estimated by combining its moving speed, moving direction and moving time.
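The position estimate described here amounts to constant-velocity extrapolation from two observations of the target object; a small sketch follows (whether the coordinates are image pixels or road-plane metres depends on camera calibration, which the patent leaves open):

```python
def estimate_position(p_start, p_end, t_start, t_end, t_query):
    """Extrapolate the target object's position at time t_query.

    The moving speed is the observed displacement between the first and the
    latest detection divided by the elapsed time; the position is then
    projected forward along the same moving direction.
    """
    dt = t_end - t_start
    vx = (p_end[0] - p_start[0]) / dt   # speed component along x
    vy = (p_end[1] - p_start[1]) / dt   # speed component along y
    lead = t_query - t_end
    return (p_end[0] + vx * lead, p_end[1] + vy * lead)
```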
In one possible implementation manner, the image frame in which a first target object among the target objects is first identified in the plurality of image frames, in time order, is taken as a target image frame; and a preset number of consecutive image frames including the target image frame are processed by using a target tracking algorithm to generate the track of the first target object.
In the embodiment of the present application, the image processing apparatus 104 decodes the video data into a plurality of image frames in time order, recognizes the plurality of image frames by using the image recognition model, and uses the image frame in which the first target object is first recognized as the target image frame. For example, with continued reference to the scenario of fig. 1, when a black motorcycle is identified, the first image frame in which the black motorcycle is identified is used as the target image frame. A preset number of consecutive image frames including the target image frame of the black motorcycle are processed by using the target tracking algorithm to generate the track of the black motorcycle, which further comprises the following processing steps: obtaining the position of the black motorcycle in the preset number of consecutive image frames including the target image frame, and then obtaining the track of the black motorcycle from the change of its position across different image frames. The preset number of consecutive image frames may be N, where N is greater than or equal to 2, and the target object may further include, but is not limited to, pedestrians, animals, vehicles bearing dangerous-chemical markings and anomalies. In embodiments of the present application, the target object may also include special work clothes, which may include, but are not limited to, reflective clothing, military or police clothing, and customized clothing of the expressway industry.
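The patent does not name a particular target tracking algorithm; as one possible reading of this step, the sketch below associates per-frame detection boxes by nearest centre over a preset number N of consecutive image frames, starting from the target image frame, to build a trajectory:

```python
def track_first_target(detections_per_frame, n_frames):
    """Build the first target object's trajectory over consecutive frames.

    detections_per_frame: time-ordered list, starting at the target image
    frame, of detection boxes per frame, each box as (x1, y1, x2, y2).
    The track is seeded from the first detection and extended frame by frame
    by choosing the detection whose centre lies closest to the previous one.
    """
    def centre(box):
        x1, y1, x2, y2 = box
        return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

    trajectory = []
    prev = None
    for boxes in detections_per_frame[:n_frames]:
        if not boxes:
            continue                     # no detection in this frame, skip it
        if prev is None:
            prev = centre(boxes[0])      # seed from the target image frame
        else:
            last = prev                  # nearest-centre data association
            prev = min((centre(b) for b in boxes),
                       key=lambda c: (c[0] - last[0]) ** 2 + (c[1] - last[1]) ** 2)
        trajectory.append(prev)
    return trajectory
```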
In another embodiment of the present application, when the identified target object is a pedestrian, the clothing of the pedestrian is further identified to determine whether it is one of the special work clothes; when the clothing of the pedestrian is one of the special work clothes, the flow proceeds to step 204, and when it is not, step 205 is performed.
Step 205, associating the track of the target object, the identification of the target object, the time information of the target object and the geographic information of the target object, generating first association information, and transmitting the first association information to an alarm device arranged on the expressway side.
In this embodiment of the present application, with continued reference to the scenario of fig. 1, when the target object is identified as a black motorcycle, step 204 is performed to obtain the track of the black motorcycle, and the track of the black motorcycle, the time information and the geographic information are associated to generate the first association information, which is sent to the alarm device arranged on the expressway side. For example, with continued reference to the application scenario of fig. 1, step 204 is performed to obtain the track of the black motorcycle, the track being one of entering in the direction of the ramp, and the identification information of the non-motor vehicle being a black motorcycle; the time information and geographic information at which the black motorcycle is found are combined to generate the first association information, which is sent to the alarm device 102, and the alarm device 102 plays, to the normally running vehicles on the ramp, a voice message that a black motorcycle has entered the ramp from the ramp entrance, so as to warn the normally running vehicles to take care and give way, thereby reducing the occurrence rate of expressway traffic accidents.
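As a sketch of how the first association information could be assembled and handed to the roadside alarm device, the code below uses illustrative field names and a caller-supplied transport function; these are assumptions, since the patent only requires that the track, identification, time information and geographic information be associated and sent:

```python
import json
import time

def build_first_association(trajectory, target_id, geo_info):
    """Associate the track, identification, time and geographic information."""
    return {
        "target_id": target_id,            # e.g. "non_motor_vehicle/black_motorcycle"
        "trajectory": trajectory,          # list of (x, y) positions over time
        "time": time.strftime("%Y-%m-%d %H:%M:%S"),
        "geo": geo_info,                   # e.g. {"location": "ramp entrance"}
    }

def send_to_alarm_device(info, publish):
    """Serialize the first association information and hand it to a transport.

    `publish` is a caller-supplied function (an MQTT publish, an HTTP POST,
    a serial write to the roadside unit, ...); the transport is not fixed here.
    """
    publish(json.dumps(info, ensure_ascii=False))
```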
In one possible implementation, the first association information is sent to a terminal device equipped by the staff.
In this embodiment of the present application, the system further includes a terminal device, where the terminal device is carried by staff on the expressway, and the terminal device may be a mobile phone, a calling device or a player, which is not specifically limited in this application. The terminal device is connected with the image processing device 104 through a wireless network, and the generated first association information is sent to the terminal devices carried by the expressway staff. Through the first association information, the staff can learn that a potential safety hazard has intruded into the expressway ramp at a specific position and a specific time, and can then quickly respond on the expressway ramp to persuade away or guide the intruding object to a safe position, thereby reducing the occurrence rate of expressway traffic accidents.
In one possible implementation manner, second association information is generated based on the image frame corresponding to the target object and the first association information; transmitting the second association information to the display device; the first association information is voice information, and the second association information is image information.
In the embodiment of the present application, with continued reference to the application scenario of fig. 1, the image processing apparatus 104 associates the first association information with the image frame in which the target object is found to generate the second association information, and sends the received video data and the second association information to the display device 105 through the wireless network; the display device 105 and the video processing device 104 are connected through the wireless network, and the display device 105 is arranged in the highway workbench 106 and is used for displaying the video data and the second association information. For example, the first association information generated from the geographic information and time information of the found black motorcycle is associated with the image frame in which the black motorcycle is recognized to generate the second association information, which is transmitted to the display device 105.
In one possible implementation, a second sample data set is generated based on the second association information and the first sample data set; the second sample data set comprises second labeling information, and the second labeling information is used for indicating the category of the target object which is not identified by the image identification model in the target image frame.
In the embodiment of the application, the image frame, in the second association information, to which a target object not recognized by the image recognition model belongs is marked, and second labeling information is generated, wherein the labeling information comprises the category to which the target object not recognized by the image recognition model in the target image frame belongs. Further, the category of the target object not recognized by the image recognition model is marked, the coordinates of the target object in the target image frame are indicated, and a detection frame indicating the position of the target object in the target image frame is generated according to the coordinates of the target object. Further, when the target object in the target image frame is identified as a black motorcycle but the target image frame also contains another potential safety hazard, such as a vehicle bearing dangerous-chemical markings, the category of the dangerous-chemical vehicle in the target image frame is marked, a detection frame indicating the position of the dangerous-chemical vehicle in the target image frame is marked, and second labeling information is generated, the second association information including the second labeling information. A second sample data set is generated based on the second association information and the first sample data set.
In one possible implementation, the second sample data set is input to the image recognition model, and the pre-trained image recognition model is updated.
In the embodiment of the application, the second sample data set is input to the initial neural network for training, and the image recognition model is updated; the updated image recognition model can recognize more categories of target objects, thereby reducing the potential safety hazards of the expressway and lowering the occurrence rate of expressway accidents. Further, the first sample data set and the first association information and second labeling information in the second sample data set are input to the initial neural network, an output result is generated, the output result indicating the category of the target object in the target image frame and a detection frame indicating the position of the target object in the target image frame, and the image recognition model is then updated.
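Updating the pre-trained model with the second sample data set is, in effect, a further round of the same training procedure (fine-tuning); a brief sketch reusing the hypothetical train_recognition_model helper from the earlier training sketch:

```python
def update_recognition_model(pretrained_model, second_images, second_targets):
    """Fine-tune the already-trained recognizer on the second sample data set,
    which adds the categories it previously failed to recognize (the second
    labeling information). Reuses the illustrative train_recognition_model
    helper sketched earlier; a smaller iteration budget suffices because only
    the existing weights are being adapted."""
    return train_recognition_model(pretrained_model, second_images,
                                   second_targets, epochs=20)
```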
It will be appreciated that the apparatus shown in fig. 5 comprises corresponding hardware and/or software modules for performing the functions described in fig. 2 and/or 3. The steps of the examples described in connection with the embodiments disclosed herein may be embodied in hardware or a combination of hardware and computer software. Whether a function is implemented as hardware or computer software driven hardware depends upon the particular application and design constraints imposed on the solution. Those skilled in the art may implement the described functionality using different approaches for each particular application in conjunction with the embodiments, but such implementation is not to be considered as outside the scope of this application.
It will be appreciated that in order to implement the functions described in fig. 2 and/or fig. 3, the execution body (e.g., server) of the video recognition method includes corresponding hardware and/or software modules that perform the respective functions. The steps of the examples described in connection with the embodiments disclosed herein may be embodied in hardware or a combination of hardware and computer software. Whether a function is implemented as hardware or computer software driven hardware depends upon the particular application and design constraints imposed on the solution. Those skilled in the art may implement the described functionality using different approaches for each particular application in conjunction with the embodiments, but such implementation is not to be considered as outside the scope of this application.
The present embodiment may divide the execution body (e.g., server) of the video recognition method according to the above method example, for example, may divide each different functional module corresponding to each function, or may integrate two or more functions into one processing module. The integrated modules described above may be implemented in hardware. It should be noted that, in this embodiment, the division of the modules is schematic, only one logic function is divided, and another division manner may be implemented in actual implementation.
In the case of dividing the functional modules according to the respective functions, fig. 5 shows a possible schematic diagram of the video recognition apparatus 500 related to the above embodiment. The video recognition apparatus 500 corresponding to fig. 5 may be a software apparatus running on a server, or the video recognition apparatus 500 may be a combination of software and hardware embedded in an execution subject (e.g., a server) of the video recognition method. As shown in fig. 5, the video recognition apparatus 500 may include: a video acquisition unit 501, configured to acquire video information of the expressway ramp entrance and exit; a video processing unit 502, configured to perform hardware decoding on the video information to obtain a plurality of image frames; an image recognition unit 503, configured to recognize the plurality of image frames through a pre-trained image recognition model and generate a recognition result, the recognition result indicating whether a target object is included in the plurality of image frames, the target object including at least one of: pedestrians, non-motor vehicles, animals, special work clothes and anomalies; a calculating unit 504, configured to calculate, when at least one image frame of the plurality of image frames includes the target object, the track of the target object by using a target tracking algorithm; and a generating unit 505, configured to associate the track of the target object, the identification of the target object, the time information and the geographic information to generate first association information.
It should be noted that: in the implementation of the functions of the video recognition apparatus 500 according to the foregoing embodiment, only the division of the functional modules is used as an example, and in practical applications, the functional modules may be allocated to different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules, so as to complete all or part of the functions described above. In addition, the embodiments of the apparatus and the method provided in the foregoing embodiments belong to the same concept, and specific implementation processes of the embodiments of the method are detailed in the method embodiments, which are not repeated herein.
The application also discloses electronic equipment. Referring to fig. 6, fig. 6 is a schematic structural diagram of an electronic device according to the disclosure of the embodiment of the present application. The electronic device 600 may be, for example, a server, and the electronic device 600 is configured to perform the method flow shown in fig. 2, 3 or 5. The electronic device 600 may include: at least one processor 601, at least one network interface 604, a user interface 603, a memory 605, at least one communication bus 602.
Wherein the communication bus 602 is used to enable connected communications between these components.
The user interface 603 may include a Display screen (Display), a Camera (Camera), and the optional user interface 603 may further include a standard wired interface, a wireless interface.
The network interface 604 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface), among others.
Wherein the processor 601 may include one or more processing cores. The processor 601 connects various portions of the overall server using various interfaces and lines, performs various functions of the server and processes data by executing or executing instructions, programs, code sets, or instruction sets stored in the memory 605, and invoking data stored in the memory 605. Alternatively, the processor 601 may be implemented in hardware in at least one of digital signal processing (Digital Signal Processing, DSP), field programmable gate array (Field-Programmable Gate Array, FPGA), programmable logic array (Programmable Logic Array, PLA). The processor 601 may integrate one or a combination of several of a central processing unit (Central Processing Unit, CPU), an image processor (Graphics Processing Unit, GPU), and a modem, etc. The CPU mainly processes an operating system, a user interface, an application program and the like; the GPU is used for rendering and drawing the content required to be displayed by the display screen; the modem is used to handle wireless communications. It will be appreciated that the modem may not be integrated into the processor 601 and may be implemented by a single chip.
The Memory 605 may include a random access Memory (Random Access Memory, RAM) or a Read-Only Memory (Read-Only Memory). Optionally, the memory 605 includes a non-transitory computer readable medium (non-transitory computer-readable storage medium). Memory 605 may be used to store instructions, programs, code, sets of codes, or sets of instructions. The memory 605 may include a stored program area and a stored data area, wherein the stored program area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the various method embodiments described above, etc.; the storage data area may store data or the like involved in the above respective method embodiments. The memory 605 may also optionally be at least one storage device located remotely from the processor 601. Referring to fig. 6, an operating system, a network communication module, a user interface module, and an application program of a data processing method may be included in the memory 605, which is a computer storage medium.
In the electronic device 600 shown in fig. 6, the user interface 603 is mainly used for providing an input interface for a user, and acquiring data input by the user; and processor 601 may be configured to invoke application programs in memory 605 that store a data processing method that, when executed by one or more processors 601, causes electronic device 600 to perform the method as described in one or more of the embodiments above. It should be noted that, for simplicity of description, the foregoing method embodiments are all expressed as a series of action combinations, but it should be understood by those skilled in the art that the present application is not limited by the order of actions described, as some steps may be performed in other order or simultaneously in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required in the present application.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to related descriptions of other embodiments.
In the several embodiments provided herein, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, such as a division of unit modules, merely a division of logic functions, and there may be other manners of dividing actually being implemented, such as a plurality of unit modules or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some service interface, device or unit indirect coupling or communication connection, electrical or otherwise.
The unit modules described as separate components may or may not be physically separate, and components displayed as unit modules may or may not be physical unit modules, may be located in one place, or may be distributed over a plurality of network unit modules. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit module in each embodiment of the present application may be integrated in one processing unit, or each unit module may exist alone physically, or two or more unit modules may be integrated in one unit. The integrated unit modules can be realized in the form of hardware or software functional units.
The integrated unit modules, if implemented in the form of software functional unit modules and sold or used as a stand-alone product, may be stored in a computer readable memory. Based on such understanding, the technical solution of the present application may be embodied in essence or a part contributing to the prior art or all or part of the technical solution in the form of a software product stored in a memory, including several instructions for causing a computer device (which may be a personal computer, a server or a network device, etc.) to perform all or part of the steps of the methods of the embodiments of the present application. And the aforementioned memory includes: various media capable of storing program codes, such as a U disk, a mobile hard disk, a magnetic disk or an optical disk.
The foregoing is merely exemplary embodiments of the present disclosure and is not intended to limit the scope of the present disclosure. That is, equivalent changes and modifications are contemplated by the teachings of this disclosure, which fall within the scope of the present disclosure. Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure.
This application is intended to cover any adaptations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a scope and spirit of the disclosure being indicated by the claims.

Claims (10)

1. A method of video recognition, comprising:
collecting video data of the expressway ramp access opening from a video collecting device arranged at the expressway ramp access opening;
performing hardware decoding on the video data to obtain a plurality of image frames;
identifying the plurality of image frames through a pre-trained image identification model, and generating an identification result, wherein the identification result indicates whether a target object is included in the plurality of image frames, and the target object comprises at least one of the following: pedestrians, non-motor vehicles, animals, special work clothes and anomalies;
when at least one image frame in the plurality of image frames comprises the target object, calculating by using a target tracking algorithm to obtain the track of the target object;
and correlating the track of the target object, the identification of the target object, the time information of the target object and the geographic information of the target object to generate first correlation information, and sending the first correlation information to an alarm device arranged on a highway side.
2. The video recognition method of claim 1, wherein prior to the recognizing the plurality of image frames by the pre-trained image recognition model, further comprising:
obtaining a first sample data set, wherein the first sample data set comprises a plurality of image samples, and each image sample in the plurality of image samples comprises the target object and first labeling information indicating a category to which the target object belongs;
and inputting the first sample data set into an initial neural network for training to obtain the image recognition model.
3. The video recognition method of claim 1, wherein when at least one of the plurality of image frames includes the target object, obtaining the track of the target object using a target tracking algorithm comprises:
taking the image frame of the first identified first target object in the target objects in the plurality of image frames as a target image frame according to the time sequence;
and processing the continuous preset number of image frames comprising the target image frames by utilizing the target tracking algorithm to generate the track of the first target object.
4. The video recognition method according to claim 1, wherein the associating the track of the target object, the identification of the target object, the time information, and the geographic information, after generating the first association information, further comprises:
generating second association information based on the image frame corresponding to the target object and the first association information;
transmitting the second association information to a display device;
the first association information is voice information, and the second association information is image information.
5. The video recognition method according to claim 2 or 4, wherein after generating second association information based on the image frame corresponding to the target object and the first association information, further comprising:
generating a second sample data set based on the second association information and the first sample data set, wherein the second sample data set includes second labeling information, and the second labeling information indicates the category of the target object in a target image frame in which the target object was not identified by the image recognition model.
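For claim 5, an illustrative sketch (not part of the claims) of assembling the second sample data set from reviewed misses, i.e. target image frames in which the model did not identify the target object; the dictionary layout is an assumption.

# Illustrative sketch for claim 5 (data layout is assumed).
def build_second_sample_set(first_sample_set, reviewed_misses):
    """first_sample_set: list of {"image": ..., "label": ...} entries.
    reviewed_misses: entries derived from the second association information
    for frames the model missed, each carrying the confirmed category."""
    second_labeled = [
        {"image": m["image"],
         "label": m["confirmed_category"],   # second labeling information
         "source": "missed_by_model"}
        for m in reviewed_misses
    ]
    return first_sample_set + second_labeled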
6. The video recognition method of claim 5, wherein, after the generating of the second sample data set based on the second association information and the first sample data set, the method further comprises:
and inputting the second sample data set into the image recognition model to update the pre-trained image recognition model.
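For claim 6, an illustrative sketch (not part of the claims) of updating the pre-trained model by continuing training on the second sample data set; the small learning rate and epoch count are assumptions.

# Illustrative fine-tuning sketch for claim 6 (hyperparameters are assumed).
import torch
import torch.nn as nn

def update_model(model, second_sample_loader, epochs=2):
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for images, labels in second_sample_loader:   # second sample data set
            optimizer.zero_grad()
            loss_fn(model(images), labels).backward()
            optimizer.step()
    return model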
7. The video recognition method of claim 1, wherein the method further comprises: sending the first association information to a terminal device provided to the staff.
8. A video recognition apparatus, comprising:
a video acquisition unit, configured to acquire video data of an expressway ramp entrance/exit from a video acquisition device arranged at the expressway ramp entrance/exit;
a video processing unit, configured to perform hardware decoding on the video data to obtain a plurality of image frames;
an image recognition unit, configured to identify the plurality of image frames through a pre-trained image recognition model and generate a recognition result, wherein the recognition result indicates whether a target object is included in the plurality of image frames, and the target object comprises at least one of the following: pedestrians, non-motor vehicles, animals, special work clothing, and anomalies;
a calculating unit, configured to calculate, when at least one image frame of the plurality of image frames includes the target object, a track of the target object by using a target tracking algorithm;
and a generation unit, configured to associate the track of the target object, the identification of the target object, the time information, and the geographic information to generate first association information, and to send the first association information to an alarm device arranged on the expressway side.
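For claim 8, an illustrative sketch (not part of the claims) of how the claimed units could be composed in one class; the method and attribute names are assumptions.

# Illustrative composition of the claim 8 units (names are assumed).
class VideoRecognitionApparatus:
    def __init__(self, acquisition_unit, processing_unit, recognition_unit,
                 calculating_unit, alarm_client):
        self.acquisition_unit = acquisition_unit   # video acquisition unit
        self.processing_unit = processing_unit     # video processing unit (hardware decoding)
        self.recognition_unit = recognition_unit   # image recognition unit (pre-trained model)
        self.calculating_unit = calculating_unit   # calculating unit (target tracking algorithm)
        self.alarm_client = alarm_client           # output channel of the generation unit

    def process(self):
        video_data = self.acquisition_unit.collect()
        frames = self.processing_unit.decode(video_data)
        for target in self.recognition_unit.identify(frames):   # recognition result
            track = self.calculating_unit.track(frames, target)
            first_association = {                  # generation unit: first association information
                "identification": target.category,
                "track": track,
                "time": target.timestamp,
                "location": target.location,
            }
            self.alarm_client.send(first_association)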
9. An electronic device comprising a processor, a memory, a user interface, and a network interface;
the memory is used for storing instructions;
the user interface and the network interface are used for communicating with other devices;
the processor is configured to execute the instructions stored in the memory, to cause the electronic device to perform the method of any one of claims 1-7.
10. A readable storage medium comprising computer instructions which, when run on a computer, cause the computer to perform the method of any of claims 1-7.
CN202311798755.6A 2023-12-26 2023-12-26 Video identification method, electronic equipment and storage medium Pending CN117456430A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311798755.6A CN117456430A (en) 2023-12-26 2023-12-26 Video identification method, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117456430A true CN117456430A (en) 2024-01-26

Family

ID=89580414

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311798755.6A Pending CN117456430A (en) 2023-12-26 2023-12-26 Video identification method, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117456430A (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN203950438U (en) * 2014-06-10 2014-11-19 上海市政工程设计研究总院(集团)有限公司 Expressway ramp truck intrusion prevention system
CN208141607U (en) * 2018-03-26 2018-11-23 河北交投智能交通技术有限责任公司 Intelligent card-issuing system supporting HAZMAT vehicle control based on video recognition technology
CN108810620A (en) * 2018-07-18 2018-11-13 腾讯科技(深圳)有限公司 Method, computer device, and storage medium for identifying key time points in a video
CN111444854A (en) * 2020-03-27 2020-07-24 科大讯飞(苏州)科技有限公司 Abnormal event detection method, related device and readable storage medium
CN112241696A (en) * 2020-09-28 2021-01-19 深圳市商汤科技有限公司 Image processing method and device, electronic device and storage medium
CN112818814A (en) * 2021-01-27 2021-05-18 北京市商汤科技开发有限公司 Intrusion detection method and device, electronic equipment and computer readable storage medium
CN114494968A (en) * 2022-01-28 2022-05-13 青岛海信网络科技股份有限公司 Method and device for identifying abnormal intrusion behavior in forbidden area
CN114495011A (en) * 2022-02-15 2022-05-13 辽宁奥普泰通信股份有限公司 Non-motor vehicle and pedestrian illegal intrusion identification method based on target detection, storage medium and computer equipment
CN115631472A (en) * 2022-12-19 2023-01-20 山东高速股份有限公司 Intelligent detection method for pedestrian intrusion on expressway
CN115937655A (en) * 2023-02-24 2023-04-07 城云科技(中国)有限公司 Target detection model of multi-order feature interaction, and construction method, device and application thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhejiang Daily (浙江日报): "How to prevent pedestrians and non-motor vehicles from intruding onto expressways? This smart system is now on duty", pages 1-4, Retrieved from the Internet <URL:https://baijiahao.baidu.com/s?id=1706613006400078634> *

Similar Documents

Publication Publication Date Title
CN110390262B (en) Video analysis method, device, server and storage medium
CN106952303B (en) Vehicle distance detection method, device and system
CN112085952B (en) Method and device for monitoring vehicle data, computer equipment and storage medium
WO2020042984A1 (en) Vehicle behavior detection method and apparatus
CN102750709B (en) Video is utilized to detect the method and apparatus of behavior of fighting
CN109191829B (en) road safety monitoring method and system, and computer readable storage medium
CN109544870B (en) Alarm judgment method for intelligent monitoring system and intelligent monitoring system
CN112330964B (en) Road condition information monitoring method and device
CN112744174B (en) Vehicle collision monitoring method, device, equipment and computer readable storage medium
CN114782897A (en) Dangerous behavior detection method and system based on machine vision and deep learning
CN113469115A (en) Method and apparatus for outputting information
CN113516099A (en) Traffic behavior recognition method and device, electronic equipment and storage medium
CN114926791A (en) Method and device for detecting abnormal lane change of vehicles at intersection, storage medium and electronic equipment
CN116823884A (en) Multi-target tracking method, system, computer equipment and storage medium
CN114743157B (en) Pedestrian monitoring method, device, equipment and medium based on video
CN117456430A (en) Video identification method, electronic equipment and storage medium
CN115471872A (en) Behavior detection method and device for distributing advertisement, electronic equipment and storage medium
CN114677848A (en) Perception early warning system, method, device and computer program product
CN113837066A (en) Behavior recognition method and device, electronic equipment and computer storage medium
CN113538968A (en) Method and apparatus for outputting information
CN113609956A (en) Training method, recognition method, device, electronic equipment and storage medium
CN112528825A (en) Station passenger recruitment service method based on image recognition
CN113128414A (en) Personnel tracking method and device, computer readable storage medium and electronic equipment
CN113283286A (en) Driver abnormal behavior detection method and device
CN112699798A (en) Traffic police action recognition method and device with vehicle-road cooperation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination