CN111885423A - Positioning method and positioning system combining UWB and UTC time stamp synchronization - Google Patents


Publication number
CN111885423A
Authority
CN
China
Prior art keywords: video stream, UWB, UTC, target object, data
Legal status: Granted
Application number: CN202010707002.XA
Other languages: Chinese (zh)
Other versions: CN111885423B (en)
Inventors: 魏志斌, 李桂友
Current assignee: Shanghai Zhikan Technology Co., Ltd.
Original assignee: Shanghai Zhikan Technology Co., Ltd.
Application filed by Shanghai Zhikan Technology Co., Ltd.
Priority to CN202010707002.XA
Publication of CN111885423A
Application granted; publication of CN111885423B
Legal status: Active

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 - Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85 - Assembly of content; Generation of multimedia applications
    • H04N21/854 - Content authoring
    • H04N21/8547 - Content authoring involving timestamps for synchronizing content
    • H04N21/20 - Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 - Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 - Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N21/2343 - Processing of video elementary streams involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439 - Processing of audio elementary streams
    • H04N21/4394 - Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • H04N21/60 - Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client
    • H04N21/63 - Control signaling related to video distribution between client, server and network components; Network processes for video distribution between server and clients or between remote clients, e.g. transmitting basic layer and enhancement layers over different transmission paths, setting up a peer-to-peer communication via Internet between remote STBs; Communication protocols; Addressing
    • H04N21/643 - Communication protocols
    • H04N21/6437 - Real-time Transport Protocol [RTP]
    • H04N7/00 - Television systems
    • H04N7/18 - Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
    • H04N7/181 - Closed-circuit television [CCTV] systems for receiving images from a plurality of remote sources
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 - Reducing energy consumption in communication networks
    • Y02D30/70 - Reducing energy consumption in communication networks in wireless communication networks

Abstract

The invention discloses a positioning method and positioning system combining UWB and UTC timestamp synchronization. Based on UTC time correction, the MFCC algorithm is used to synchronize multiple video streams and obtain a unified UTC time for each frame of data; combined with UWB position information, a transformation matrix between a target object's position in the picture and its actual physical position is obtained. On this basis, the system can support intelligent monitoring applications such as forklift operation tracks, busy/idle status, action analysis and operation efficiency, yielding a series of meaningful indicators that help managers perform unified management and scheduling, effectively improve forklift utilization and worker efficiency, reduce idle work, and increase the overall economic output of the warehouse.

Description

Positioning method and positioning system combining UWB and UTC time stamp synchronization
Technical Field
The invention relates to the field of intelligent warehousing management, in particular to a positioning method and a positioning system combining UWB and UTC timestamp synchronization.
Background
Warehouse intelligent management technology is improving continuously, forklift management strategies receive more and more attention, and the real-time position information and operation tracks of forklifts are increasingly important for warehouse managers' overall control. By collecting statistics on forklifts in the warehousing target area, a decision maker can grasp in time the number and operating conditions of forklifts in each area, which facilitates forklift dispatching and greatly improves forklift utilization.
Existing market solutions for monitoring forklift operation tracks use modules such as GPS or RFID for detection. The two approaches are as follows:
1) Forklift position detection through GPS technology. GPS (Global Positioning System) is a satellite-based high-precision radio navigation positioning system that can provide accurate geographic position, vehicle speed and time information anywhere in the world and in near-Earth space. A wireless transmitting device and a GPS positioning module are installed on each forklift in a given area; the current information of each forklift, including state (idle or busy), position, current speed and other parameters, is transmitted to a dispatch center through a GPRS (General Packet Radio Service) network. The dispatch center thus comprehensively understands the positions and operation of the forklifts and performs unified management and dispatching according to the service conditions of the current area. As described, the whole GPS system is composed of a dispatch center, mobile terminals and a mobile communication network.
2) Forklift position detection through RFID technology. RFID (Radio Frequency Identification) automatically identifies a target object through radio frequency signals, and can quickly track articles and exchange data to identify moving objects. An RFID system generally comprises a signal transmitter, a signal receiver, and transmitting and receiving antennas. An ultra-high-frequency RFID vehicle-mounted reader-writer and an RFID antenna are installed on each forklift; forklift RFID tags are assigned in one-to-one correspondence, each storing the ID of its forklift. RFID readers arranged in the warehouse and other areas acquire the information of the forklift RFID tags; from the filtered information the management platform calculates the positions of the tags in the area and marks them on an area map, showing the distribution of forklifts in the area at that moment. By acquiring data in real time at a fixed period and repeating this calculation, the positions and operation tracks of the forklifts in the area can be obtained accurately.
As described above, both the GPS and RFID approaches generally require installing a position detection module (a GPS module or an RFID module) on the forklift. The advantage of these hardware-based approaches is that the obtained data are accurate and rich, and a high-precision motion track can be generated after calculation by the dispatch center or management platform. However, they have the following significant drawbacks:
1. Retrofitting the forklift fleet consumes time and labor: a corresponding GPS/RFID hardware module and mobile communication module must be installed on every forklift, which inevitably interrupts the work plan of a busy warehousing operation.
2. Additional network deployment is needed. In the RFID approach, a signal acquisition network must be deployed at suitable positions in the storage area to receive information, so it is constrained by the warehouse building, wiring, security and so on; it also increases manual maintenance and requires dedicated staff.
3. The economic cost is high. Beyond the initial one-time investment in the equipment installed on each forklift and the signal acquisition network deployed in the warehouse area, the warehouse operator also bears subsequent equipment renewal costs and system operating costs.
These forklift management schemes therefore affect, to some extent, the cost control and business operation of a warehousing management unit, and market demand for a moderately priced, easily deployed solution is strong.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a positioning method and positioning system combining UWB and UTC timestamp synchronization. It avoids the drawbacks of existing warehouse management systems, such as difficult equipment retrofitting, troublesome network deployment and high investment cost: on the basis of video monitoring, it matches video frames with UWB positioning through UTC timestamp synchronization, so that the position of a target object in a frame of the video stream can be matched with its actual physical position.
The technical scheme of the invention is as follows: a positioning method combining UWB and UTC time stamp synchronization is provided, which comprises the following steps:
s1: each camera in the warehouse collects video stream data in a warehousing target area;
s2: acquiring a UTC timestamp of each path of video stream data;
s3: adding the UTC timestamp of a given video stream calculated in step S2 to that stream's encoded information, and re-encoding the video stream information on the server;
s4: the server receives and decodes the recoded multi-channel video stream information, and analyzes the UTC timestamp corresponding to the video stream;
s5: obtaining the position information of the target object and the corresponding timestamp by adopting a UWB positioning system, and matching the position information of the target object and the corresponding timestamp obtained by the UWB positioning system with the UTC timestamp corresponding to the video stream analyzed by the server in the step S4 to obtain accurate UWB physical position information corresponding to each frame of video data;
s6: detecting position information of a target object relative to a frame picture in a video stream by adopting a deep learning training model;
s7: establishing a position mapping matrix between the accurate UWB physical position information obtained in the step S5 and the position information of the target object with respect to the frame picture detected in the step S6;
s8: and detecting the position of the target object from the collected video stream data, and obtaining the exact physical position information of the target object according to the position mapping matrix.
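The mapping used in S7 and S8 can be modeled as a planar homography between image coordinates and floor coordinates, assuming the target moves on a flat floor. A minimal Python (numpy) sketch of fitting such a matrix by direct linear transformation; the calibration point pairs below are hypothetical, not taken from the patent:

```python
import numpy as np

def fit_homography(img_pts, world_pts):
    """Estimate a 3x3 matrix H with world ~ H * img (DLT, needs >= 4 pairs)."""
    rows = []
    for (x, y), (X, Y) in zip(img_pts, world_pts):
        rows.append([x, y, 1, 0, 0, 0, -X * x, -X * y, -X])
        rows.append([0, 0, 0, x, y, 1, -Y * x, -Y * y, -Y])
    # The homography is the null vector of the stacked constraint matrix.
    _, _, Vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def map_point(H, pt):
    """Map one image point to floor coordinates via the homography."""
    v = H @ np.array([pt[0], pt[1], 1.0])
    return v[:2] / v[2]

# Hypothetical calibration: pixel positions vs. UWB floor positions (metres).
img_pts = [(100, 400), (540, 420), (560, 80), (120, 60)]
world_pts = [(0.0, 0.0), (8.0, 0.0), (8.0, 12.0), (0.0, 12.0)]
H = fit_homography(img_pts, world_pts)
```

Once fitted, `map_point(H, detection)` converts a detection's frame coordinates (S6) into a physical floor position (S8).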
Further, the camera system comprises a camera terminal array, a data receiving module and a processing server; the camera terminal array collects video data, the data receiving module receives the video data collected by the camera terminal array, the processing server processes the collected video data, and the data receiving module and the processing server read the video data directly through RTSP (Real Time Streaming Protocol).
Further, in step S2, obtaining the UTC timestamp of each video stream's data includes the following steps:
s2.1: carrying out UTC time correction on the camera to obtain an accurate UTC timestamp when the camera really acquires video stream data;
s2.2: synchronizing the multiple video streams by the MFCC method: select the video stream whose UTC time correction in step S2.1 is most accurate as the reference video stream, calculate the time difference of every other video stream relative to the reference stream by the MFCC method, and convert all time differences into NTP time; the UTC timestamp of any other video stream is then obtained by adding its time difference relative to the reference stream, calculated by the MFCC method, to the accurate UTC timestamp at which the reference video stream was acquired.
Further, in step S2.2, synchronizing the multiple video streams using the MFCC method includes:
s2.2.1: placing a sound source in the center of the warehouse, and enabling the audio signals received by the multiple cameras at the corners to be the same;
s2.2.2: after the video stream and the audio stream of a target area are collected by a camera, audio characteristic identification matching is carried out according to an MFCC audio signal matching method, and the time deviation of any two paths of audio streams is obtained;
s2.2.3: applying the time offset of the audio streams obtained in step S2.2.2 to the corresponding video stream data to synchronize the multiple video streams.
Further, the matching of audio according to the MFCC audio signal matching method comprises:
in different paths of audio streams, a feature matching algorithm is used for obtaining the parts with the highest identification degree in the two paths of audio streams, and the method comprises the following steps:
arranging a group of band-pass filters from dense to sparse in a section of frequency band from low frequency to high frequency according to the size of critical bandwidth, and filtering any two paths of input voice signals;
converting any two paths of input continuous voice signals into short frames, and performing power spectrum calculation on each frame of voice;
and applying the mel filters to the power spectrum, computing the energy in each mel filter, summing, taking the logarithm, and performing a DCT (discrete cosine transform) to obtain the feature values; then performing feature identification matching with a nearest-neighbour algorithm to obtain the most identifiable matching part of the two audio streams.
Further, in step S3, the server re-encodes the video stream information using H264 or H265 encoding, including:
adding the UTC timestamp of the video stream calculated in step S2 to the SEI part of the NAL information in the H264 or H265 encoding to constitute new video stream information.
Further, in step S4, the server receives the multiple video stream information after re-encoding and decodes the multiple video stream information by using H264 or H265 decoding.
Further, in step S6, detecting the position information of the target object in the video stream with respect to the frame picture by using the deep learning training model includes:
performing supervised learning and training with the SSD algorithm on a large number of real local video stream pictures collected in advance, until the model is mature;
identifying a target object in the video stream picture by using a trained SSD algorithm;
and on the picture in which the target object is identified, giving the two-dimensional coordinates of the target object in the picture.
Further, the UWB positioning system adopts a TDOA positioning algorithm for positioning.
The present invention also provides a positioning system comprising a camera cluster, a server, a memory, and a computer program stored in the memory and executable on the server. The camera cluster is installed at the corners of the warehouse and covers all target monitoring areas in the warehouse, collecting video stream data in the warehousing target area; when the server executes the computer program, the above positioning method combining UWB and UTC timestamp synchronization is realized.
With this scheme, based on UTC time correction, the MFCC algorithm synchronizes the multiple video streams to obtain a unified UTC time for the frame data, and UWB position information is then combined to obtain the transformation matrix between a target object's position in the picture and its actual physical position. On this basis, the system supports intelligent monitoring applications such as forklift operation tracks, busy/idle status, action analysis and operation efficiency, yielding a series of meaningful indicators that help managers perform unified management and scheduling, effectively improve forklift utilization and worker efficiency, reduce idle work, and increase the overall economic output of the warehouse.
Drawings
Fig. 1 is a schematic diagram of UWB positioning.
FIG. 2 is a flow chart of the present invention.
Detailed Description
The invention is described in detail below with reference to the figures and the specific embodiments.
Because the GPS or RFID approach requires installing a position detection module composed of a GPS module or an RFID module on the forklift, or deploying an additional network, the invention instead uses the cameras already present in the warehouse: no additional deployment is needed, and the actual physical position of the target object (in this embodiment, the forklift) is identified from the video pictures collected by the cameras.
UWB positioning technology is one kind of wireless positioning. Popular wireless positioning technologies include GPS, BeiDou, Bluetooth, WiFi and RFID positioning; GPS and BeiDou are mainly used outdoors, while Bluetooth, WiFi, RFID, UWB and operator base-station positioning are mainly used indoors. Compared with Bluetooth, WiFi and RFID positioning, UWB positioning has an advantage in accuracy.
Wireless positioning technology refers to the measurement and calculation methods, i.e. positioning algorithms, used to determine the position of a mobile user. The most common positioning techniques at present are signal angle-of-arrival measurement (AOA), time-of-arrival positioning (TOA) and time-difference-of-arrival positioning (TDOA). Among them, TDOA is currently the most popular scheme: besides the GSM system it is widely used in other systems such as AMPS and CDMA, and it is also the technique used for UWB positioning. TDOA allows the high precision of UWB positioning to reach about 10 cm; TDOA ranging is essentially based on pulse-sequence signals, and the use of an extremely wide spectrum gives it strong anti-interference capability.
The algorithm flow of UWB positioning is as follows:
As shown in fig. 1, the terminal emits a pulse signal. Slave base station R1 forwards its measurement of the same signal at the same moment to master base station R3; the master base station R3 calculates the time difference between the signal's arrival from the terminal at the antennas of the two base stations, the distance difference is calculated from the time difference, and a hyperbola is obtained.
A UWB base station can act as both a master and a slave base station, so positioning with 3 base stations yields 3 hyperbolas, and the common point S of the 3 hyperbolas is the position of the terminal.
To implement positioning with the TDOA algorithm, the key is to keep the 3 base stations synchronized so that the time from the terminal to each UWB base station can be measured accurately; nanosecond-level time synchronization between the base stations is therefore required.
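The TDOA procedure above can be illustrated numerically: convert the measured arrival-time differences into distance differences, then search for the point whose geometry best matches them. A toy Python sketch, in which the base-station layout and target are invented for illustration and a brute-force grid search stands in for a real hyperbola solver:

```python
import numpy as np

C = 299_792_458.0  # speed of light, m/s

def range_diffs(p, stations):
    """Distance differences from point p to each slave vs. the master (index 0)."""
    d = [np.hypot(p[0] - sx, p[1] - sy) for sx, sy in stations]
    return np.array([di - d[0] for di in d[1:]])

def tdoa_locate(tdoas, stations, extent=20.0, step=0.1):
    """Brute-force grid search for the point matching the measured TDOAs."""
    target = np.array(tdoas) * C  # time differences -> distance differences
    best, best_err = (0.0, 0.0), np.inf
    for x in np.arange(0.0, extent, step):
        for y in np.arange(0.0, extent, step):
            err = float(np.sum((range_diffs((x, y), stations) - target) ** 2))
            if err < best_err:
                best, best_err = (x, y), err
    return best

# Hypothetical layout: master base station at the origin, two slaves at corners.
stations = [(0.0, 0.0), (18.0, 0.0), (0.0, 18.0)]
truth = (6.0, 9.0)
tdoas = range_diffs(truth, stations) / C  # noise-free simulated measurements
```

`tdoa_locate(tdoas, stations)` recovers a point within one grid step of `truth`; real systems solve the hyperbola intersection in closed form or by least squares, and depend on the nanosecond base-station synchronization noted above.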
Referring to fig. 2, the present invention provides a positioning method combining UWB and UTC timestamp synchronization, including the following steps:
s1: and each camera in the warehouse collects video stream data in a warehousing target area.
The camera system comprises a camera terminal array, a data receiving module and a processing server. The camera terminal array collects video data and the data receiving module receives it; there is no special parameter or brand requirement beyond 1080P definition. Because there are no special requirements, the camera cluster already installed in the warehouse can be reused. The cluster covers all target monitoring areas, so data can be read in full for subsequent calculation and analysis.
The processing server processes the collected video data; the data receiving module and the processing server read the video data directly via the RTSP protocol. RTSP (Real Time Streaming Protocol), an application layer protocol in the TCP/IP protocol suite, defines how a one-to-many application can efficiently transmit multimedia data over an IP network. The data receiving and processing server receives and unpacks the RTSP stream, caches it locally, and then forwards it in full to the next module.
S2: the method for acquiring the UTC timestamp of each path of video stream data comprises the following steps:
s2.1: and carrying out UTC time correction on the camera to obtain an accurate UTC timestamp when the camera really acquires video stream data.
The camera acquisition pipeline is as follows: the sensor senses the incoming optical signal -> the ISP adjusts and converts it into YUV frame data -> video encoding -> RTSP stream output -> the client receives the RTSP stream -> the client unpacks and decodes -> the frame data is restored. In this pipeline the frame data obtained by the client lags considerably, so after obtaining the frame data, the timestamp at which the sensor originally captured it must be restored for subsequent operations.
The method adopted is to configure UTC time synchronization of the camera through the ONVIF protocol, and after synchronization to configure the camera's RTSP stream to carry a UTC timestamp. After receiving the RTSP stream and decoding the frame data, the client takes out the UTC timestamp data and adds the camera's internal time error to it, giving an accurate UTC timestamp for when the video stream data was really acquired. In practice the internal error time needs to be adjusted manually several times to obtain a relatively accurate UTC timestamp.
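The correction itself is simple arithmetic: the UTC timestamp carried in the RTSP stream plus the manually calibrated internal error of the camera. A minimal Python sketch; the 120 ms error and the timestamp value are made-up placeholders standing in for the manually tuned values described above:

```python
from datetime import datetime, timedelta, timezone

# UTC timestamp taken from the decoded RTSP frame (hypothetical value).
stream_utc = datetime(2020, 7, 22, 8, 30, 15, 250_000, tzinfo=timezone.utc)

# Manually calibrated internal time error of this camera (placeholder value).
internal_error = timedelta(milliseconds=120)

# Accurate UTC timestamp of when the video stream data was really acquired.
capture_utc = stream_utc + internal_error
```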
S2.2: and synchronizing a plurality of paths of video streams by adopting an MFCC method, selecting one path of video stream with more accurate UTC time correction in the step S2.1 as a reference video stream, calculating the time difference of other video streams relative to the reference video stream according to the MFCC method, converting all the time difference into NTP time, and adding the accurate UTC time stamp when the reference video stream is acquired and the time difference of some other path of video stream relative to the reference video stream calculated according to the MFCC method to obtain the UTC time stamp of the path of video stream.
The previous step cannot eliminate the error between the real image and its real timestamp, and when multiple video streams are collected simultaneously the network delays in RTSP transmission also differ, which hinders building a transformation matrix together with UWB positioning; a method of synchronizing the multiple video streams is therefore needed. In a warehouse environment, several cameras at different angles are arranged around the edges; because of the different angles, the shape and illumination of the target object differ between views, so frame synchronization by a visual method is difficult and error-prone. Although the camera angles differ, the audio signals the cameras collect are the same, so using the audio signal as the synchronization standard for the different video streams is simple and feasible. The method of frame synchronization using audio data is as follows:
s2.2.1: and placing a sound source in the center of the warehouse to enable the audio signals received by the multiple cameras at the corners to be the same.
S2.2.2: after the video stream and the audio stream of a target area are collected by the camera, audio characteristic identification matching is carried out according to an MFCC audio signal matching method, and the time deviation of any two paths of audio streams is obtained.
S2.2.3: the video stream data is added to the time offset of the audio stream obtained in step S2.2.2 to realize synchronous multi-channel video stream.
The matching of audio according to the MFCC audio signal matching method includes:
in different paths of audio streams, a feature matching algorithm is used for obtaining the parts with the highest identification degree in the two paths of audio streams, and the method comprises the following steps:
arranging a group of band-pass filters from dense to sparse in a section of frequency band from low frequency to high frequency according to the size of critical bandwidth, and filtering any two paths of input voice signals;
converting any two paths of input continuous voice signals into short frames, and performing power spectrum calculation on each frame of voice;
and applying the mel filters to the power spectrum, computing the energy in each mel filter, summing, taking the logarithm, and performing a DCT (discrete cosine transform) to obtain the feature values; then performing feature identification matching with a nearest-neighbour algorithm to obtain the most identifiable matching part of the two audio streams.
Mel-frequency cepstral coefficients are abbreviated MFCC. Because the frequency of sound as perceived by the human ear is not linear in the actual frequency, the speech signal between 200 Hz and 5000 Hz has the greatest influence on speech intelligibility. Due to masking effects, lower frequencies tend to mask higher ones, so the critical bandwidth for bass masking is smaller than that for treble. MFCC therefore filters the input signal with a group of band-pass filters arranged from dense to sparse according to critical bandwidth over the band from low to high frequency. The signal energy output by each band-pass filter is used as the basic feature of the signal, which after further processing can serve as the input feature for speech. Since this feature does not depend on the properties of the signal, makes no assumptions or restrictions on the input, and exploits results from auditory models, such parameters are more robust than LPCC parameters based on the vocal-tract model, fit the auditory characteristics of the human ear better, and retain good recognition performance when the signal-to-noise ratio drops.
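A simplified, runnable sketch of the audio-based synchronization in Python (numpy): per-frame log power-spectrum features and a normalized-correlation lag search stand in for the full mel filterbank + DCT + nearest-neighbour matching described above, and the sample rate, frame size and synthetic signals are invented for illustration:

```python
import numpy as np

SR = 8000     # sample rate in Hz (assumed)
FRAME = 256   # frame length in samples

def frame_features(signal):
    """Per-frame log power-spectrum features (simplified stand-in for MFCC)."""
    n = len(signal) // FRAME
    frames = signal[:n * FRAME].reshape(n, FRAME)
    power = np.abs(np.fft.rfft(frames * np.hanning(FRAME), axis=1)) ** 2
    return np.log(power + 1e-10)

def frame_offset(feat_a, feat_b, max_lag=40):
    """Number of frames by which stream b is delayed relative to stream a."""
    best_lag, best_score = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        a = feat_a if lag >= 0 else feat_a[-lag:]
        b = feat_b[lag:] if lag >= 0 else feat_b
        m = min(len(a), len(b))
        # Normalized correlation in feature space; identical segments score ~1.
        score = float(np.sum(a[:m] * b[:m]) /
                      (np.linalg.norm(a[:m]) * np.linalg.norm(b[:m]) + 1e-12))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# Synthetic check: stream B is stream A delayed by exactly 10 frames.
rng = np.random.default_rng(0)
audio = rng.normal(size=SR * 2)
delay = 10 * FRAME
stream_a = audio
stream_b = np.concatenate([np.zeros(delay), audio])[:len(audio)]
lag = frame_offset(frame_features(stream_a), frame_features(stream_b))
offset_seconds = lag * FRAME / SR  # time offset to apply to stream B's frames
```

The recovered `offset_seconds` plays the role of the per-stream time difference that, per S2.2, is added to the reference stream's UTC timestamp.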
S3: the UTC timestamp calculated in step S2 for a given video stream is added to the encoded video stream information, and the server re-encodes that video stream's information. In this embodiment, the server re-encodes the video stream information using H264 or H265 encoding, including:
adding the UTC timestamp of the video stream calculated in step S2 to the SMI part of the NAL information in the H264 or H265 encoding to constitute new video stream information.
In H264/H265 coding, NAL information needs to be added to header information, and SEI is one of the NAL information. In the H.264/AVC standard, the whole system framework is divided into two layers: video Coding Layer (VCL) and Network Abstraction Layer (NAL). The VCL is responsible for representing the content of the valid video data and the NAL is responsible for formatting the data and providing header information to ensure that the data is suitable for transmission on various channels and storage media. Nallunit is the basic syntax structure of NAL, which contains one byte of header information (NAL header) and a series of raw data byte streams from the VCL.
A NAL type of 6 indicates that the NAL unit carries supplemental enhancement information (SEI). SEI belongs to the bitstream itself and provides a way to embed additional information into a video bitstream; it is one of the features of the H.264/H.265 video compression standards. When frame data is transmitted, the custom information in the SEI can be obtained by parsing the NAL header of the frame data, or extracted with tools such as FFmpeg.
Depending on the client's requirements, the UTC timestamp calculated in step S2 can be added to the video stream information during subsequent H264 or H265 encoding, so that the client can recover it by parsing. This scheme adds the UTC timestamp to the SEI part of the NAL information in H264.
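As a sketch of how such a timestamp might be packed, the code below builds an H.264 SEI NAL unit (type 6) carrying the UTC timestamp as a user_data_unregistered message (payload type 5). The 16-byte UUID is a made-up identifier, not one specified by this scheme, and a real encoder pipeline would splice this NAL unit into the stream ahead of the frame data; emulation-prevention bytes are inserted so the payload cannot mimic a start code.

```python
import struct
import time

# Made-up 16-byte UUID identifying our custom timestamp payload.
UUID = bytes.fromhex("3f5a1c2e9b8d4f6a8c1d2e3f4a5b6c7d")

def emulation_prevent(rbsp: bytes) -> bytes:
    """Insert 0x03 after any 0x00 0x00 pair followed by a byte <= 0x03,
    so the payload cannot mimic a start code inside the NAL unit."""
    out = bytearray()
    zeros = 0
    for b in rbsp:
        if zeros >= 2 and b <= 3:
            out.append(3)
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)

def utc_sei_nal(utc_ms: int) -> bytes:
    """Build an H.264 SEI NAL unit (type 6) carrying a UTC timestamp
    as a user_data_unregistered message (payload type 5)."""
    payload = UUID + struct.pack(">Q", utc_ms)           # UUID + 8-byte ms
    rbsp = bytes([5, len(payload)]) + payload + b"\x80"  # type, size, data, stop bit
    return b"\x00\x00\x00\x01\x06" + emulation_prevent(rbsp)

nal = utc_sei_nal(int(time.time() * 1000))
print(nal[:6].hex())  # 000000010605: start code, NAL type 6, payload type 5
```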
S4: The server receives and decodes the re-encoded multi-channel video stream information and parses out the UTC timestamp corresponding to each video stream. Specifically, the server receives the re-encoded multi-channel video stream information and decodes it with an H264 or H265 decoder to obtain the absolute NTP time. When the client develops its own decoding of the received H264 bitstream, it must additionally parse the SEI information to obtain the UTC timestamp corresponding to each frame of data.
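The extra SEI parsing step on the client side might look like the following sketch, which scans an Annex-B byte stream for SEI NAL units and recovers a timestamp embedded as a user_data_unregistered message (payload type 5) under a known 16-byte UUID. The UUID and the 8-byte big-endian millisecond layout are illustrative assumptions.

```python
import struct

UUID = bytes.fromhex("3f5a1c2e9b8d4f6a8c1d2e3f4a5b6c7d")  # illustrative identifier

def remove_emulation(data: bytes) -> bytes:
    """Strip the 0x03 emulation-prevention bytes inserted after 0x00 0x00 pairs."""
    out = bytearray()
    zeros = 0
    for b in data:
        if zeros >= 2 and b == 3:
            zeros = 0
            continue
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)

def parse_utc_from_sei(stream: bytes):
    """Scan an Annex-B byte stream for SEI NAL units (type 6) and return
    the first embedded UTC timestamp in milliseconds, or None."""
    pos = 0
    while True:
        idx = stream.find(b"\x00\x00\x01", pos)
        if idx < 0:
            return None
        nal_start = idx + 3
        nal_type = stream[nal_start] & 0x1F
        nxt = stream.find(b"\x00\x00\x01", nal_start)
        body = stream[nal_start + 1: nxt if nxt > 0 else len(stream)]
        if nal_type == 6:
            rbsp = remove_emulation(body)
            if len(rbsp) >= 26 and rbsp[0] == 5 and rbsp[2:18] == UUID:
                return struct.unpack(">Q", rbsp[18:26])[0]  # user_data_unregistered
        pos = nal_start

# Minimal demo stream: one SEI NAL with a dummy timestamp, one dummy IDR NAL.
demo_ts = 0x0102030405060708  # dummy value with no 0x0000 pairs
demo_rbsp = bytes([5, 24]) + UUID + struct.pack(">Q", demo_ts) + b"\x80"
stream = (b"\x00\x00\x00\x01\x06" + demo_rbsp
          + b"\x00\x00\x00\x01\x65" + b"\x00" * 8)
print(hex(parse_utc_from_sei(stream)))  # 0x102030405060708
```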
S5: A UWB positioning system is used to obtain the position information of the target object and the corresponding timestamp; this position information and timestamp are then matched against the UTC timestamp of the video stream parsed by the server in step S4, yielding the accurate UWB physical position corresponding to each frame of video data.
The UWB positioning system conveniently provides the position of the target object and the corresponding timestamp, while the client, after decoding the video stream from the video stream server, obtains the UTC timestamp by parsing the SEI information. Using the timestamps as the matching criterion, the position information and timestamps from the UWB positioning system can be matched against the UTC timestamps parsed from the SEI in the video stream, so that the UWB positioning information corresponding to each video frame is known.
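One way this matching could be implemented is a nearest-timestamp search: for each frame's UTC timestamp, take the UWB sample closest in time and discard matches beyond a tolerance. The 100 ms tolerance below is an assumed parameter, and the UWB timestamps are assumed to arrive sorted.

```python
import numpy as np

def match_uwb_to_frames(frame_utc_ms, uwb_utc_ms, uwb_xy, max_gap_ms=100):
    """For each video-frame timestamp, pick the UWB sample whose timestamp
    is nearest; frames with no sample within max_gap_ms get NaN coordinates.
    uwb_utc_ms must be sorted ascending."""
    frame_utc_ms = np.asarray(frame_utc_ms, dtype=np.int64)
    uwb_utc_ms = np.asarray(uwb_utc_ms, dtype=np.int64)
    uwb_xy = np.asarray(uwb_xy, dtype=float)
    idx = np.clip(np.searchsorted(uwb_utc_ms, frame_utc_ms),
                  1, len(uwb_utc_ms) - 1)       # right-hand neighbor
    left = idx - 1
    pick = np.where(np.abs(uwb_utc_ms[idx] - frame_utc_ms)
                    < np.abs(uwb_utc_ms[left] - frame_utc_ms), idx, left)
    matched = uwb_xy[pick].copy()
    gap = np.abs(uwb_utc_ms[pick] - frame_utc_ms)
    matched[gap > max_gap_ms] = np.nan          # no UWB fix close enough in time
    return matched

frames = [1000, 1040, 1080, 5000]               # frame UTC timestamps (ms)
uwb_t  = [ 990, 1050, 1090]                     # UWB sample timestamps (ms)
uwb_xy = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]   # UWB positions (m)
print(match_uwb_to_frames(frames, uwb_t, uwb_xy))
```

The last frame at 5000 ms has no UWB sample within the tolerance and is returned as NaN rather than matched to a stale position.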
S6: Detecting the position information of the target object relative to the frame picture in the video stream with a deep learning model comprises the following steps:
on the basis of a large number of real local video stream pictures acquired in advance, performing supervised learning and training with the SSD algorithm until the model converges;
identifying the target object in the video stream pictures with the trained SSD algorithm;
and, on each picture in which the target object is identified, giving the two-dimensional coordinates of the target object within the picture.
The forklift in the picture is identified with the SSD algorithm, which offers higher precision than Faster R-CNN and is suitable for scenes with stricter precision requirements. On the basis of a large number of real local video stream pictures acquired in advance, the SSD model is trained with supervised learning; once it performs satisfactorily on the test set (both recall and precision above 90%), it can evaluate video pictures of the real environment, identify the forklift images in them, and apply the relevant marks.
On each picture in which a forklift is identified, its two-dimensional coordinates within the picture are given. Because the different pictures from the same camera cover the same fixed area, the SSD algorithm can output two-dimensional coordinates for the whole set of pictures from that camera with a consistent origin and XY axes. After identifying the forklift, the SSD algorithm gives the specific two-dimensional coordinate of the forklift in the picture's coordinate system.
S7: a position mapping matrix between the accurate UWB physical position information obtained in the step S5 and the position information of the target object with respect to the frame picture detected in the step S6 is established.
With the deep learning model, the position of the target object relative to the frame picture can be detected in the video stream, but because of the camera's oblique viewing angle this is not the actual physical position. From the UWB technique adopted in step S5, the accurate UWB position corresponding to each frame of data is known, while deep learning detects the position of the target object relative to the frame picture; the two have a definite mapping relationship, and the mapping matrix between them can be obtained through a number of experiments.
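Since the forklift moves on a roughly planar warehouse floor, the image-to-physical mapping can be modeled as a planar homography fitted from a few pixel/UWB correspondence pairs. The sketch below uses the direct linear transform on four or more pairs; this is one plausible realization of the "mapping matrix obtained through experiments", not necessarily the patent's exact formulation, and the calibration values are invented for illustration.

```python
import numpy as np

def fit_homography(pixel_pts, world_pts):
    """Fit a 3x3 homography H with world ~ H @ pixel (homogeneous),
    via the direct linear transform on >= 4 correspondences."""
    A = []
    for (u, v), (x, y) in zip(pixel_pts, world_pts):
        A.append([-u, -v, -1, 0, 0, 0, u * x, v * x, x])
        A.append([0, 0, 0, -u, -v, -1, u * y, v * y, y])
    _, _, vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = vt[-1].reshape(3, 3)        # null vector = homography up to scale
    return H / H[2, 2]

def pixel_to_world(H, u, v):
    """Apply the homography to one pixel coordinate."""
    x, y, w = H @ np.array([u, v, 1.0])
    return x / w, y / w

# Calibration pairs: detected pixel position -> UWB position (meters), invented.
pixels = [(100, 400), (500, 400), (120, 100), (480, 110)]
world  = [(0.0, 0.0), (8.0, 0.0), (0.5, 12.0), (7.5, 12.5)]
H = fit_homography(pixels, world)
print(np.round(pixel_to_world(H, 100, 400), 3))  # maps back to ~(0, 0)
```

With more than four pairs the same least-squares fit averages out detection and UWB noise, which matches the patent's suggestion of deriving the matrix "through a number of experiments".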
S8: and detecting the position of the target object from the collected video stream data, and obtaining the exact physical position information of the target object according to the position mapping matrix.
Once the mapping matrix between the UWB physical position and the position of the target object relative to the frame picture has been obtained, the exact physical position of the target object during later video acquisition can be computed from its position in the video stream through the mapping matrix, which facilitates further development including, but not limited to, trajectory tracking.
The positioning method combining UWB and UTC time stamp synchronization is summarized as follows:
A sound source is placed in the center of the warehouse so that the audio signals received by the cameras at the corners are the same. Audio feature identification and matching are performed with the MFCC audio-signal matching method to obtain the time offset between any two audio streams, and this offset is applied to the video stream data to achieve frame synchronization of the video streams. The video stream server takes one stream as the reference, obtains the time differences of the other streams relative to it, converts all times to NTP time, and records them in the NAL information of the H264-encoded video streams. The client receives a video stream, parses the NAL information to obtain the absolute NTP time, and performs UTC time correction according to the transmission delay from the video stream to the server. Combined with UWB indoor positioning and UTC timestamp matching, the accurate physical position of each target in each video frame can be known.
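The inter-stream time offset at the heart of this pipeline can be illustrated with a simplified stand-in for the MFCC feature matching: cross-correlating two captures of the same sound source and taking the lag of the correlation peak. The synthetic 1 kHz test signal below is invented for illustration; a real deployment would correlate MFCC feature sequences rather than raw samples.

```python
import numpy as np

def estimate_offset(sig_a, sig_b, sample_rate):
    """Estimate how many seconds sig_b lags sig_a by locating the peak of
    their cross-correlation (a simplified stand-in for MFCC matching)."""
    xcorr = np.correlate(sig_b, sig_a, mode="full")
    lag = np.argmax(xcorr) - (len(sig_a) - 1)   # peak position -> sample lag
    return lag / sample_rate

rate = 1000                                      # Hz, illustrative
t = np.arange(0, 1.0, 1 / rate)
src = np.sin(2 * np.pi * 7 * t) * np.exp(-((t - 0.3) ** 2) / 0.01)  # a "chirp"
a = src
b = np.roll(src, 50)                             # stream B lags by 50 samples
print(estimate_offset(a, b, rate))               # 0.05
```

The recovered 0.05 s offset is what would be added to stream B's frame timestamps before converting everything to a common NTP/UTC timebase.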
The present invention also provides a positioning system comprising: a camera cluster, a server, a memory, and a computer program stored in the memory and executable on the server. The camera cluster is installed at each corner of the warehouse, covers all target monitoring areas in the warehouse and is used for collecting video stream data in a storage target area. The server, when executing the computer program, implements the positioning method that combines UWB and UTC timestamp synchronization as described above.
In actual operation, the multi-channel video streams must be decoded, synchronized, and re-encoded, which requires a server with strong codec capability. After this processing, image-processing algorithms such as target detection and tracking are also applied, so a GPU is needed to accelerate them. This scheme uses NVIDIA's Jetson AGX Xavier device, which provides a dedicated hardware-accelerated codec module (HEVC encoding/decoding of up to 32–48 channels of 1080p), a 512-core Volta GPU delivering roughly 11 TFLOPS, and a DLA engine that can additionally accelerate AI algorithms in hardware. Its performance is strong enough to meet the hardware requirements of this scheme.
In summary, the present invention uses the MFCC algorithm to synchronize multiple video streams and, on the basis of UTC time correction, obtains a unified UTC time for the frame data; combining this with UWB position information yields the transformation matrix between the target object's position in the picture and its actual physical position. On this basis, the system can be used for intelligent monitoring applications such as forklift operating trajectories, busy-route status, action analysis, and operating efficiency, producing a series of meaningful indicators that help managers perform unified management and scheduling, effectively improve forklift utilization and worker productivity, reduce idle work, and raise the overall economic output of the warehouse.
The present invention is not limited to the above preferred embodiments, and any modifications, equivalent substitutions and improvements made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A positioning method combining UWB and UTC timestamp synchronization, comprising the steps of:
s1: each camera in the warehouse collects video stream data in a warehousing target area;
s2: acquiring a UTC timestamp of each path of video stream data;
s3: adding the UTC timestamp of each video stream calculated in the step S2 to the encoded video stream information, and re-encoding the video stream information by the server;
s4: the server receives and decodes the recoded multi-channel video stream information, and analyzes the UTC timestamp corresponding to the video stream;
s5: obtaining the position information of the target object and the corresponding timestamp by adopting a UWB positioning system, and matching the position information of the target object and the corresponding timestamp obtained by the UWB positioning system with the UTC timestamp corresponding to the video stream analyzed by the server in the step S4 to obtain accurate UWB physical position information corresponding to each frame of video data;
s6: detecting position information of a target object relative to a frame picture in a video stream by adopting a deep learning training model;
s7: establishing a position mapping matrix between the accurate UWB physical position information obtained in the step S5 and the position information of the target object with respect to the frame picture detected in the step S6;
s8: and detecting the position of the target object from the collected video stream data, and obtaining the exact physical position information of the target object according to the position mapping matrix.
2. The positioning method in conjunction with UWB and UTC timestamp synchronization according to claim 1, wherein said camera comprises a camera terminal array for collecting video data, a data receiving module for receiving video data collected by said camera terminal array, and a processing server for processing said collected video data, said data receiving module and processing server directly reading said video data through RTSP protocol.
3. The positioning method combining UWB and UTC timestamp synchronization according to claim 1, wherein in step S2, the obtaining UTC timestamp of each video stream data comprises the following steps:
s2.1: carrying out UTC time correction on the camera to obtain an accurate UTC timestamp when the camera really acquires video stream data;
s2.2: and synchronizing a plurality of paths of video streams by adopting an MFCC method, selecting one path of video stream with more accurate UTC time correction in the step S2.1 as a reference video stream, calculating the time difference of other video streams relative to the reference video stream according to the MFCC method, converting all the time difference into NTP time, and adding the accurate UTC time stamp when the reference video stream is acquired and the time difference of some other path of video stream relative to the reference video stream calculated according to the MFCC method to obtain the UTC time stamp of the path of video stream.
4. The joint UWB and UTC timestamp synchronized positioning method according to claim 3, wherein in said step S2.2, said synchronizing the multiple video streams using MFCC method comprises:
s2.2.1: placing a sound source in the center of the warehouse, and enabling the audio signals received by the multiple cameras at the corners to be the same;
s2.2.2: after the video stream and the audio stream of a target area are collected by a camera, audio characteristic identification matching is carried out according to an MFCC audio signal matching method, and the time deviation of any two paths of audio streams is obtained;
s2.2.3: the video stream data is added to the time offset of the audio stream obtained in step S2.2.2 to realize synchronous multi-channel video stream.
5. The joint UWB and UTC timestamp synchronized positioning method of claim 4, wherein the matching of audio according to MFCC audio signal matching method comprises:
in different paths of audio streams, a feature matching algorithm is used for obtaining the parts with the highest identification degree in the two paths of audio streams, and the method comprises the following steps:
arranging a group of band-pass filters from dense to sparse in a section of frequency band from low frequency to high frequency according to the size of critical bandwidth, and filtering any two paths of input voice signals;
converting any two paths of input continuous voice signals into short frames, and performing power spectrum calculation on each frame of voice;
and applying the mel filters to the power spectrum, solving the energy of each group of mel filters, adding the energy of each group of mel filters, carrying out logarithm operation, carrying out DCT (discrete cosine transformation) to obtain a characteristic value, and carrying out characteristic identification matching by using a nearest neighbor identification algorithm to obtain a part with the highest identification degree in the two paths of audio streams.
6. The joint UWB and UTC timestamp synchronized positioning method of claim 1, wherein in the step S3, the server re-encodes the video stream information with H264 or H265 coding, comprising:
adding the UTC timestamp of the video stream calculated in step S2 to the SEI part of the NAL information in the H264 or H265 encoding to form new video stream information.
7. The joint UWB and UTC timestamp synchronized positioning method of claim 6 wherein in step S4, the server receives the re-encoded multiple video stream information and decodes it using H264 or H265 decoding.
8. The positioning method of joint UWB and UTC timestamp synchronization according to claim 1, wherein in step S6, the detecting the position information of the target object in the video stream relative to the frame picture by means of deep learning training model comprises:
on the basis of a large number of local real video stream pictures acquired in advance, supervised learning and training are carried out by adopting an SSD algorithm until the training is mature;
identifying a target object in the video stream picture by using a trained SSD algorithm;
and on the picture in which the target object is identified, giving the two-dimensional coordinates of the target object in the picture.
9. The joint UWB and UTC timestamp synchronized location method of claim 1 wherein the UWB location system employs a TDOA location algorithm for location.
10. A positioning system, comprising: camera cluster, server, memory, and computer program stored in the memory and executable on the server, the camera cluster being installed at each corner of the warehouse, covering all target monitoring areas within the warehouse, for collecting video stream data within a warehouse target area, characterized in that the server when executing the computer program implements the positioning method combining UWB and UTC timestamp synchronization according to any one of claims 1 to 9.
CN202010707002.XA 2020-07-21 2020-07-21 Positioning method and positioning system combining UWB and UTC time stamp synchronization Active CN111885423B (en)

Publications (2)

Publication Number Publication Date
CN111885423A true CN111885423A (en) 2020-11-03
CN111885423B CN111885423B (en) 2022-05-31
