WO2016184314A1

WO2016184314A1 - Device and method for establishing structured video image information

Info

Publication number: WO2016184314A1
Application number: PCT/CN2016/081149
Authority: WO
Inventors: 杜晓通; 王伟; 邢大天
Original assignee: 山东大学
Priority date: 2015-05-20
Filing date: 2016-05-05
Publication date: 2016-11-24
Also published as: CN104899261A; CN104899261B

Abstract

A device and method for establishing structured video image information. The device is characterized in that: a CCD/CMOS image sensor module is connected to a video processor, a wireless sensor network receiving module is connected to a CPU processor, outputs of the video processor and the CPU processor are respectively connected to an information fusion module, and the information fusion module is connected to an Ethernet/WiFi interface. The method comprises: fusing video image information with other data, thus forming structured video image information. Text description information is established for a video image by fusing information collected by various sensors and utilizing information of the different sensors and the advantages of a collecting mode of the different sensors, thereby increasing the retrieval speed and the utilization rate of the video image.

Description

Apparatus and method for constructing structured video image information

Technical field

The present invention relates to the field of video image information processing technologies, and in particular, to an apparatus and method for constructing structured video image information.

Background art

Internet of Things technology has brought about fundamental changes in the work and lifestyle of the current society, and it has therefore been regarded as a revolution in the scientific community. The integration of security technology and Internet of Things technology is also a development trend, and the important role played by the combination of the two is also becoming increasingly prominent.

As the volume of video surveillance data grows, the data becomes increasingly important, and intelligent video analysis has become the mainstream of current research. Intelligent video analysis is derived from computer vision and pattern recognition. It establishes a one-to-one mapping between an image and a description of its content, so that a computer can understand the specific content of a video image through digital image analysis. It is an important means of mining valuable information from massive video and image resources. At present, the main algorithms that intelligent video analysis uses for real-time detection, recognition, and multi-target tracking of moving targets fall into the following five categories: target detection, target tracking, target recognition, behavior analysis, and content-based video retrieval and data fusion.

However, current video analysis technology still focuses on the video image itself and does not associate the video image with other information. Although video images are rich in information, that information is difficult to quantify: the content can only be analyzed abstractly and cannot be effectively described in text. Such abstract content, like the video image information itself, is unstructured information and cannot be parsed in fine detail. When the amount of information is very large, storing and retrieving it consumes considerable system resources and time. Moreover, because intelligent video analysis relies on pattern recognition, the accuracy and latency of the analysis results vary with the quality of the recognition algorithm. Current recognition algorithms rest on theory developed decades ago. With no major theoretical breakthrough, content-based video image retrieval has not changed qualitatively, so the field of video surveillance still lacks a real solution for rapidly locating and accurately retrieving massive video image data.

Summary of the invention

The object of the present invention is to solve the above problems by providing an apparatus and method for constructing structured video image information. By closely combining Internet of Things technology with security monitoring technology, the information collected by multiple different sensors is fused. Exploiting the advantages of the different sensors' information and collection methods, text description information is created for the video image, which improves the retrieval speed and utilization efficiency of the video image.

In order to achieve the above object, the present invention adopts the following technical solutions:

An apparatus for constructing structured video image information, comprising: a CCD/CMOS image sensor module, a wireless sensor network receiving module, a video processor, a CPU processor, an information fusion module, and an Ethernet/WiFi interface;

The CCD/CMOS image sensor module is connected to the video processor, the wireless sensor network receiving module is connected to the CPU processor, the outputs of the video processor and the CPU processor are each connected to the information fusion module, and the information fusion module is connected to the Ethernet/WiFi interface;

The video processor encodes the video image. The wireless sensor network receiving module receives text description information or binary data, or connects directly to standard temperature, humidity, illuminance, and pressure sensors and digitizes their analog signals. The video image information is fused with the other data to form structured video image information.

The wireless sensor network module transmits data in the ISM frequency band, has a standard sensor input interface, and accepts standard 0-5 V and 0-10 V sensor signals.

The apparatus of the present invention supports the structuring of video image information. The method of constructing structured video image information includes a method of constructing a structured JPEG image file and a method of constructing structured video information by means of a description file. Storage and retrieval methods are then optimized on the basis of the structured video image information.

A method of constructing structured video image information using the above apparatus, comprising:

(1) constructing a structured JPEG image file;

(2) constructing structured video image information;

(3) storing structured JPEG image files and video image information respectively;

(4) Retrieving structured JPEG image files and video image information, respectively.

The specific method of the step (1) is:

The text information, standard sensor data, and voice-recognition data are encrypted and attached to the original JPEG file. By adding this information after the JPEG file marker code, the original unstructured JPEG image file is turned into a structured JPEG file (image information + text information). The text information is decrypted and displayed during parsing, and the image is not affected at all.

A marker code (tag code) consists of two bytes. The first byte is fixed at 0xFF and indicates the start of the marker code; different values of the second byte carry different meanings. During image parsing, the file is parsed starting from 0xFFD8, and parsing ends at 0xFFD9.

The specific method of the step (2) is:

The description file and the video file are associated as two parts of the structured information. A corresponding description file is created for the video file of each time period, providing a textual supplement to the individual video images;

The original video stream data packet is encapsulated: it is fused with the standard sensor data received from Internet of Things sensors or with manually entered text information, and the data packet length, video stream data packet, other-information length, and other information are combined into a new data packet for transmission.

In the step (3), the method for storing the structured JPEG image file is:

During storage, the files are stored in segments or subfolders according to the content of the text information.

In the step (3), the method for storing the structured video image information is:

An index file is constructed for each field of the description file in the structured video image information, and the index file is divided into several levels, and the received text information is stored according to the content of the corresponding level.

In the step (4), the method for searching the structured JPEG image file is:

When searching, first locate the folder that covers the search condition, and then search within its subfolders;

The length of the text information is obtained by parsing the JPEG file data, and all of the text information is then extracted according to that length and compared with the search condition. If the search condition is met, the JPEG file is the image file being sought; otherwise, the comparison proceeds to the next file.

In the step (4), the method for searching the structured video image information is:

The search starts from the storage directory and enters the corresponding first-level search directory. The index file at each level is examined to select the directories that satisfy the condition, and these are placed in the search queue. Once a level has been searched, the next-level search path is taken from the search queue and searched in turn, until the location of the corresponding description file is found. After the description file is retrieved, the corresponding video file is obtained.

The beneficial effects of the invention are:

The device for constructing structured video image information first provides the video and image capture, encoding, and transmission functions of a conventional camera, and is fully compatible with existing cameras that conform to international standards. On this basis, by adding new device modules and structured information processing algorithms, it can use the other information (text descriptions, standard sensor data) to create explicit description information for videos and images at the acquisition end.

The invention attaches text information (generally sensor information) to a JPEG file, making it a new file that carries a sensor information label. The sensor information describes the specific environment in which the JPEG image was taken, so the JPEG file can better reproduce the scene at the time of shooting. Because the invention does not alter the structure or content of the original JPEG file, existing software can still read and display the file; the invention also protects the sensor information and prevents existing software from tampering with it.

Description of the drawings

FIG. 1 is a structural diagram of an apparatus for constructing structured video image information according to the present invention;

FIG. 2 is a schematic diagram of structured JPEG image information according to the present invention;

FIG. 3 is a schematic diagram of structured video information from the macroscopic perspective according to the present invention;

FIG. 4 is a schematic diagram of structured video information from the microscopic perspective according to the present invention;

FIG. 5 is a schematic diagram of image file storage according to the present invention;

FIG. 6 is a schematic diagram of multi-index video file storage according to the present invention;

FIG. 7 is a flowchart of the video image file retrieval process of the present invention.

Detailed description

The present invention will be further described below in conjunction with the accompanying drawings and embodiments:

The structure of the device for constructing structured video image information in the present invention is shown in FIG. 1. It includes: a CCD/CMOS image sensor module, a wireless sensor network receiving module, a video processor, a CPU processor, an information fusion module, and an Ethernet/WiFi interface.

The CCD/CMOS image sensor module is connected to the video processor, the wireless sensor network receiving module is connected to the CPU processor, the output of the video processor and the CPU processor are respectively connected to the information fusion module, and the information fusion module is connected with the Ethernet/WiFi interface.

In addition to a general-purpose CCD/CMOS sensor for image acquisition, the device of the invention includes a wireless sensor network receiving module that accepts text descriptions and the readings of standard sensors such as temperature, humidity, illuminance, and pressure sensors. A general-purpose video processor (chip) encodes the video image. The wireless sensor network receiving module receives text description information or binary data, or connects directly to standard temperature, humidity, illuminance, and pressure sensors and digitizes their analog signals. The information structuring algorithm of the present invention runs on a general-purpose CPU and fuses the video image information with the other data to form structured video image information.

In the invention, the video processor supports full HD encoding and uses a high-definition image sensor of 1.3 megapixels or more for video acquisition. The wireless sensor network module transmits data in the ISM frequency band, has a standard sensor input interface, and accepts standard 0-5 V and 0-10 V sensor signals.

As described above, the video processor encodes the video image, and the wireless sensor network receiving module receives text description information or binary data or digitizes the analog signals of directly connected sensors. The video image information is fused with the other data to form structured video image information, and the two types of information are uploaded together to the server for processing.

Because the video stream runs at 25 frames per second while the environmental parameters change far more slowly, a "number of sensors" field is set at fusion time to indicate how many sensors have collected new data at that moment.

The structured information packet format is set as follows:

Table 1 Structured Information Packet Format

Content	Length
Packet length	6 bytes
Number of sensors	1 byte
Video information length	6 bytes
Video information	Determined by "Video information length"
Sensor information length	4 bytes
Sensor information	Determined by "Sensor information length"
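
The layout in Table 1 can be illustrated with a minimal Python sketch. The field widths come from the table; everything else is an assumption made for illustration: the byte order (big-endian here), the rule that the packet length field counts the whole packet including itself, and the function names `pack_packet` and `unpack_packet`, none of which are specified in this document.

```python
from typing import Tuple

# Field widths taken from Table 1.
PACKET_LEN_BYTES = 6
SENSOR_COUNT_BYTES = 1
VIDEO_LEN_BYTES = 6
SENSOR_LEN_BYTES = 4
BYTE_ORDER = "big"  # assumption: the document does not state a byte order


def pack_packet(video: bytes, sensor_info: bytes, sensor_count: int) -> bytes:
    """Build one structured information packet laid out as in Table 1."""
    body = (
        sensor_count.to_bytes(SENSOR_COUNT_BYTES, BYTE_ORDER)
        + len(video).to_bytes(VIDEO_LEN_BYTES, BYTE_ORDER)
        + video
        + len(sensor_info).to_bytes(SENSOR_LEN_BYTES, BYTE_ORDER)
        + sensor_info
    )
    # Assumption: the length field covers the whole packet, itself included.
    total = PACKET_LEN_BYTES + len(body)
    return total.to_bytes(PACKET_LEN_BYTES, BYTE_ORDER) + body


def unpack_packet(packet: bytes) -> Tuple[int, bytes, bytes]:
    """Split a packet into (sensor_count, video, sensor_info)."""
    pos = 0
    total = int.from_bytes(packet[pos:pos + PACKET_LEN_BYTES], BYTE_ORDER)
    if total != len(packet):
        raise ValueError("packet length field does not match packet size")
    pos += PACKET_LEN_BYTES
    sensor_count = int.from_bytes(packet[pos:pos + SENSOR_COUNT_BYTES], BYTE_ORDER)
    pos += SENSOR_COUNT_BYTES
    video_len = int.from_bytes(packet[pos:pos + VIDEO_LEN_BYTES], BYTE_ORDER)
    pos += VIDEO_LEN_BYTES
    video = packet[pos:pos + video_len]
    pos += video_len
    sensor_len = int.from_bytes(packet[pos:pos + SENSOR_LEN_BYTES], BYTE_ORDER)
    pos += SENSOR_LEN_BYTES
    sensor_info = packet[pos:pos + sensor_len]
    return sensor_count, video, sensor_info
```

A frame for which no sensor reported new data would simply carry a sensor count of 0 and an empty sensor information field, which is one way to reconcile the 25 frames per second video rate with slowly changing sensor readings.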

(1) constructing a structured JPEG image file;

(2) constructing structured video image information;

Figure 2 is a schematic diagram of the JPEG image file structuring method. Text information, standard sensor data, voice-recognition data, and similar information are encrypted and attached to the original JPEG file. By adding the information after the JPEG end-of-image marker code (EOI), the original unstructured JPEG image file is turned into a structured JPEG file (image information + text information). The text information is decrypted and displayed during parsing, and the image is not affected at all.

Attaching text information (generally sensor information) to a JPEG file makes it a new file that carries a sensor information label. The sensor information describes the specific environment in which the JPEG image was taken, so the JPEG file can better reproduce the scene at the time of shooting. Because the invention does not alter the structure or content of the original JPEG file, existing software can still read and display the file; the invention also protects the sensor information and prevents existing software from tampering with it.

JPEG data can be stored in several file formats. The most commonly used today are JFIF (JPEG File Interchange Format) and Exif (Exchangeable Image File Format), both of which comply with JIF (JPEG Interchange Format). A JPEG file is roughly divided into two parts:

Marker code (tag code): two bytes. The first byte is fixed at 0xFF and indicates the start of the marker code; different values of the second byte carry different meanings. When several consecutive 0xFF bytes appear, they are treated as a single 0xFF that still indicates the start of a marker code. The main marker codes are listed below:

Table 2 JPEG markup code

Marker code	Format	Meaning
SOI (Start Of Image)	0xFFD8	Image start
APP0 (Application 0)	0xFFE0	Application reserved marker 0
SOF0 (Start Of Frame 0)	0xFFC0	Frame start
EOI (End Of Image)	0xFFD9	End of image; end of image file

Compressed data: The tag code is followed by compressed data, which records the details of the image file.

During image parsing, software parses the file starting from 0xFFD8 and stops at 0xFFD9. If the relevant text information is inserted after 0xFFD9, the software does not parse that segment, so the text information has no effect on image content or image quality. The file nevertheless carries the text information with it. From an information point of view, the unstructured image data and the structured text information together form one piece of structured information: the JPEG image file is associated with text information from various sources, and that text information is embedded in the JPEG file. Because current viewing software does not look at content after the 0xFFD9 marker code, the text information stays hidden from it, which helps keep the information secure and concealed. The text information that corresponds to the image is obtained only by deliberately querying and parsing the data after 0xFFD9. This information not only describes the specific details of the image accurately, but also serves as a retrieval condition: the corresponding image file can be found by searching the structured text information.
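
The idea of placing text after the EOI marker can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the encryption step is omitted, the 4-byte length field (mentioned in the retrieval description of this document) is assumed to be big-endian, and the function names `attach_text` and `extract_text` are hypothetical. Because 0xFFD9 can also occur earlier in a file (for example inside an embedded thumbnail), the reader below accepts only an EOI whose following length field accounts exactly for the rest of the file; that check is a heuristic chosen for this sketch.

```python
from typing import Optional

SOI = b"\xff\xd8"   # start-of-image marker
EOI = b"\xff\xd9"   # end-of-image marker
LENGTH_BYTES = 4    # size of the text length field that follows EOI


def attach_text(jpeg: bytes, text: str) -> bytes:
    """Append a length-prefixed text record after the JPEG's final EOI."""
    if not jpeg.startswith(SOI) or not jpeg.endswith(EOI):
        raise ValueError("expected a JPEG that starts with SOI and ends with EOI")
    payload = text.encode("utf-8")  # encryption omitted in this sketch
    # Assumption: big-endian length field.
    return jpeg + len(payload).to_bytes(LENGTH_BYTES, "big") + payload


def extract_text(data: bytes) -> Optional[str]:
    """Return the appended text, or None if the file carries no record."""
    pos = data.find(EOI)
    while pos != -1:
        start = pos + len(EOI) + LENGTH_BYTES
        if start <= len(data):
            length = int.from_bytes(data[pos + len(EOI):start], "big")
            # Accept only a record that runs exactly to the end of the file.
            if start + length == len(data):
                return data[start:].decode("utf-8")
        pos = data.find(EOI, pos + 1)
    return None
```

The image bytes up to and including EOI are left untouched, which is why a viewer that stops parsing at 0xFFD9 displays the picture as before.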

As shown in FIG. 3 and FIG. 4, the video stream structuring method of the present invention encapsulates the original video stream data packet. The new data packet fuses the video stream data packet with standard sensor data received from Internet of Things sensors, manually entered text information, and similar information: the data packet length, video stream data packet, other-information length, and other information are combined into a new data packet for transmission. The format of the new data packet is defined in Table 1. After receiving the data packet, the server parses it to generate the corresponding video file and its description file. The video file is structured and encapsulated in this way.

The device for constructing structured video image information collects sensor information and video information. It either appends the encrypted sensor information to the encoded JPEG image file or forms a corresponding description file for the video file. The result is a standard data stream that is output over a standard communication protocol (usually TCP/IP).

Figure 3 is a schematic diagram of structured video information from the macroscopic perspective. At the macroscopic level, a description file is added for each video file, and the description file and the video file are associated as two parts of the structured information. Creating a description file for the video file of each time period provides a textual supplement to the individual video images, making the abstract video picture more specific and more substantial.

Figure 4 is a schematic diagram of structured video information from the microscopic perspective. At the microscopic level, the description file records the location of the video capture, the absolute time of shooting, and the time relative to the start of the video file, as well as the text information generated during capture. To protect the content of the text information, it can be stored encrypted; when it needs to be viewed, it is decrypted with a private key.
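
As a rough illustration of what such a description file might hold, the Python sketch below serializes the listed items to JSON. The JSON format, the class names `DescriptionFile` and `TextEvent`, and all field names are assumptions introduced here; the document does not define a file format, and the encryption step is left out.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class TextEvent:
    offset_s: float  # time relative to the start of the video file, in seconds
    text: str        # text information generated during capture


@dataclass
class DescriptionFile:
    video_file: str  # name of the video file this description belongs to
    location: str    # where the video was captured
    start_time: str  # absolute time of shooting (ISO 8601 string assumed)
    events: List[TextEvent] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize the description, nested events included."""
        return json.dumps(asdict(self), ensure_ascii=False, indent=2)

    @classmethod
    def from_json(cls, raw: str) -> "DescriptionFile":
        """Rebuild a description from its JSON form."""
        obj = json.loads(raw)
        events = [TextEvent(**e) for e in obj.pop("events")]
        return cls(events=events, **obj)
```

Keeping the relative offset next to each text entry is what lets a matching entry point to a position inside the associated video file rather than only to the file as a whole.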

In the image file storage and retrieval method of the present invention, the length of the text information is obtained by parsing the 4 bytes that follow the 0xFFD9 at the end of the JPEG file. All of the text information is then extracted according to that length and compared with the search condition. If the condition is met, the JPEG file is the image file being sought; otherwise, the comparison proceeds to the next file. To improve retrieval efficiency and reduce the number of comparisons, files can be stored in segments according to the content of the text information, following the principle of a B+ tree. During retrieval, the folder that covers the search condition is located first, and the search then continues in its subfolders, which reduces the number of files examined.

Figure 5 is a schematic diagram of image file storage (taking temperature as an example). The image files are classified and stored following the B+ tree principle. When searching, first determine which temperature segment the search condition falls in, and then enter the corresponding directory to search. Once the subdirectory has been searched, the images that meet the condition are output.
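
A simplified Python sketch of segment-based storage is given below, using temperature as in Figure 5. It uses a single level of fixed-width range folders, which captures the "narrow the search to one segment first" idea but is not a full B+ tree. The 10-degree segment width, the folder naming scheme, and the function names are all hypothetical.

```python
from pathlib import Path
from typing import Callable, List

SEGMENT_WIDTH_C = 10  # hypothetical width of one temperature segment


def segment_name(temperature_c: float) -> str:
    """Name of the folder whose range contains the given temperature."""
    low = int(temperature_c // SEGMENT_WIDTH_C) * SEGMENT_WIDTH_C
    return f"{low}_to_{low + SEGMENT_WIDTH_C}"


def store_image(root: Path, name: str, data: bytes, temperature_c: float) -> Path:
    """Write the file into the folder for its temperature segment."""
    folder = root / segment_name(temperature_c)
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / name
    path.write_bytes(data)
    return path


def search_images(
    root: Path, temperature_c: float, matches: Callable[[bytes], bool]
) -> List[Path]:
    """Examine only the files stored in the segment that covers the query."""
    folder = root / segment_name(temperature_c)
    if not folder.is_dir():
        return []
    return sorted(
        p for p in folder.iterdir() if p.is_file() and matches(p.read_bytes())
    )
```

Here `matches` stands in for the comparison of the extracted text information with the search condition; files in other segments are never opened.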

The video storage and retrieval method of the present invention borrows the B+ tree idea from databases to construct an index file for each field in the description file. The index files are divided into several levels ("project", "installation location", "year", "month", and "day"), and the received text information is stored according to these levels. Retrieval starts from the storage directory and enters the corresponding project directory. The index file at each level is examined to select the directories that satisfy the condition, and these are placed in the retrieval queue. Once a level has been searched, the next-level search path is taken from the retrieval queue and searched in turn, until the location of the corresponding description file is found. After the description file is retrieved, the corresponding video file can be obtained.

Figure 6 is a schematic diagram of multi-index video file storage. The index files at each level record the number of devices, the number of files, condition flags, and so on. The index files are generated bottom-up: after text information is received, the index files are updated level by level from the lowest level upward.

1. The "project index file" records the number of devices in the current project and the installation location of each device, the project creation time and deadline, and the project file storage directory and number of stored files.

2. The "location index file" records the start and stop times of shooting for the current device, the number of video files captured, the storage address of the "year" directory, and the text information identifiers and data content generated while the device was shooting.

3. The "year index file" records the months in which the device was in normal use, the number of video files captured, the storage address of the "month" directory, and the text information identifiers and data content generated while the device was shooting.

4. The "month index file" records the number of days on which the device was in normal use during the month, the number of video files captured, the storage address of the "date" directory, and the text information identifiers and data content generated while the device was shooting.

5. The "date index file" records the number of video files the device captured that day, the file name of each video file and of its description file, and the text information identifiers and data content contained in each video file.
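
The bottom-up update of the five index levels can be sketched in Python with an in-memory dictionary standing in for the index files. This is only an illustration: the node layout (`video_count`, `children`, `files`) and the function names are assumptions, and the real index files described above hold more fields (times, storage addresses, text information identifiers).

```python
from typing import Dict, Tuple

IndexKey = Tuple[str, ...]  # (project, location, year, month, day) or a prefix


def new_node() -> dict:
    """One index entry: a video counter, child names, and leaf file names."""
    return {"video_count": 0, "children": set(), "files": []}


def register_video(
    index: Dict[IndexKey, dict],
    path: IndexKey,
    video_file: str,
    description_file: str,
) -> None:
    """Record one video, updating the date level first and the project level last."""
    leaf = index.setdefault(path, new_node())
    leaf["files"].append((video_file, description_file))
    for depth in range(len(path), 0, -1):  # bottom-up over the levels
        key = path[:depth]
        node = index.setdefault(key, new_node())
        node["video_count"] += 1
        if depth < len(path):
            # Remember which next-level directory holds the new video.
            node["children"].add(path[depth])
```

After a few calls, the entry for a project prefix such as `("siteA",)` holds the total number of videos under that project, while each deeper prefix holds the count for its own subtree.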

Figure 7 is a video image file retrieval process. The detailed search process is as follows:

(1) Read the index file in the project directory, check the device IPs against the search condition, and put each matching IP into the device queue. If no device meets the condition, the search exits immediately.

(2) Dequeue an IP from the device queue, enter that device's directory, read the year index file, and put each matching month into the month queue.

(3) Dequeue a month from the month queue, enter the corresponding month directory, read the month index file, and put each matching date into the date queue.

(4) Dequeue a date from the date queue, enter the corresponding date directory, find the matching files, and output their file names. Repeat step (4) until the date queue is empty.

(5) Repeat step (3) until the month queue is empty.

(6) Repeat step (2) until the device queue is empty, at which point the retrieval is complete.
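
The queue-driven search in steps (1) to (6) can be sketched in Python as nested loops over three queues. A nested dictionary (device IP, then month, then date, then file names) stands in for the on-disk index files, and the three predicate parameters stand in for the search condition; the function name `retrieve` and this data layout are assumptions made for illustration.

```python
from collections import deque
from typing import Callable, Dict, List

# Stand-in for the index files: device IP -> month -> date -> file names.
Index = Dict[str, Dict[str, Dict[str, List[str]]]]


def retrieve(
    index: Index,
    want_device: Callable[[str], bool],
    want_month: Callable[[str], bool],
    want_date: Callable[[str], bool],
) -> List[str]:
    """Return the file names reached by walking the device, month, and date queues."""
    results: List[str] = []
    # Step (1): enqueue the matching device IPs; an empty queue ends the search.
    device_queue = deque(ip for ip in index if want_device(ip))
    while device_queue:  # step (6): until the device queue is empty
        ip = device_queue.popleft()  # step (2)
        month_queue = deque(m for m in index[ip] if want_month(m))
        while month_queue:  # step (5): until the month queue is empty
            month = month_queue.popleft()  # step (3)
            date_queue = deque(d for d in index[ip][month] if want_date(d))
            while date_queue:  # step (4): until the date queue is empty
                date = date_queue.popleft()
                results.extend(index[ip][month][date])
    return results
```

Each level filters before descending, so directories that do not match the condition are never entered.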

The specific embodiments of the present invention have been described above with reference to the accompanying drawings, but they do not limit the scope of protection of the present invention. Those skilled in the art should understand that various modifications or variations that can be made without creative effort on the basis of the technical solutions of the present invention still fall within the scope of protection of the invention.

Claims

1. An apparatus for constructing structured video image information, comprising: a CCD/CMOS image sensor module, a wireless sensor network receiving module, a video processor, a CPU processor, an information fusion module, and an Ethernet/WiFi interface;

The CCD/CMOS image sensor module is connected to the video processor, the wireless sensor network receiving module is connected to the CPU processor, the outputs of the video processor and the CPU processor are each connected to the information fusion module, and the information fusion module is connected to the Ethernet/WiFi interface;

The video processor encodes the video image; the wireless sensor network receiving module receives text description information or binary data, or connects directly to standard temperature, humidity, illuminance, and pressure sensors and digitizes their analog signals; and the video image information is fused with the other data to form structured video image information.
2. The apparatus for constructing structured video image information according to claim 1, wherein the wireless sensor network module transmits data in the ISM frequency band, has a standard sensor input interface, and accepts standard 0-5 V and 0-10 V sensor signals.
3. A method of constructing structured video image information using the apparatus according to claim 1, comprising:

(1) constructing a structured JPEG image file;

(2) constructing structured video image information;

(3) storing structured JPEG image files and video image information respectively;

(4) Retrieving structured JPEG image files and video image information, respectively.
4. The method of constructing structured video image information according to claim 3, wherein the specific method of step (1) is:

The text information, standard sensor data, and voice-recognition data are encrypted and attached to the original JPEG file; the information is added after the JPEG file marker code, so that the original unstructured JPEG image file is constructed into a structured JPEG file; the text information is decrypted and displayed during parsing, and the image information is not affected at all.
5. The method of constructing structured video image information according to claim 3, wherein the marker code consists of two bytes, the first byte being fixed at 0xFF to indicate the start of the marker code, and different values of the second byte representing different meanings;

During image parsing, the file is parsed starting from 0xFFD8, and parsing ends at 0xFFD9.
6. The method of constructing structured video image information according to claim 3, wherein the specific method of step (2) is:

The description file and the video file are associated as two parts of the structured information, and a corresponding description file is created for the video file of each time period to supplement the text of the single video image;

The original video stream data packet is encapsulated: it is fused with the standard sensor data received from Internet of Things sensors or with manually entered text information, and the data packet length, video stream data packet, other-information length, and other information are combined into a new data packet for transmission.
7. The method of constructing structured video image information according to claim 3, wherein in step (3) the method for storing the structured JPEG image file is:

During storage, the files are stored in segments or subfolders according to the content of the text information.
8. The method of constructing structured video image information according to claim 3, wherein in step (3) the method for storing the structured video image information is:

An index file is constructed for each field of the description file in the structured video image information, and the index file is divided into several levels, and the received text information is stored according to the content of the corresponding level.
9. The method of constructing structured video image information according to claim 3, wherein in step (4) the method for searching the structured JPEG image file is:

When searching, first locate the folder that covers the search condition, and then search within its subfolders;

The length of the text information is obtained by parsing the JPEG file data, and all of the text information is then extracted according to that length and compared with the search condition; if the search condition is met, the JPEG file is the image file being sought; otherwise, the comparison proceeds to the next file.
10. The method of constructing structured video image information according to claim 3, wherein in step (4) the method for searching the structured video image information is:

The search starts from the storage directory and enters the corresponding first-level search directory; the index file at each level is examined to select the directories that satisfy the condition, and these are placed in the search queue; once a level has been searched, the next-level search path is taken from the search queue and searched in turn, until the location of the corresponding description file is found; after the description file is retrieved, the corresponding video file is obtained.