CN112051853A

CN112051853A - Intelligent obstacle avoidance system and method based on machine vision

Info

Publication number: CN112051853A
Application number: CN202010986659.4A
Authority: CN
Inventors: 谢金宝; 李紫玉; 殷楠楠; 林木深; 赵楠; 陈小威; 李双庆
Original assignee: Harbin University of Science and Technology
Current assignee: Harbin University of Science and Technology
Priority date: 2020-09-18
Filing date: 2020-09-18
Publication date: 2020-12-08
Anticipated expiration: 2040-09-18
Also published as: CN112051853B

Abstract

The invention relates to an intelligent obstacle avoidance system and method based on machine vision, in particular to an intelligent obstacle avoidance suitcase (trolley) based on machine vision, which belongs to the technical field of intelligent robots and aims to solve the problems that an intelligent following suitcase cannot avoid obstacles urgently and an automatic walking process is inconvenient and not intelligent enough in the prior art; the machine hardware drive comprises a singlechip, a drive plate and a camera; the software data processing comprises an image acquisition module, a camera calibration module, an image processing module and a machine vision realization module, the automatic following function is realized by the image processing automatic identification target, the automatic obstacle avoidance and obstacle identification are realized, the intelligent travel suitcase (trolley) is more convenient, and the burden of people in traveling is greatly reduced.

Description

Intelligent obstacle avoidance system and method based on machine vision

Technical Field

The invention relates to an intelligent obstacle avoidance system and method based on machine vision, in particular to an intelligent obstacle avoidance suitcase (trolley) based on machine vision, and belongs to the technical field of intelligent robots.

Background

People research and manufacture mobile robots, and are used for replacing human beings to carry out boring or dangerous work such as cargo handling, sanitation, military investigation, dangerous environment monitoring and the like. The intelligent unmanned automobile leading to industry perplexing is a mechatronic control system integrating functions of environmental monitoring, path planning, automatic control, auxiliary driving and the like.

The mobile robot meets obstacles to be separated from a working state in the traveling process, and the obstacles comprise static objects such as raw materials, workpieces, mechanical equipment, buildings and the like, and possibly human activities randomly appearing on a traveling road of the mobile robot. If the robot cannot be avoided in time, the working state of the mobile robot is influenced, and the loss of casualties or damage to the mobile robot is seriously caused, so that the irretrievable loss is caused. Therefore, the method can be used for researching how the robot detects and identifies the obstacles and reasonably planning the route, and the obstacle avoidance is an important content in the field of design and manufacture of the mobile robot.

The current market is a relatively popular suitcase carrying people, the weight of the reinforced suitcase is up to 15 pounds, and the reinforced suitcase has a small battery, so that the suitcase can carry two adults to run for 37 miles after being charged once. The entire construction includes an electric scooter on which the luggage case is mounted. In addition, the system is provided with a GPS system and an anti-theft alarm system. The design is mainly different from the design in that the weight is light, the structure is simple, and the manned ability is not provided. Therefore, compared with the multifunctionality of the product, the portable function is mainly achieved, the problems of multiple quantity and high energy consumption caused by multiple functions are solved, and personal safety is guaranteed more in comparison. In addition, the safety and the convenient operation of pedestrians on the road are considered, the walking speed of normal people is set for the product, and the working time of the product is greatly prolonged due to low speed and low energy consumption.

A novel portable traveling case is similar to the existing riding traveling case on the market. The main differences are simple use, simple function, energy conservation and environmental protection. Mainly aims at the problem that the luggage case is difficult to carry when being too heavy. The design uses electricity as power, and the storage battery is arranged in the travel suitcase, so that the travel suitcase can automatically travel, the direction of the travel suitcase is controlled by hands, the travel suitcase can automatically track the owner, and the operation is simple.

In recent two years, many companies have started to develop and manufacture intelligent follow-up traveling cases which can follow the owner of the traveling case at any time. The intelligent traveling case has relatively few domestic and foreign researches on the aspect of intelligent traveling cases, and in addition, the improvement design of the traveling case is focused on the aspects of case body materials, personalized design and the like, so that a great space is also provided for improving the intellectualization of the traveling case.

Disclosure of Invention

In order to solve the problems that an intelligent following suitcase cannot avoid an obstacle urgently and an automatic walking process is inconvenient and not intelligent enough in the prior art, the invention provides an intelligent obstacle avoiding system and method based on machine vision;

the specific scheme of the invention is as follows:

the first scheme is as follows: the intelligent obstacle avoidance system based on machine vision is divided into two parts, namely machine hardware driving and software data processing;

the machine hardware drive comprises a singlechip, a drive plate and a camera;

the single chip microcomputer is responsible for running a software program so as to control the drive board to issue a drive command; the driving board is responsible for circuit connection and signal transmission of all configurations of the whole system; the camera is responsible for collecting image information;

the software data processing comprises an image acquisition module, a camera calibration module, an image processing module and a machine vision realization module

The image acquisition module is responsible for acquiring the image of the camera and establishing a coordinate system;

the camera calibration module is responsible for calibrating a coordinate system according to camera parameters;

the image processing module is used for eliminating image noise, and extracting image characteristics and detecting edges;

and the machine vision realization module is responsible for processing the calibrated three-dimensional image and realizing intelligent obstacle avoidance.

Furthermore, the single chip microcomputer is a raspberry pi 3B + single chip microcomputer and is connected to the driving plate through a circuit pin;

the driving board is an L298N driving integrated circuit board;

the camera is provided with at least two cameras and is arranged on the driving plate together.

Scheme II: the intelligent obstacle avoidance method based on the machine vision is realized based on the system, and the specific method comprises the following steps:

firstly, performing machine binocular vision distance measurement and three-dimensional reconstruction through the camera calibration module, and establishing 4 coordinate systems by using a camera imaging model;

secondly, the camera calibration module establishes a camera imaging model through images shot by the camera, simplifies optical imaging, calibrates the camera and completes the establishment of an objective world coordinate system in a camera image plane coordinate system;

step three, the image processing module is used for removing image noise, extracting image characteristics and detecting image edges;

and fourthly, realizing pre-processing image filtering through the machine vision realization module, and filtering again after forming a disparity map through stereo matching to finally realize intelligent obstacle avoidance.

Further, in the process of establishing the coordinate system by the camera calibration module in the step one, the following steps are adopted:

establishing an objective world coordinate system, representing left and right coordinates by an X axis, representing up and down coordinates by a Y axis, representing a depth coordinate by a Z axis, representing a distance by a coordinate system in the real world, and defining the actual position of an object shot by a camera through the objective world coordinate system;

establishing a camera coordinate system, defining the optical center of a camera lens as a coordinate origin, representing the transverse direction by an X axis, representing the longitudinal direction by a Y axis, and establishing the camera coordinate system by taking the optical axis of the camera as a Z axis;

establishing an image plane coordinate system, representing two-dimensional coordinates of a picture shot by a camera by using x 'y' as a coordinate axis, and representing the two-dimensional coordinates by using x 'y' as a coordinate axis, wherein the two-dimensional coordinates are planar projection of an objective world coordinate system on a camera coordinate system, and the image plane coordinate system and the camera coordinate system are parallel and are respectively superposed;

step four, establishing a computer image coordinate system, namely a coordinate system used by the digital image in the computer;

the image shot by the camera is converted into a digital image after being processed, the digital image is stored in a computer, and the coordinate of an image plane is converted in a translation mode and an image rotation mode, namely coordinate translation transformation and rotation transformation, so that the image coordinate of the computer is formed.

Further, the specific matrix transformation mode of the coordinate translation transformation and the rotation transformation is as follows:

mode a, translation matrix:

by amount of translation (X)₀,Y₀,Z₀) Translating the point with coordinates (X, Y, Z) to a new position, the matrix being:

mode b, rotation matrix:

and (3) setting a point A in the coordinate system to rotate clockwise around the Z axis by an angle of gamma from the point A to a point A', wherein the matrix is as follows:

the xy plane of a camera coordinate system and an image plane coordinate are coincided along an optical axis Z axis, a central origin of the image plane coordinate system is coincided with an optical center of a lens, the coordinates are (0,0, f), f is the focal length of the lens, and if the camera coordinate system X, Y and Z are parallel to a world coordinate system X, Y and Z, the (X, Y and Z) are the world coordinates of a midpoint W in a three-dimensional space, the coordinate system is obtained by transforming by using a similar triangle formula:

in the two formulas, after a minus sign is added in front of X and Y, the image coordinate point inversion is realized, so that the image plane coordinates of the three-dimensional object after the projection of the camera can be obtained:

by solving the inverse of the above equation, the plane coordinates are obtained:

further, the camera calibration module in the second step establishes a camera imaging model for the camera, and the specific calibration steps are as follows:

establishing a plane network calibration template, taking pictures of the calibration template by using a double-lens synchronous camera, and taking pictures of the template from different angles by using a camera so as to acquire images;

secondly, acquiring a calibration control point from the image, processing the acquired image, finding out an image plane coordinate of the characteristic point of the calibration template, and establishing a relation between a world coordinate and a camera coordinate by using a geometric coordinate of a corresponding point;

step two, external and internal parameters of the camera are obtained through the relation of a coordinate system, an international chessboard is used as a calibration template, relative coordinates of each point on the chessboard can be obtained through measurement, then the calibration template is shot by a double-lens synchronous camera from different angles, a plurality of paired calibration template layouts are obtained, collected images are processed, coordinates of black and white grid intersection points in the images are detected, and the coordinates of the intersection points are calculated, so that the parameters of the camera can be obtained;

step two, the deflection coefficient is calculated through a minimum variance formula, the final iteration result is calculated according to the nonlinear programming, the deflection coefficient and the distortion coefficient are obtained, the focusing of a camera is required in the calibration process, the chessboard occupation ratio is ensured to be at least over 50 percent in the image, the most accurate calibration result is obtained when the number of the templates is 16,

step two and five, according to the camera model established above, using m ═ u, v for the coordinate point vector of the two-dimensional image in the computer image coordinate system]^TIn this case, a three-dimensional coordinate point in the world coordinate system is represented by M ═ X, Y, Z]^TWherein the homogeneous variables are represented in the form of

Obtaining the following according to the camera imaging model:

wherein [ R, T ] is an external reference matrix, and consists of the translation matrix and the rotation matrix; k is an internal reference matrix expressed in the form of:

and calculating the above formula to obtain the internal and external parameters of the camera.

Further, the image processing module described in step three includes image noise removal, image feature extraction and image edge detection, and the specific steps are as follows:

step three, a filtering mode corresponding to the removal of the image noise adopts median filtering, and a specific algorithm is as follows:

firstly, a small window G is taken from an image, in the window G, the gray value of an image pixel point (x, y) is represented by f (x, y), and then a median filter of a filter window is defined as:

f(x,y)＝MED{f(x,y)}(x,y)∈G

step two, feature extraction is to establish matching operation on each pixel point, the pixel points meeting the requirements of a feature equation are uniformly extracted, the features of the pixel points are determined, the camera is calibrated, black and white cross points of a chessboard in an image are identified, the points are extracted by using the feature equation and are stored in different subsets according to different attributes of the points;

thirdly, detecting and cutting the image edge according to the judgment basis of the image edge, wherein the judgment basis of the image edge is as follows:

(1) the step-shaped edge shows that the gray value between two adjacent pixels is changed suddenly and is increased or reduced suddenly;

(2) a roof-shaped edge, which is represented by a phenomenon that pixel gray values in a region gradually rise and gradually fall;

(3) linear edges, which are characterized by a transition of the pixel gray values from one level to another and then back to the original level, in a region.

Further, the machine vision implementation module described in step four includes an Opencv BM algorithm, and the step of processing the calibrated stereo image is as follows:

step four, the two calibrated and corrected images are subjected to parallel calculation and then are grayed, so that the brightness of the images is normalized, the texture of the images is enhanced, and comparison errors caused by different brightness are prevented, wherein the algorithm is as follows:

performing Sobel operation in the horizontal axis direction, reinforcing image textures in the horizontal direction, simultaneously performing normalization calculation on the brightness and contrast of the two images, eliminating image brightness difference caused by brightness error caused by different parameter settings and shooting angles of the two cameras, moving a window to traverse the whole image, finally obtaining two filtered images, and then using the two filtered images for next step matching;

step four, stereo matching, namely performing multi-path parallel calculation along the horizontal polar line of the image, and performing matching search by using an SAD algorithm to generate a disparity map, an absolute error sum algorithm, wherein the specific algorithm is as follows:

assuming that A (x, y) is a reference map of size M N, B (x, y) is an alignment map of M N, and A > B, a region matching (B) is found in (a),

in the reference graph A, taking the (i, j) as the upper left corner, taking the sub-graphs with the size of M × N, matching the sub-graphs with the alignment graph B, comparing the sub-graphs from the upper left corner to the lower right corner of the reference graph frame by frame, and if one sub-graph can be found out from all the sub-graphs, the sub-graph is similar to the alignment graph B, and the similarity of the sub-graph and the alignment graph B is greater than a matching value, the matching is successful;

the smaller the absolute error sum D (i, j) is, the more similar the sub-graph and the alignment graph is, the smaller the absolute error sum D (i, j) is, the whole reference graph is traversed to find the smallest D (i, j) to obtain the matched sub-graph position, and the matching value measure formula of the SAD algorithm is as follows:

for the features of each left image, after finding the best matching correction for the right image, the matching position of the right image is collinear with the left image, and if there are enough texture characteristics to detect, in the camera view located on the right side, the distance of the matching position and the parallax matching position relative to the left camera can be found;

step three, re-filtering is another filtering process after stereo matching, when the sum of absolute errors is smaller than a set threshold value, an effective matching point is calculated, when the sum of absolute errors is too large, matching of the point is considered to fail, virtual matching is prevented, and when partial areas cannot be matched due to distance, a matching invalid value is output to indicate that the distance is too far;

after obtaining the parallax, the distance of the objective world is obtained according to the relation between the parallax and the depth, and in order to obtain the distance of the objective world, a relation formula between the parallax and the objective depth is obtained according to the geometric relation of parallel binocular vision, and the relation formula is as follows:

depth＝(f*baseline)/disp

in the above formula, depth represents the objective world depth; f represents the normalized focal length; baseline is the distance between the optical centers of the two cameras, called the baseline distance; disp is the image disparity value, from which the guest world depth value is calculated from a known quantity.

The invention has the beneficial effects that:

1. and the image processing automatically identifies the target to realize an automatic following function. And (4) automatically avoiding obstacles and identifying obstacles.

2. The project realizes an offline detection and online learning module, firstly, a target is selected artificially, and data parameters of a certain number of tracked images are stored as offline training samples through TLD calculation. When the trolley power supply is started, the TLD algorithm is run through the script, and the parameters of the first frame of image after offline training are automatically read to initialize the TLD system. Only the nearest neighbor image slice and the variance value are stored, so that the occupied space of the model is reduced, and the time for storing and loading the target model is greatly shortened.

3, the project takes out the least representative positive image slices and negative image slices from the model by using a time and reliability writing algorithm, and limits the number of the positive image slices and the negative image slices to be 20, thereby reducing the image slices in the target model for nearest neighbor calculation and matching, and reducing the calculation amount and the occupied running memory.

4. According to the method, the image frames far away from the target are excluded through a distance clustering algorithm, the image frames near the target are clustered, a new target frame is synthesized, and the nearest neighbor is used for reevaluating, so that the classification effect of the classifier is more accurate.

5. The project uses the CUDA multithreading parallel computing technology to carry out algorithm optimization, and the original serial computing algorithm on the CPU is rewritten into parallel computing, so that the running speed of the algorithm is improved, and the running time is shortened.

Drawings

FIG. 1 is a schematic diagram of the overall system software flow;

FIG. 2 is a software flow chart of the intelligent follow-up traveling case control system;

FIG. 3 is a flow chart of camera image calibration;

FIG. 4 is a model diagram of a projection imaging coordinate system;

FIG. 5 is a schematic diagram of a checkerboard calibration template;

FIG. 6 is a flow chart of median filtering;

fig. 7 is an image stereo matching correspondence diagram;

FIG. 8 is a schematic diagram of virtual map dynamic obstacle avoidance;

FIG. 9 is an overall flow chart of a software system algorithm;

FIG. 10 is a schematic TLD algorithm flow diagram;

FIG. 11 is a schematic diagram of a virtual machine server network environment;

fig. 12 is a schematic diagram of a tk1 network client network environment.

Detailed Description

The embodiments of the present invention are described with reference to fig. 1 to 12, and the specific implementation of the intelligent obstacle avoidance system and method based on machine vision is as follows:

the first embodiment is as follows: the system comprises a machine hardware drive and a software data processing part, wherein the machine hardware drive comprises a single chip microcomputer, a drive plate and a camera, and the software data processing part comprises an image acquisition module, a camera calibration module, an image processing module and a machine vision realization module.

The single chip microcomputer is a raspberry pi 3B + single chip microcomputer, the cameras are provided with at least two cameras, and the single chip microcomputer and the cameras are connected to the L298N driving integrated circuit board through circuit pins.

The second embodiment is as follows: based on the system configuration described in the first embodiment, the specific implementation method of the intelligent obstacle avoidance system based on the machine vision is as follows:

firstly, performing binocular vision distance measurement and three-dimensional reconstruction of a machine through a camera calibration module, and establishing 4 coordinate systems by using a camera imaging model;

step two, the camera calibration module simplifies optical imaging of the camera by establishing a camera imaging model, and then calibrates the camera to complete establishment of an objective world coordinate system in a camera image plane coordinate system;

thirdly, removing image noise, extracting image characteristics and detecting image edges by using an image processing module;

and step four, the machine vision implementation module implements preprocessing image filtering, and performs filtering again after forming a disparity map by stereo matching to finally implement intelligent obstacle avoidance.

The third concrete implementation mode: in addition to the implementation method described in the first step of the second embodiment, the implementation method can be further refined into a process of establishing a coordinate system and matrix transformation, and the implementation steps are as follows:

for binocular vision distance measurement and three-dimensional reconstruction of a machine, firstly, a camera imaging model must be researched, and the camera imaging model used by people generally has 4 coordinate systems:

(1) objective world coordinate system: the coordinate system in the real world (X-axis for abscissa, i.e. left and right, Y-axis for ordinate, i.e. up and down, and Z-axis for depth coordinate, i.e. distance) expresses the actual position of the object in the objective world.

(2) Camera coordinate system: the optical center of the camera lens is set as the coordinate origin, the transverse direction is still represented by an X axis, the longitudinal direction is represented by a Y axis, and the optical axis (namely the focal length or the depth of field) of the camera is taken as a Z axis, so that the coordinate system of the camera is established.

(3) Image plane coordinate system: is the two-dimensional coordinates (represented by x 'y' as the coordinate axis) of the picture taken by the camera. It is a planar projection of the objective world coordinate system onto the camera coordinate system. The image plane coordinate system and the camera coordinate system are parallel and respectively superposed.

(4) Computer image coordinate system: coordinate system for digital images inside a computer.

The pictures shot by the camera are converted into digital images after being processed, and the image plane coordinates are converted into computer image coordinates through translation, image rotation, scale transformation and the like in order to store the digital images in a computer.

Coordinate translation transformation is used in the design, and a matrix expression form is used as follows:

a. translation matrix

By amount of translation (X)₀,Y₀,Z₀) The point with coordinates (X, Y, Z) is translated to a new position. The matrix form is:

b. rotation matrix

And setting a point A in the coordinate system to rotate clockwise around the Z axis by an angle gamma. Turning from point A to point A', the matrix is:

the formula of the relationship between the spatial point and its imaging point in the camera is shown in fig. 4, and the xy plane and the imaging plane coordinates of the camera coordinate system coincide along the optical axis Z axis. The central origin of the image plane coordinate system coincides with the optical center of the lens, the coordinates are (0,0, f), and f is the focal length of the lens. We assume that the camera coordinate system X, Y, Z is parallel to the world coordinate system X, Y, Z. Let (X, Y, Z) be the world coordinate of the midpoint W in the three-dimensional space, and be transformed using the similar triangle formula:

in the formulas (3-3) and (3-4), image dot inversion is realized after adding a "-" sign in front of X and Y. The image plane coordinates of the three-dimensional object after the camera projection can be obtained by the formulas (3-3) and (3-4):

by solving the inverse of equations (3-5) and (3-6) above, one can obtain:

the fourth concrete implementation mode: in addition to the implementation method described in the second step of the second embodiment, the implementation step can be further subdivided into: the internal and external parameters of each camera are different, the camera parameters must be known in advance in the relation model, the image distortion can be caused by inaccurate camera parameters, the camera parameters can be calculated by the embodiment, and the process of obtaining the camera parameters is called camera calibration.

The camera has different geometric parameters and imaging models, and optical imaging is simplified by establishing the camera imaging model. The camera is then calibrated to determine the method of representation of the objective world coordinate system in the camera image plane coordinate system.

A calibration process:

(1) setting a calibration template, photographing the calibration template by using a double-lens synchronous camera, and acquiring an image;

(2) processing the collected image to find out the image plane coordinates of the characteristic points of the calibration template;

(3) and establishing the relation between the world coordinate and the camera coordinate by using the geometric coordinate of the corresponding point, so as to obtain the external and internal parameters of the camera. The calibration of the camera is an important step for establishing a machine binocular vision system, and if the calibration is not standard, the error is increased, and the precision and the accuracy of three-dimensional reconstruction are influenced.

The conversion of world coordinates and camera coordinates can be realized by calibrating, calculating and obtaining the internal and external parameters of the camera and combining the binocular parallax principle to calculate. This embodiment uses Zhangyingyou scaling method, and it has the advantage that the requirement is not high to the equipment and the operation is got up comparatively easily, and the precision is higher, stability is better. For simplicity, we use the international chessboard as the calibration template, as shown in fig. 5, and the relative coordinates of each point on the chessboard can be obtained by measurement. Then, the calibration templates are shot from different angles by a double-lens synchronous camera to obtain a plurality of paired calibration template images. And processing the acquired image to detect the coordinates of the intersection points of the black square and the white square in the image. The coordinates of these intersection points are calculated, and the parameters of the camera can be obtained. Then the deflection coefficient is calculated by the minimum variance formula. And finally, solving a final iteration result according to the nonlinear programming to obtain a deflection coefficient and a distortion coefficient. During the calibration process, attention is paid to focusing of the camera, and the chessboard ratio is ensured to exceed 50% at least in the image, according to the embodiment, the most accurate calibration result is obtained when the template amount is 16 groups, and the calibration flow is shown in fig. 3.

According to the camera model established above, the two-dimensional image coordinate point vector in the computer image coordinate system is represented by m ═ u, v]^TIn this case, a three-dimensional coordinate point in the world coordinate system is represented by M ═ X, Y, Z]^T. Their homogeneous variable representations are respectively

According to the camera imaging model, the following can be obtained:

wherein [ R, T ] is an external reference matrix, consisting of the translation matrix and the rotation matrix mentioned above; k is an internal reference matrix expressed in the form of:

The fifth concrete implementation mode: the image processing means that in a computer, an image is a pixel sequence composed of pixel values, and the pixel values have different color expressions due to different color systems. The image processing means that a computer performs calculation processing on a pixel sequence. The image processing technology can be used for eliminating image noise, enhancing image quality, enhancing characteristic information, or segmenting and splicing images and the like; the image processing process in this embodiment includes the following three parts:

A. denoising an image:

in general, due to factors such as camera quality and quality of an imaging element, there is some information or useless information that interferes with original image information during image capturing, and the interfering information and the useless information are referred to as noise. Noise is generated during the process of digitizing or transmitting an image, and it is necessary to remove the noise from the image in order to prevent the original image information from being damaged or to remove unnecessary information. In digital image processing, the noise mainly involved is gaussian noise and salt and pepper noise, and the corresponding filtering modes are wiener filtering and median filtering:

(1) the wiener filtering is to adjust the optimal filter according to the local variance of the image, and compared with other filtering modes, the wiener filtering can best keep the edge part of the image and has obvious effect on the filtering effect of Gaussian noise and white noise. Because the algorithm is complex, the processing time for the image is also long.

(2) Median filtering, which is a nonlinear smoothing filter, is developed from mean filtering. Therefore, the filtering mode of the method is to keep the median part of the image, remove the overlarge pixel value and the undersize pixel value, and have obvious filtering effect on salt and pepper noise. The method has the characteristics of simple and efficient algorithm. The binocular camera generates a lot of salt and pepper noises due to the quality of the imaging element, and medium filtering is adopted to remove noise aiming at the salt and pepper noises in the embodiment. The median filtering algorithm is as follows:

firstly, a small window G is taken from an image, in the window G, the gray value of an image pixel point (x, y) is represented by f (x, y), and then a median filter of a filtering window can be defined as:

f(x,y)＝MED{f(x,y)}(x,y)∈G

the median filtering procedure and effect are shown in fig. 6.

B. Feature extraction:

in the image, some points or regions have some characteristics, which have special significance in image processing and analysis, and when image information is analyzed, some pixel points, continuous regions or object edges in the image can be often found to be distinguished from other parts in the image, and these pixel points, regions and edges are called image characteristics. The characteristic extraction refers to establishing a characteristic equation, performing matching operation on each pixel point, uniformly extracting the pixel points meeting the requirement of the characteristic equation, and determining the characteristic of the pixel points. The feature extraction method is used when the camera is calibrated and a disparity map is established. When the camera is calibrated, black and white intersection points of a chessboard in an image are identified, characteristic equations are utilized to extract the points, and the points are stored in different subsets according to different attributes of the points. Commonly used image features are: color features, texture features, shape features, spatial relationship features.

(1) Color characteristics: reflecting the different color characteristics of the scene in the image. Generally, when image processing is performed, color features are expressed as colors of pixel points. When the local features of the object are determined, feature extraction is often failed due to the fact that the same region of the image is approximate in color and small in area. Meanwhile, when the color of the image is complex and the amount of pixel data is large, false detection also occurs. A color histogram is generally used as an expression method of color features. The method has the advantages of being free from the influence of changes such as image movement, rotation, scaling and the like, and has the defects that space color position information cannot be expressed, and the specific characteristics of specific objects in the image cannot be analyzed by using the color characteristics alone.

(2) Texture characteristics: textural features are a feature used to describe the surface properties of a scene. The surface texture of the object in the objective world can be expressed by the characteristics of height, flatness, lines and the like of the surface of the object. The texture features have similar properties to the color features and are not easily affected by motion. However, since the texture feature has a large influence on illumination, distortion, and the like, it is not generally considered as a study target in the detection of an obstacle.

(3) Shape characteristics: the shape features are important features used in binocular vision disorder recognition, and can help to restore a stereoscopic model of the barrier. The target area in the image can be effectively searched for the obstacle based on the shape characteristics. Shape features are generally represented by edges and regions. The edge reflects the outline information of the object in the image, which plays an important role in the detection of the object type information, and the area reflects the size of the object in the image. The combination of the two can reflect the specific appearance of the object in the image. And by combining binocular stereo vision, the shape characteristics can restore the stereo model of the object, so that the machine vision reaches the degree similar to human eyes. The method can reflect high-level information of the image, but has slow reaction time due to large calculation amount, and has larger calculation amount for complex environment, occupies memory and increases the difficulty of feature extraction. And the characteristic quantity changes along with the rotation of the object. The current algorithms have not been applied perfectly.

(4) Spatial relationship characteristics: in binocular vision, the spatial relationship features can express the geographical positions of a plurality of obstacles and the relative position relationship between the obstacles, and obstacle positioning and distance information is provided for a subsequent obstacle avoidance strategy. The absolute spatial position refers to that in a spatial coordinate system, the precise coordinates of each target object are determined, and the spatial relationship between the coordinates is calculated. Relative spatial position refers to the relative orientation between the object and the object. The spatial relationship feature has a description distinguishing function for enhancing the image content, but when the image is rotated, enlarged, reduced and the like, the result of the spatial relationship is changed, and the stability is lacked. Therefore, in practical applications, when the spatial relationship features are used for expressing the image features, the spatial relationship features also need to be used together with other features. In the binocular visual impairment detection system, the spatial relationship features are easily derived because binocular vision can provide depth information. In monocular systems, however, this feature becomes very difficult to extract. The spatial relationship features also have the disadvantage of instability, and the purpose requirement of feature extraction can be met by cooperating with other feature relationships.

C. Edge detection

The image edge is also the most important information in the image, the edge has the characteristic of not changing with the change of an observation visual angle, the shape characteristic of an object can be calculated by utilizing the edge, the image is segmented, and an interested region is extracted, so that the image edge can be used as a precondition for image segmentation and is also a necessary condition in machine learning. In a commonly used recognition mode, an edge is represented in the image in the form of a line, and when the gray values of two sides of the line are different greatly, or there is a gray transition process, in this case, the edge is generally detected. In binocular stereo matching, edge information can be used as a good feature point. The process of edge detection comprises filtering, enhancing, detecting and positioning.

Common image edges are:

(1) step-shaped edge, which shows that the gray value between two adjacent pixels has jump change and is suddenly increased or decreased;

(2) roof-shaped edge, which is represented by the phenomenon that the gray value of pixels in one area gradually rises and gradually falls;

(3) linear edge-it is shown that in a region, the gray value of a pixel jumps from one level first and then returns to the original level.

In this case, the image needs to be segmented by edge features, and then the interested part needs to be extracted. This process is called image segmentation. There are three main methods for image segmentation according to the region of interest: the image segmentation technique is adopted in the present embodiment because the edge segmentation method, the shape segmentation method, and the threshold segmentation method are used.

The sixth specific implementation mode: in the fourth step of the specific implementation, the adopted algorithm is substantially an Opencv BM algorithm, and mainly includes:

(1) preprocessing and filtering, namely performing parallel calculation on the two calibrated and corrected images, graying the images to normalize the brightness of the images and strengthen the texture of the images so as to prevent comparison errors caused by different brightness;

(2) and (3) stereo matching, namely performing multi-path parallel computation along a horizontal epipolar line of the image, and performing matching search by using a SAD algorithm to generate a disparity map.

(3) Filtering again, namely removing bad matching points in a filtering process after the stereo matching is successful, preventing virtual matching and further processing the calibrated stereo image;

the process can be refined as follows:

s1 preprocessing filtering:

firstly, the two calibrated and corrected images are subjected to parallel calculation, and then the images are grayed, so that the brightness of the images is normalized, and the texture of the images is enhanced. The algorithm is as follows:

in the horizontal axis direction, Sobel operation is carried out to strengthen the image texture in the horizontal direction, meanwhile, normalization calculation is carried out on the brightness and the contrast of the two images, and the image brightness difference caused by the brightness error caused by different parameter settings and shooting angles of the two cameras is eliminated (an approximate value of the image brightness gradient is calculated and then normalized). The moving window traverses the entire image. And finally, obtaining two filtered images, and then using the two filtered images for next matching.

S2 stereo matching:

that is, a parallax map is generated by performing multi-path parallel calculation along the horizontal epipolar line of an image, performing matching search by the SAD algorithm, and calculating the result.

The specific algorithm of the sum of absolute differences algorithm (SAD algorithm for short) is as follows:

let a (x, y) be a reference image of size M × N, B (x, y) be an alignment image of M × N, and a > B, as shown in fig. 7, where left is (a) and right is (B), respectively, the purpose of matching is: finding a region in (a) that matches (b) (shown in block).

Taking an M-N sub-graph in the size with (i, j) as the upper left corner in the reference graph A, and matching the sub-graph with the alignment graph B; and comparing the reference image from the upper left corner to the lower right corner frame by frame, and if one sub image is similar to the alignment image B and the similarity is greater than the matching value, the matching is successful in all the sub images.

The matching value measure formula of the SAD algorithm is as follows. The smaller the absolute error sum D (i, j) is, the more similar the subgraph and the alignment graph is, so that the subgraph position which can be matched can be obtained by traversing the whole reference graph only by finding the smallest D (i, j).

For each left image feature, after finding the best match correction for the right image, then the right picture matching position is collinear with the left picture, and if there are enough detected texture characteristics, in the right camera view, the matching position and the distance of the disparity matching position relative to the left camera can be found.

S3 refiltering:

the method is a filtering process after stereo matching, when the sum of absolute errors is smaller than a set threshold value, an effective matching point is calculated, and when the value of the sum of absolute errors is too large, the point is considered to be failed in matching, and virtual matching is prevented. When the partial regions cannot be matched due to the distance, the output matching invalid value represents that the distance is too far.

S4 obtaining a depth map from the disparity:

by obtaining the parallax, the distance of the objective world can be obtained according to the relation between the parallax and the depth. In order to obtain the distance of the objective world, a relation between parallax and objective depth can be obtained according to a geometric relation (similar triangle) of parallel binocular vision, as follows:

depth＝(f*baseline)/disp

in the above formula, depth represents the objective world depth; f represents the normalized focal length; baseline is the distance between the optical centers of the two cameras, called the baseline distance; disp is the image disparity value. The depth values are calculated as known from the equation below.

The seventh embodiment: in addition to the implementation processes of the first to sixth embodiments, the embodiment can also be implemented by setting an entity as an intelligent following travel suitcase, and by the design of the intelligent obstacle avoidance system and the construction of the existing hardware equipment, the implementation process is as follows:

and (3) selecting a TLD tracking algorithm to realize the detection and tracking of the target, and using python3 programming language to realize the TLD algorithm on an OpenCV software platform for verification analysis.

The software environment is Ubuntu18.04+ opencv 4.0. The server side is located in the local area network and is configured as an Ubuntu 18.0464-bit operating system, intel core i5 and an 8G memory. EAIDK-310 is used as a client, a computer network card is used as a data transmission medium, a transmission port is used as 8000, and the pictures are transmitted to a server in a binary system mode in real time.

The intelligent following travel suitcase has a tracking function and an alarming function, the alarming device is realized through an infrared sensor arranged at the bottom of the travel suitcase, and when the ground clearance of the travel suitcase reaches 20c, an alarming signal is sent out.

The automatic obstacle avoidance function uses the ultrasonic sensor, can perceive the change condition of the nearby environment in real time, and intelligently avoids surrounding obstacles. Even after falling behind in the process of avoiding an obstacle, COWAROBOT R1 accelerates back to your side.

Visual tracking system hardware:

the visual tracking system platform consists of an NVIDIA TK1 embedded platform, a power supply module, an intel 5260 wireless transmission module and a camera module. Fig. 1 shows the overall structure of the intelligent trolley, and because the camera and the complex TLD algorithm are used for real-time acquisition and long-time identification and tracking of images in the design, the calculated amount, the storage amount and the speed have some pressure. For this purpose, NVIDIA TK1 super embedded platform is used. The characteristics of strong CPU computing capacity and GPU parallelization computing are combined, and hybrid programming is adopted to improve the computing capacity of the tracking system and meet the requirement of ensuring the real-time performance of the system. The obtained data is sent to the local area network through the wireless transmission module, so that the monitoring effect is achieved.

Visual tracking system software algorithm architecture:

the software Pigtail of the visual tracking system of the design comprises a tracking algorithm platform and a C/S server platform based on a P2P model. The design system adopts an ubuntu 14.04 system customized by NVIDIA, and aims at the ROS autonomous system to simplify the output of data and logs. The TLD visual tracking algorithm is mainly written based on linux c + + and an Nvidia cuda platform, an opencv library is called to simplify programming, and GPU core parallelization programming is called to improve the computing speed. For a server platform, under a visual system, data under a specified path are read in real time by adopting a genius process, then der coding is carried out, then the data are transmitted to an upper computer, and the data are decoded and displayed after being received by the upper computer.

1) The USB camera: images are acquired from the environment in real time and fed into the tracking system via the opencv programming interface.

2) TLD tracking algorithm: and during the first frame, reading off-line trained parameters and sending the parameters into the system for initialization. And after the second frame and the second frame, tracking results with high robustness are obtained through mutual influence and mutual restraint of tracking, detecting and learning modules.

3) The GPU optimizes the variance classifier of the TLD tracking algorithm: due to the self-limitation of the algorithm, before each frame enters detection, 25 scales of scanning is carried out, nearly 30W of patches are generated, then each patch is required to enter a cascaded detector, and for a first variance classifier, a large amount of application space and frequent calculation are required, so that the characteristics of a dual-core GPU of NVIDIA TK1 are fully utilized to carry out parallel calculation to replace a CPU which has strong calculation capacity and slow data size. The results show that the speed is obviously improved.

4) Through long-time tracking, the data of the template library is abnormally large, and the embedded memory is limited, so that the data size is limited within 30 by adopting a rolling array method. When 30 sample libraries are satisfied, 10 templates are used as a long-term tracking basis, and the newly tracked template data updates the remaining 20 template data from back to front. When the target runs from a complex environment for a long time and suddenly returns to the original environment, the data can be taken out from 10 templates, the target updating of a tagging module is carried out, and the data can be continuously and effectively tracked.

5) Modeling the direction and the rotating speed of the motor: and calculating the moving direction of the motor and the image coordinate zooming-in and zooming-out direction according to the positions of the front frame and the rear frame of the TLD algorithm to calculate the motor speed.

6) A serial port communication module: and according to the tracking result and the modeling result of the TLD algorithm, the tracking result and the modeling result are sent to the bottom stm32 through the serial port to control the motor module.

7) A server client module: and writing a server and a client program based on the P2P model by using a socket network programming technology and through a TCP/IP protocol. And meanwhile, the client server is made into a daemon process, and data is read and written into the designated path in real time.

8) And (3) Der coding: the der coding can ensure the complete transmission of data, break the bottleneck that different platforms are incompatible or the read-write formats are different, and improve the accuracy of transmission. And the client reads the data and then performs der coding transmission, and then the server receives the data, decodes the data in a der mode and displays and outputs the data in real time by using opencv.

The robot trolley system consists of a vision processing system, a bottom trolley control system and a server client model based on a P2P model. The visual processing system provides a tracking system with higher robustness by using a complex and reliable TLD tracking algorithm to replace a simple tracking algorithm, and simultaneously, a CUDA platform of a GPU is adopted to optimize the algorithm, so that the real-time performance of the tracking system is improved. The bottom layer trolley control platform provides moving performance for the trolley, and the Mecanum omni-directional wheel is adopted in the design, so that the trolley is controlled simply and reliably, and system modeling of an upper layer visual tracking part is facilitated. The tracking state of the object can be detected in real time by reading and outputting the tracking state to an upper computer in real time based on the P2P model server client.

In the project, a robot trolley is used for simulating an intelligent following suitcase, and a trolley control system platform consists of a trolley body mechanical frame, a main controller circuit board and a direct current motor driving circuit. The location of the circuitry for the various portions of the platform of the cart control system within the overall cart structure is shown in fig. 4-1. The main controller uses a development board circuit of an STM32F429 singlechip of the Italian Semiconductor (ST) company. The trolley system control platform mainly comprises an input/output (IO) interface circuit for a development board circuit, a power supply circuit, a speed measurement encoder interface circuit, an infrared sensor module interface circuit, interface circuits of four direct current motor driving circuits for driving Mecanum wheels and the like. The direct current motor driving circuit in the design adopts a pulse width modulation driving mode, and particularly uses an H-bridge driving circuit to drive a direct current motor of the four-wheel trolley.

The software design of the trolley control is shown in the flow chart of fig. 2. After the trolley is powered on, the configuration of the external serial port, the timer and the infrared module is initialized. After initialization, the STM32 detects whether an infrared interrupt signal is input. If the signal is available, the trolley moves in a direction with an obstacle, and executes a corresponding program to be far away from the obstacle. When there is no obstacle, STM32 waits for serial port interrupt, if there is interrupt, it indicates that there is command in upper layer. Analyzing the command and judging the state of the trolley to be moved, then setting the moving state of the trolley, simultaneously starting a timer and a PID controller, and interrupting the timer every 10ms to calculate the numerical value output by the PID as the pulse width output of the PWM so as to regulate the speed to reach the target speed. If no serial port command exists, the trolley waits all the time.

The software Pigtail of the visual tracking system of the design comprises a tracking algorithm platform and a C/S server platform based on a P2P model. The design system adopts an ubuntu 14.04 system customized by NVIDIA, and an ROS autonomous system simplifies the output of data and logs. The TLD visual tracking algorithm is mainly written based on linux c + + and an Nvidia cuda platform, an opencv library is called to simplify programming, and GPU core parallelization programming is called to improve the computing speed. For the server platform, under a visual system, data under a specified path are read in real time by adopting a genius process, and then der coding is carried out and the data are transmitted to an upper computer. And the upper computer decodes and displays the received data. A software technology roadmap is shown in figure 9,

the TLD algorithm mainly comprises 3 parts of tracking, detecting and learning, and a flow chart of a running flow chart of the TLD algorithm is shown in figure 1. The target object to be tracked is manually defined in the first frame image to carry out initialization operation on the tracking module and the detection module, 3 modules for tracking, detection and learning are mutually supplemented from the second frame image, and errors are continuously evaluated, so that the effect that the target can still be accurately tracked under the condition of deformation or shielding is achieved.

The tracking module employs a median flow tracking method (mediaflow), which is essentially a pyramid LK optical flow method based on a forward-backward error estimate (forward-backward error). The basic principle is as follows: selecting a plurality of pixel points from a target frame of a previous frame as feature points, searching the corresponding positions of the feature points of the previous frame in a current frame by using a pyramid LK optical flow method, then sequencing the displacement transformation of the feature points between two adjacent frames to obtain a median value of the displacement change, taking the feature points smaller than 50% of the median value as the feature points of the next frame, and proceeding in sequence, thereby realizing the purpose of dynamically updating the feature points. The tracker is used for tracking the motion between continuous frames, the tracker is effective only when an object is visible all the time, the tracker estimates the position of the current frame according to the known position of the object in the previous frame, and meanwhile, a target motion track is generated in the estimation process and is used as a positive sample of the learning module.

The detection module is used for cascading the variance filter, the set classifier and the nearest neighbor classifier. The variance filter is used for calculating the variance of the gray value of the pixels of the image slice and marking the sample with the variance less than half of the variance of the original image slice as negative; the set classifier compares the brightness values of two points arbitrarily selected from the image slice by using a random forest form, if the brightness of the point A is greater than that of the point B, the characteristic value is recorded as 1, otherwise, the characteristic value is recorded as 0, a new characteristic value is obtained when a pair of new positions is selected, and each node of the random forest is compared with a pair of pixel points. The nearest neighbor classifier calculates the relative similarity of the new samples by setting a threshold, and if the relative similarity is greater than the set threshold, the new samples are regarded as positive samples. The positions of all appearances similar to the target object are found by comprehensively scanning the image of each frame, and a positive sample and a negative sample are generated from the result generated by detection and transmitted to the learning module. The learning module adopts a semi-supervised machine learning algorithm P-N learning (P-NLearning), and detects a positive sample of missed detection (the positive sample is wrongly divided into the negative sample) by using a P expert (P-expert) and corrects a positive sample of false detection (the negative sample is wrongly divided into the positive sample) by using an N expert (N-expert) according to positive and negative samples generated by the tracking module and the detection module. Since the object in each frame of image is only possible to appear at most at one position, and the motion of the object between adjacent frames is continuous, the positions of the continuous frames can form a smoother track. The role of the P expert is to find the temporal structure of the data, which uses the results of the tracker to predict the location of the object at +1 frames. If the location (bounding box) is classified as negative P expert by the detector, the location is changed to positive. That is, the P expert ensures that the positions of the objects appearing on the continuous frames can form a continuous track; the role of the N expert is to find the spatial structure of the data by comparing all positive samples generated by the detector and by the P expert, selecting a most reliable location, ensuring that the object is present at most in one location, and using this location as the result of the TLD algorithm. And this position is also used to reinitiate the body tracking test.

In the following figures, the verification of several trackers such as KFC, csrt, mediaflow, MOSSE, MIL, boost, TLD, etc. is shown.

In an experimental environment, a book is selected as a tracking target, target sample parameters are obtained through an offline training target and stored in a document, and the test is used for performing offline training and online tracking on the target through tld tracking algorithm and transmitting the target to an upper computer for checking in real time. The server is located in the local area network, and is configured as an ubuntu 64-bit operating system, intel core i7-4710HQ CPU @2.5GHz, 8G memory, and has an IP address of 192.168.92.134, and the virtual machine server network environment is shown in fig. 11.

The Nvidia tegra k1 originally serves as a client, in the embodiment, a wired network module is adopted, and a memory picture is read in real time through a Tcp protocol of a Tcp/ip protocol family and a server model for realizing p2p, and data pictures are transmitted to target ip addresses 192.168.92.134 and 8000 in real time, and the picture data are transmitted in a binary manner. The client network environment is shown in fig. 12.

The server receives data transmitted by the client, stores and displays the data in real time, and in the embodiment, an offline training target, a tracking target frame of tracking, namely a target with a green frame foreground, is displayed respectively, a green point represents a target tracked by a tracking algorithm, and a white point represents target data which is not tracked. And (4) integrating the obtained target objects through an integration module (integration of a tapping module and a detecting module), and finally obtaining a target result through clustering. With the tracking system we get matched template data, which is the result of the positive sample data from the integration of our tracking module and detector module, we convert the image to a grayscale map and get a 15X15 image slice through affine transformation and save it.

The above embodiments are merely illustrative of the present patent and do not limit the scope of the patent, and those skilled in the art can make modifications to the parts thereof without departing from the spirit and scope of the patent.

Claims

1. Barrier system is kept away to intelligence based on machine vision, its characterized in that: the system is divided into two parts of machine hardware driving and software data processing;

the machine hardware drive comprises a singlechip, a drive plate and a camera;

2. The machine-vision-based intelligent obstacle avoidance system of claim 1, wherein: the single chip microcomputer is a raspberry pi 3B + single chip microcomputer and is connected to the driving plate through a circuit pin;

the driving board is an L298N driving integrated circuit board;

3. An intelligent obstacle avoidance method based on machine vision is realized on the basis of the system of any one of claims 1-2, and is characterized in that: the method comprises the following specific steps:

4. The machine vision-based intelligent obstacle avoidance method according to claim 3, wherein: in the process of establishing the coordinate system by the camera calibration module in the first step, the following steps are adopted:

5. The machine vision-based intelligent obstacle avoidance method according to claim 4, wherein: the specific matrix transformation mode of the coordinate translation transformation and the rotation transformation is as follows:

mode a, translation matrix:

mode b, rotation matrix:

6. the machine vision-based intelligent obstacle avoidance method according to claim 5, wherein: the camera calibration module in the second step establishes a camera imaging model for the camera, and the specific calibration steps are as follows:

Obtaining the following according to the camera imaging model:

7. The machine vision-based intelligent obstacle avoidance method according to claim 6, wherein: the image processing module in step three comprises image noise removal, image feature extraction and image edge detection, and the specific steps are as follows:

f(x,y)＝MED{f(x,y)}(x,y)∈G

step two, feature extraction is to carry out matching operation on each pixel point, the pixel points meeting the requirement of a feature equation are uniformly extracted, the features of the pixel points are determined, the camera is calibrated, black and white cross points of a chessboard in an image are identified, the points are extracted by using the feature equation and are stored in different subsets according to different attributes of the points;

8. The machine vision-based intelligent obstacle avoidance method according to claim 7, wherein: the machine vision implementation module described in step four includes an Opencv BM algorithm, and the steps of processing the calibrated stereo image are as follows:

depth＝(f*baseline)/disp