CN113108794B - Position identification method, device, equipment and computer readable storage medium - Google Patents

Position identification method, device, equipment and computer readable storage medium

Info

Publication number
CN113108794B
CN113108794B CN202110342856.7A
Authority
CN
China
Prior art keywords
target
prediction
distance
feasible region
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110342856.7A
Other languages
Chinese (zh)
Other versions
CN113108794A (en)
Inventor
张树
俞益洲
李一鸣
乔昕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Shenrui Bolian Technology Co Ltd
Shenzhen Deepwise Bolian Technology Co Ltd
Original Assignee
Beijing Shenrui Bolian Technology Co Ltd
Shenzhen Deepwise Bolian Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Shenrui Bolian Technology Co Ltd, Shenzhen Deepwise Bolian Technology Co Ltd filed Critical Beijing Shenrui Bolian Technology Co Ltd
Priority to CN202110342856.7A priority Critical patent/CN113108794B/en
Publication of CN113108794A publication Critical patent/CN113108794A/en
Application granted granted Critical
Publication of CN113108794B publication Critical patent/CN113108794B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/20 Instruments for performing navigational calculations

Landscapes

  • Engineering & Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Automation & Control Theory (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a position identification method, which comprises the following steps: acquiring a target image to be identified by using a position identification device; predicting the target image to predict a target feasible region in the target image and a first distance transformation map corresponding to the target image; and identifying the relative position between the target person wearing the position identification device and the target feasible region according to the target feasible region and the first distance transformation map. Because the first distance transformation map represents the predicted distance between each target pixel point in the target image and the feasible region boundary, the positional relationship between a visually impaired person and the target feasible region can be accurately identified based on the first distance transformation map and the target feasible region, so that the safe walking of visually impaired people can be effectively guaranteed. The application also provides a position identification apparatus, a device and a computer readable storage medium.

Description

Position identification method, device, equipment and computer readable storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, and a computer-readable storage medium for position identification.
Background
People with visual impairment cannot tell which areas are safe to walk in because they lack effective perception of the external environment, which limits their quality of life and range of activity. For a long time, manufacturers at home and abroad have provided various types of walking aids and devices for visually impaired people, attempting to enhance their environmental perception and improve their quality of life. In recent years, with the miniaturization of wearable electronic devices and the rapid development of image acquisition, processing and analysis algorithms, walking-aid devices for visually impaired people that compute in real time have been emerging continuously, providing better products and services.
Taking an existing product as an example, the system is carried by an integrated module that can be worn on the chest and automatically processes information collected by various sensors to obtain information such as objects, road signs and weather in the surrounding environment, thereby providing a feasible region range and helping to improve the independent mobility of visually impaired people. However, existing products cannot accurately identify the positional relationship between a visually impaired person and the feasible region, so the safe walking of visually impaired people cannot be effectively guaranteed.
Disclosure of Invention
The application provides a position identification method, device, equipment and computer readable storage medium, which can accurately identify the positional relationship between a visually impaired person and a feasible region.
In a first aspect, the present application provides a location identification method, including:
acquiring a target image to be identified by using a position identification device;
predicting the target image to predict a target feasible region in the target image and a first distance transformation map corresponding to the target image, wherein the first distance transformation map represents the prediction distance between each target pixel point in the target image and a feasible region boundary;
and identifying the relative position between the target person wearing the position identification device and the target feasible region according to the target feasible region and the first distance transformation diagram.
Optionally, the identifying the relative position between the target person wearing the position identifying device and the target feasible region according to the target feasible region and the first distance transformation map includes:
calculating a second distance transformation graph corresponding to the target image according to the target feasible region;
calculating a third distance transformation graph corresponding to the target image according to the first distance transformation graph and the second distance transformation graph;
wherein the second and third distance transformation maps characterize a calculated distance between each target pixel point in the target image and a feasible region boundary;
and determining the position of the target person wearing the position recognition device in the third distance conversion map according to the third distance conversion map.
Optionally, the method further includes:
and carrying out alarm prompt on the target personnel based on the position of the target personnel in the third distance transformation diagram.
Optionally, the predicting the target image includes:
and predicting the target image by using a pre-trained prediction model.
Optionally, the prediction model is trained in the following manner:
extracting a sample image and a labeling result of a feasible region in the sample image from a training data set in each round of training of the prediction model to serve as training data of the round;
predicting a sample feasible region in the sample image and a sample distance transformation map corresponding to the sample image based on the training data of the current round, wherein the sample distance transformation map represents the prediction distance between each sample pixel point in the sample image and a feasible region boundary;
and adjusting the current parameters of the prediction model according to the sample feasible region and the sample distance transformation graph.
Optionally, the predicting the feasible sample region in the sample image and the sample distance transformation map corresponding to the sample image includes:
predicting a sample feasible region in the sample image by using a dedicated prediction parameter for predicting the feasible region; predicting a sample distance transformation map corresponding to the sample image by using the exclusive prediction parameter for predicting the distance transformation map;
alternatively,
predicting a sample distance transform map in the sample image using the shared prediction parameters and the proprietary prediction parameters used to predict the distance transform map; and predicting the feasible sample area corresponding to the sample image by using the shared prediction parameter, the exclusive prediction parameter for predicting the feasible area and the prediction result of the sample distance transformation map.
Optionally, the loss function used by the prediction model in the training phase includes: a first loss function for distance transformation prediction; wherein the first loss function is obtained according to a Euclidean distance loss function and an adversarial loss function.
Optionally, the loss function used by the prediction model in the training phase includes: a second loss function for feasible region prediction; the second loss function is obtained according to a standard cross entropy loss function and a soft cross entropy loss function, wherein the soft cross entropy loss function is a cross entropy loss function constructed with a learning target based on the distance transformation map.
In a second aspect, the present application provides a position recognition apparatus, comprising:
an image acquisition unit for acquiring a target image to be recognized by using the position recognition device;
the image prediction unit is used for predicting the target image so as to predict a target feasible region in the target image and a first distance transformation map corresponding to the target image, wherein the first distance transformation map represents the prediction distance between each target pixel point in the target image and a feasible region boundary;
and the position identification unit is used for identifying the relative position between the target person wearing the position identification device and the target feasible region according to the target feasible region and the first distance transformation diagram.
Optionally, the position identifying unit is specifically configured to:
calculating a second distance transformation graph corresponding to the target image according to the target feasible region;
calculating a third distance transformation graph corresponding to the target image according to the first distance transformation graph and the second distance transformation graph;
wherein the second and third distance transformation maps characterize a calculated distance between each target pixel point in the target image and a feasible region boundary;
and determining the position of the target person wearing the position recognition device in the third distance conversion map according to the third distance conversion map.
Optionally, the apparatus further comprises:
and the alarm prompting unit is used for carrying out alarm prompting on the target personnel based on the position of the target personnel in the third distance transformation diagram.
Optionally, the image prediction unit is specifically configured to:
and predicting the target image by using a pre-trained prediction model.
Optionally, the apparatus further comprises:
the data extraction unit is used for extracting a sample image and a labeling result of a feasible region in the sample image from a training data set in each round of training of the prediction model to serve as training data of the round;
a sample prediction unit, configured to predict, based on the current round of training data, a sample feasible region in the sample image and a sample distance transformation map corresponding to the sample image, where the sample distance transformation map represents a prediction distance between each sample pixel point in the sample image and a feasible region boundary;
and the parameter adjusting unit is used for adjusting the current parameters of the prediction model according to the sample feasible region and the sample distance transformation graph.
Optionally, the sample prediction unit is specifically configured to:
predicting a sample feasible region in the sample image by using a dedicated prediction parameter for predicting the feasible region; predicting a sample distance transformation map corresponding to the sample image by using the exclusive prediction parameter for predicting the distance transformation map;
alternatively,
predicting a sample distance transform map in the sample image using the shared prediction parameters and the proprietary prediction parameters used to predict the distance transform map; and predicting the feasible sample area corresponding to the sample image by using the shared prediction parameter, the exclusive prediction parameter for predicting the feasible area and the prediction result of the sample distance transformation map.
Optionally, the loss function used by the prediction model in the training phase includes: a first loss function for distance transformation prediction; wherein the first loss function is obtained according to a Euclidean distance loss function and an adversarial loss function.
Optionally, the loss function used by the prediction model in the training phase includes: a second loss function for feasible region prediction; the second loss function is obtained according to a standard cross entropy loss function and a soft cross entropy loss function, wherein the soft cross entropy loss function is a cross entropy loss function constructed with a learning target based on the distance transformation map.
In a third aspect, the present application provides an electronic device, comprising: a processor, a memory;
the memory for storing a computer program;
the processor is used for executing the position identification method by calling the computer program.
In a fourth aspect, the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the above-described position identification method.
In the technical solution provided by the application, a position identification device is used to acquire a target image to be identified; the target image is predicted to obtain a target feasible region in the target image and a first distance transformation map corresponding to the target image; and the relative position between the target person wearing the position identification device and the target feasible region is identified according to the target feasible region and the first distance transformation map. In the application, the first distance transformation map represents the predicted distance between each target pixel point in the target image and the feasible region boundary, so the positional relationship between a visually impaired person and the target feasible region can be accurately identified based on the first distance transformation map and the target feasible region, and the safe walking of visually impaired people can therefore be effectively guaranteed.
Drawings
Fig. 1 is a schematic flow chart of a position identification method according to the present application;
FIG. 2 is a block diagram of a training phase flow illustrated in the present application;
FIG. 3 is a block flow diagram of the inference phase shown in the present application;
FIG. 4 is a schematic diagram of the components of a position identification device shown in the present application;
fig. 5 is a schematic structural diagram of an electronic device shown in the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the application, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. The word "if" as used herein may be interpreted as "at … …" or "when … …" or "in response to a determination", depending on the context.
The embodiment of the application provides a position identification method. The method takes an image or video collected by a vision sensor as input, obtains a more accurate feasible region and distance transformation map for the image or video, and can provide environment perception assistance for visually impaired people based on the obtained feasible region and distance transformation map, thereby improving their quality of life.
Referring to fig. 1, a schematic flow chart of a position identification method provided in an embodiment of the present application is shown, where the method includes the following steps S101 to S103:
s101: and acquiring a target image to be identified by using the position identification device.
In the embodiment of the present application, the type of the position recognition apparatus is not limited; for example, the position recognition apparatus may be blind-guiding glasses.
In practical applications, when a target person (such as a visually impaired person) wearing the position recognition apparatus uses it, the vision sensor with which the apparatus is equipped can capture images or video of the scene around the target person; herein, any captured image or video frame is defined as a target image.
S102: and predicting the target image to predict a target feasible region in the target image and a first distance transformation map corresponding to the target image, wherein the first distance transformation map represents the prediction distance between each target pixel point in the target image and the boundary of the feasible region.
In the embodiment of the present application, image prediction is performed from two aspects. On the one hand, a target feasible region in the target image is predicted; the feasible region range is the range within which the target person can travel safely. On the other hand, a distance transformation map, defined as the first distance transformation map, is predicted; the first distance transformation map carries the predicted distances between a plurality of target pixel points in the target image and the feasible region boundary, where the plurality of target pixel points may include all pixel points in the target image or some pixel points selected from the target image at preset intervals.
In one implementation manner of the embodiment of the present application, the "predicting a target image" in S102 may include: predicting the target image by using a pre-trained prediction model. In this implementation, the target image collected by the vision sensor of the position recognition device is used as input and predicted by the pre-trained prediction model; that is, the prediction model predicts the target feasible region in the target image and the first distance transformation map corresponding to the target image. The prediction model may be a semantic segmentation model.
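For illustration only, the following is a minimal inference sketch of this step, assuming a PyTorch model with two output heads that returns a feasible-region logit map and a distance transformation map in the range [0, 1]; the function and variable names are illustrative and not taken from the patent.

```python
import torch

# Minimal inference sketch (assumed interface): the pre-trained multi-task
# prediction model is assumed to return (feasible-region logits, distance map)
# for an input image tensor; names and shapes are illustrative assumptions.
def predict_target_image(prediction_model, target_image):
    """target_image: float tensor of shape (3, H, W) with values in [0, 1]."""
    prediction_model.eval()
    with torch.no_grad():
        seg_logits, dist_map = prediction_model(target_image.unsqueeze(0))
    # Target feasible region: per-pixel binary mask (1 = feasible).
    target_feasible_region = (torch.sigmoid(seg_logits) > 0.5).squeeze(0)
    # First distance transformation map: predicted distance of every target
    # pixel point to the feasible region boundary.
    first_distance_map = dist_map.squeeze(0)
    return target_feasible_region, first_distance_map
```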
In fact, the prediction model is a multi-task deep learning model that predicts the feasible region range and the distance transformation map simultaneously. Before the initial prediction model is trained, data for model training needs to be prepared: a plurality of sample images X and a feasible region labeling result Y for each sample image X. Y may be a binary image in which regions with pixel value 0 represent infeasible regions and regions with pixel value 1 represent feasible regions. The training data need to be acquired and labeled in advance.
The structure of the prediction model can be based on common deep network models for dense target prediction, such as U-Net, FCN, the DeepLab series, FPN, PSPNet, HRNet, and so on. In the model training phase, the input of the model is the acquired sample image X; in the model using phase (inference phase), the input of the model is the acquired target image. The size of the prediction layer of the model can be equal to the original size of the image or slightly scaled.
In the embodiment of the application, in the model training stage, the distance transformation map is introduced as an auxiliary supervision signal, and a prediction model capable of predicting the feasible region and the distance transformation map simultaneously is constructed. In general, the training phase requires multiple rounds of training of the prediction model until the training end condition is satisfied. Each round of training is described below with reference to the flow chart of the training phase shown in fig. 2 and may include the following steps A1 to A3:
step A1: in each round of training of the prediction model, a sample image and the labeling result of the feasible region in the sample image are extracted from the training data set and used as the training data of the round.
In the embodiment of the present application, a sample image X and a labeling result Y of a feasible region in the sample image may be extracted from a training data set prepared in advance, and the sample image X and the labeling result Y of the feasible region thereof are used as training data of the current round.
Step A2: and predicting a sample feasible region in the sample image and a sample distance transformation map corresponding to the sample image based on the training data of the current round, wherein the sample distance transformation map represents the prediction distance between each sample pixel point in the sample image and the feasible region boundary.
The sample pixel points in the sample image may include all the pixel points in the sample image, or may include some pixel points selected from the sample image at preset intervals.
In the embodiment of the present application, prediction is performed by the prediction heads of the prediction model. The prediction model is a multi-task learning model with two prediction heads, which are respectively used for segmentation prediction of the feasible region and regression prediction of the distance transformation map.
In a first implementation manner of the embodiment of the present application, the "predicting a sample feasible region in the sample image and a sample distance transformation map corresponding to the sample image" in step A2 may include: predicting a sample feasible region in the sample image by using an exclusive prediction parameter for predicting the feasible region; and predicting the sample distance transformation map corresponding to the sample image by using the exclusive prediction parameter for predicting the distance transformation map.
In this implementation, the prediction model includes two prediction heads that are connected in a relatively independent manner. One prediction head contains exclusive prediction parameters for predicting the feasible region, with which the sample feasible region in the sample image can be predicted; the other prediction head contains exclusive prediction parameters for predicting the distance transformation map, with which the sample distance transformation map corresponding to the sample image can be predicted. In this relatively independent connection, the two prediction tasks share a feature layer, but after the feature layer, two separate prediction heads are connected to predict the different tasks.
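As an illustrative sketch only (module names and channel sizes are assumptions, not specified by the patent), such a relatively independent connection could be organized as follows:

```python
import torch.nn as nn

# Sketch of the "relatively independent" connection: both tasks share one
# feature layer, then branch into two separate prediction heads that each
# hold their own exclusive parameters. Channel sizes are assumptions.
class IndependentHeads(nn.Module):
    def __init__(self, backbone, feat_channels=256):
        super().__init__()
        self.backbone = backbone                      # shared feature layer
        self.seg_head = nn.Sequential(                # exclusive: feasible region
            nn.Conv2d(feat_channels, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, 1))
        self.dist_head = nn.Sequential(               # exclusive: distance map
            nn.Conv2d(feat_channels, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, 1))

    def forward(self, x):
        feats = self.backbone(x)
        return self.seg_head(feats), self.dist_head(feats)
```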
In a second implementation manner of the embodiment of the present application, the "predicting a sample feasible region in the sample image and a sample distance transformation map corresponding to the sample image" in step A2 may include: predicting a sample distance transformation map in the sample image using the shared prediction parameters and the exclusive prediction parameters for predicting the distance transformation map; and predicting the sample feasible region corresponding to the sample image using the shared prediction parameters, the exclusive prediction parameters for predicting the feasible region, and the prediction result of the sample distance transformation map.
In this implementation, the prediction model includes two prediction heads that are connected in a cascade. Different from the relatively independent connection, in the cascaded connection the two prediction tasks share not only the feature layer but also most of the features in the prediction head (i.e., the shared prediction parameters), and only the final prediction layers use different parameters. That is, the sample distance transformation map in the sample image is predicted using the shared prediction parameters and the exclusive prediction parameters for predicting the distance transformation map, and the sample feasible region corresponding to the sample image is predicted using the shared prediction parameters and the exclusive prediction parameters for predicting the feasible region. In addition, besides using the shared features, the prediction head used for feasible region segmentation concatenates the prediction result of the distance transformation branch onto the shared features and aggregates them with a convolution layer to obtain aggregated cascade features, which are finally used to predict the feasible region segmentation.
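Again purely as an illustrative sketch under assumed module names and channel sizes, the cascaded connection could look like this:

```python
import torch
import torch.nn as nn

# Sketch of the cascaded connection: the two tasks share the feature layer and
# most head parameters (shared prediction parameters); only the final layers
# differ, and the distance-map prediction is concatenated back onto the shared
# features and aggregated by a convolution before the feasible-region layer.
class CascadedHeads(nn.Module):
    def __init__(self, backbone, feat_channels=256):
        super().__init__()
        self.backbone = backbone
        self.shared_head = nn.Sequential(             # shared prediction parameters
            nn.Conv2d(feat_channels, 64, 3, padding=1), nn.ReLU(inplace=True))
        self.dist_out = nn.Conv2d(64, 1, 1)           # exclusive: distance map
        self.fuse = nn.Conv2d(64 + 1, 64, 3, padding=1)  # aggregate cascade features
        self.seg_out = nn.Conv2d(64, 1, 1)            # exclusive: feasible region

    def forward(self, x):
        shared = self.shared_head(self.backbone(x))
        dist_map = self.dist_out(shared)
        cascade = self.fuse(torch.cat([shared, dist_map], dim=1))
        return self.seg_out(cascade), dist_map
```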
Step A3: and adjusting the current parameters of the prediction model according to the sample feasible region and the sample distance transformation graph.
In the embodiment of the present application, after the sample feasible region and the sample distance transformation map are obtained through prediction in step A2, the current parameters of the prediction model may be adjusted based on the sample feasible region and the sample distance transformation map, so as to complete this round of model training.
It should be noted that a loss function is also needed in the model training phase. Before introducing the loss function, the deep learning prediction targets involved in the embodiment of the present application are introduced. The multi-task prediction model provided in the embodiment of the present application needs to predict two targets: one is the binary feasible region segmentation result, and the other is the distance transformation map. The binary prediction target for feasible region segmentation is obtained directly from the data labeling; the prediction target for the distance transformation map is obtained by the calculation below. In particular, a special distance transformation map can be used as the learning target, and this special distance transformation map can serve as the prediction target of both prediction branches when the subsequent loss functions are designed.
The calculation process of the special distance transformation map is described below.
First, a signed distance transformation map is calculated based on the following formula:

$$d_{i,j}=\begin{cases}\;\;\,\min\limits_{z\in\partial M}\lVert x_{i,j}-z\rVert_{2}, & x_{i,j}\in M\\[4pt]-\min\limits_{z\in\partial M}\lVert x_{i,j}-z\rVert_{2}, & x_{i,j}\notin M\end{cases}$$

wherein x_{i,j} represents the coordinate position of sample pixel point (i, j) in the sample image; d_{i,j} represents the value in the signed distance transformation map corresponding to sample pixel point (i, j); z represents the coordinate position of the feasible-region (foreground) boundary pixel closest to sample pixel point (i, j); and M represents the set of feasible-region (foreground) pixel points in the sample image, with ∂M its boundary.
Next, in order to obtain a distance transformation map that can be used for regression prediction, the signed distance transformation map is processed by maximum-value truncation and normalization. Specifically, first, the values of the signed distance transformation map are truncated and scaled to the range [-1, 1] (where 0 represents the feasible region boundary); then, through a normalization operation, the values are mapped to the interval [0, 1] (where 0.5 represents the feasible region boundary); finally, a gamma transformation is applied to the normalized distance transformation map to enhance its contrast, the gamma parameter being set to a value less than 1 to obtain the contrast-enhancing effect.
After these transformations, a special distance transformation map with a final value range of 0-1 and enhanced contrast is obtained; this special distance transformation map can be used as the prediction target of both prediction branches during the subsequent loss function design.
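A minimal sketch of how this learning target could be computed with NumPy and SciPy is given below; the truncation radius and the gamma value are assumptions, since the patent only requires a gamma parameter smaller than 1.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

# Sketch of the "special" distance transformation map used as the learning
# target. The truncation radius and gamma value are assumptions; the patent
# only states that the gamma parameter is less than 1 to enhance contrast.
def special_distance_map(feasible_mask, max_dist=50.0, gamma=0.5):
    """feasible_mask: binary array, 1 = feasible region, 0 = infeasible region."""
    mask = feasible_mask.astype(np.uint8)
    inside = distance_transform_edt(mask)          # distance to boundary, inside
    outside = distance_transform_edt(1 - mask)     # distance to boundary, outside
    signed = inside - outside                      # signed distance transformation map
    signed = np.clip(signed, -max_dist, max_dist) / max_dist  # truncate, scale to [-1, 1]
    normalized = (signed + 1.0) / 2.0              # map to [0, 1]; boundary -> 0.5
    return normalized ** gamma                     # gamma < 1 enhances contrast
```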
The loss function used in the model training phase is described below.
The loss function used by the prediction model in the training phase may include: a first loss function for distance transformation prediction and a second loss function for feasible region prediction. The first loss function is obtained according to a Euclidean distance loss function and an adversarial loss function; the second loss function is obtained according to a standard cross entropy loss function and a soft cross entropy loss function, where the soft cross entropy loss function is a cross entropy loss function constructed with a learning target based on the distance transformation map.
Specifically, the loss function of the network model is divided into two parts, one is a partition prediction part of the feasible region and the other is a regression part of the distance transformation map.
In the regression part of the distance transformation map, a Euclidean distance loss function and an adversarial loss function can be used jointly to calculate the loss between the model prediction and the prediction target, thereby optimizing the network. Specifically, the Euclidean distance loss, i.e., the L2 loss (the L1 loss may also be used), constrains the output of the network to be as numerically consistent with the learning target as possible, while the introduced adversarial loss effectively constrains the distribution of the network prediction to be as consistent with the distribution of the learning target as possible, giving a more accurate prediction. The two loss functions are balanced by a weight a in the range 0-1, so the final loss function is a · Euclidean distance loss + (1 - a) · adversarial loss.
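As a sketch only (the discriminator interface and the particular GAN objective are assumptions; the patent only states that an adversarial loss constrains the predicted distribution), the combined regression loss could be written as:

```python
import torch
import torch.nn.functional as F

# Sketch of the distance-regression loss: a * Euclidean (L2) loss plus
# (1 - a) * adversarial loss. The discriminator interface and the use of a
# non-saturating BCE objective are assumptions.
def distance_branch_loss(pred_dist, target_dist, discriminator, a=0.8):
    l2_loss = F.mse_loss(pred_dist, target_dist)
    # Generator-side adversarial term: push the discriminator to judge the
    # predicted distance map as following the target distribution.
    fake_score = discriminator(pred_dist)
    adv_loss = F.binary_cross_entropy_with_logits(
        fake_score, torch.ones_like(fake_score))
    return a * l2_loss + (1.0 - a) * adv_loss
```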
In the prediction part of the feasible region segmentation, in addition to the conventional cross entropy loss (which computes the loss between the network prediction and a one-hot learning target), a soft cross entropy loss can be constructed by using the special distance transformation map introduced above as the learning target. Since the special distance transformation map has a value range of 0-1, with foreground targets close to 1 and background targets close to 0, it is reasonable to use it as the learning target of a cross entropy loss. Finally, the loss function of the feasible region segmentation branch is b · standard cross entropy loss + (1 - b) · soft cross entropy loss, where b is a weight-balancing parameter between 0 and 1. The soft cross entropy loss is computed by the same formula as the standard cross entropy loss; the difference is that y_i, which in the standard cross entropy loss can only take the value 0 or 1, is replaced by a floating-point value in the range 0-1.
The calculation formula of the standard cross entropy loss is as follows:
$$L_{\mathrm{CE}}=-\frac{1}{N}\sum_{i=1}^{N}\bigl[y_{i}\log p_{i}+(1-y_{i})\log(1-p_{i})\bigr]$$

wherein N is the number of pixels, y_i is the label of pixel i (0 or 1 in the standard cross entropy loss), and p_i is the predicted probability that pixel i belongs to the feasible region.
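The following sketch combines the standard and soft cross entropy terms as described above; the weight b and the mean reduction are assumptions.

```python
import torch

# Sketch of the feasible-region segmentation loss: b * standard binary cross
# entropy (against 0/1 labels) plus (1 - b) * soft cross entropy, where the
# special distance map (values in [0, 1]) replaces y_i as the learning target.
def segmentation_branch_loss(pred_prob, hard_label, special_dist_map, b=0.7, eps=1e-7):
    pred_prob = pred_prob.clamp(eps, 1.0 - eps)
    hard_ce = -(hard_label * pred_prob.log()
                + (1 - hard_label) * (1 - pred_prob).log()).mean()
    soft_ce = -(special_dist_map * pred_prob.log()
                + (1 - special_dist_map) * (1 - pred_prob).log()).mean()
    return b * hard_ce + (1.0 - b) * soft_ce
```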
as can be seen from the above, the prediction model provided in the embodiment of the present application is a multitask deep learning model capable of predicting a feasible region and a distance transformation map at the same time, and the prediction model includes two prediction heads, one for predicting the distance transformation map and one for predicting the feasible region. The method comprises the following steps that branches of a distance transformation graph are predicted, a special distance transformation graph is introduced to serve as an auxiliary supervision signal, and the distance transformation graph is predicted through antagonistic loss and regression loss; and (3) predicting branches of the feasible region, and then taking a special distance transformation graph as a learning target on the basis of common prediction one-hot coding, and optimizing the learning target by using cross entropy loss.
After the training of the prediction model is completed, the target image may be input into the trained prediction model, and the model is used to infer the target feasible region and the first distance transformation map (as shown in the flow diagram of the inference stage in fig. 3).
S103: and identifying the relative position between the target person wearing the position identification device and the target feasible region according to the target feasible region and the first distance transformation diagram.
In the embodiment of the application, the relative position between the target person wearing the position recognition device and the target feasible region can be recognized according to the target feasible region and the first distance transformation map, so that the distance between the target person and the feasible region boundary can be determined; this distance can be used for route guidance and safety prompts for the target person.
In an implementation manner of the embodiment of the present application, the "identifying a relative position between a target person wearing the position identifying device and a target feasible region according to the target feasible region and the first distance transformation map" in S103 may include: calculating a second distance transformation graph corresponding to the target image according to the target feasible region, and calculating a third distance transformation graph corresponding to the target image according to the first distance transformation graph and the second distance transformation graph, wherein the second distance transformation graph and the third distance transformation graph represent the calculated distance between each target pixel point in the target image and the boundary of the feasible region; then, the position of the target person wearing the position recognition device in the third distance conversion map is determined according to the third distance conversion map.
In this implementation, a distance transformation map may be calculated from the target feasible region according to the standard distance transformation calculation method and defined as the second distance transformation map. The calculated second distance transformation map and the previously predicted first distance transformation map are then weighted-averaged, specifically by weighting the values at corresponding pixel points, to obtain a more accurate third distance transformation map (as shown in the flow diagram of the inference stage in fig. 3), so that the position of the target person wearing the position recognition device in the third distance transformation map can be determined based on the third distance transformation map.
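A sketch of this fusion step is shown below; processing the second distance transformation map in the same way as the training target and using an equal weighting are assumptions.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

# Sketch of the inference-stage fusion: the second distance transformation map
# is computed from the predicted target feasible region with a standard
# distance transform (processed here like the training target, an assumption),
# then averaged pixel-wise with the predicted first distance transformation map.
def fuse_distance_maps(first_dist_map, feasible_mask, weight=0.5, max_dist=50.0):
    mask = feasible_mask.astype(np.uint8)
    inside = distance_transform_edt(mask)
    outside = distance_transform_edt(1 - mask)
    signed = np.clip(inside - outside, -max_dist, max_dist) / max_dist
    second_dist_map = (signed + 1.0) / 2.0          # second distance transformation map
    # Third distance transformation map: pixel-wise weighted average.
    return weight * first_dist_map + (1.0 - weight) * second_dist_map
```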
Further, the embodiment of the present application may further include: giving an alarm prompt to the target person based on the position of the target person in the third distance transformation map. Specifically, in the third distance transformation map the value range is [0, 1] (0.5 represents the feasible region boundary), so pixel points closer to 1 are relatively safer, while regions close to or below 0.5 are infeasible. Different thresholds can therefore be set on the pixel values of the third distance transformation map, so that alarms of different levels can be given to the target person according to the distance between the target person and the target feasible region boundary; the farther the target person is from the target feasible region, the higher the alarm level, or vice versa (as shown in the flow chart of the inference stage in fig. 3).
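For illustration, a threshold-based alarm rule over the third distance transformation map could look like the following sketch; the wearer's reference pixel and the threshold values are assumptions, the patent only fixing 0.5 as the boundary value.

```python
# Sketch of a threshold-based alarm over the third distance transformation map.
# The wearer's reference pixel and the threshold values are assumptions; the
# patent only fixes 0.5 as the feasible-region boundary value and states that
# values closer to 1 are safer.
def alarm_level(third_dist_map, person_pixel):
    value = float(third_dist_map[person_pixel])   # value at the wearer's position
    if value <= 0.5:
        return "high"      # on or beyond the feasible-region boundary
    elif value <= 0.6:
        return "medium"    # approaching the boundary
    return "none"          # well inside the feasible region
```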
Therefore, in order to provide more accurate and robust mobility assistance for the target person, when the feasible region segmentation result is used to provide reminders, a threshold-based reminding mechanism built on the distance transformation map is set, and alarm reminders of different levels are provided according to the distance between the target person and the feasible region boundary. This can effectively keep the target person out of danger and ensure the safety of the target person's activities.
It should be noted that prediction of the feasible region can help visually impaired people perceive the current environment and effectively prompt a user when the user approaches the feasible region boundary, so as to avoid danger. However, in existing products (such as walking-aid glasses and automatic driving systems), when the travelable region is analyzed with deep learning techniques, the semantic segmentation network is trained with the standard cross entropy loss as the loss function, which results in a model with weak ability to model the overall structure of the travelable region and low prediction accuracy at the travelable region boundary. These inaccurate predictions make it difficult to provide auxiliary warnings for visually impaired people. Therefore, the embodiment of the present application improves the warning accuracy from two angles. First, a distance transformation map is introduced as an auxiliary supervision signal during model training, and a prediction model that predicts the feasible region and the distance transformation map at the same time is constructed. Second, the weighted average of the distance transformation map predicted by the model and the distance transformation map calculated from the feasible region is used as the final distance transformation map to provide alarms of different levels for visually impaired people, so that danger can be avoided.
In the position identification method provided by the embodiment of the application, the position identification device is used to acquire a target image to be identified; the target image is predicted to obtain a target feasible region in the target image and a first distance transformation map corresponding to the target image; and the relative position between the target person wearing the position identification device and the target feasible region is identified according to the target feasible region and the first distance transformation map. In the embodiment of the application, the first distance transformation map represents the predicted distance between each target pixel point in the target image and the feasible region boundary, so the positional relationship between a visually impaired person and the target feasible region can be accurately identified based on the first distance transformation map and the target feasible region, and the safe walking of visually impaired people can be effectively guaranteed.
Referring to fig. 4, a schematic composition diagram of a position identification apparatus provided in an embodiment of the present application is shown, where the apparatus includes:
an image acquisition unit 410 for acquiring a target image to be recognized by using the position recognition device;
an image prediction unit 420, configured to predict the target image to predict a target feasible region in the target image and a first distance transformation map corresponding to the target image, where the first distance transformation map represents a predicted distance between each target pixel point in the target image and a feasible region boundary;
a position identifying unit 430, configured to identify a relative position between a target person wearing the position identifying device and the target feasible region according to the target feasible region and the first distance transformation map.
In an implementation manner of the embodiment of the present application, the position identifying unit 430 is specifically configured to:
calculating a second distance transformation graph corresponding to the target image according to the target feasible region;
calculating a third distance transformation graph corresponding to the target image according to the first distance transformation graph and the second distance transformation graph;
wherein the second and third distance transformation maps characterize a calculated distance between each target pixel point in the target image and a feasible region boundary;
and determining the position of the target person wearing the position recognition device in the third distance conversion map according to the third distance conversion map.
In an implementation manner of the embodiment of the present application, the apparatus further includes:
and the alarm prompting unit is used for carrying out alarm prompting on the target personnel based on the position of the target personnel in the third distance transformation diagram.
In an implementation manner of the embodiment of the present application, the image prediction unit 420 is specifically configured to:
and predicting the target image by using a pre-trained prediction model.
In an implementation manner of the embodiment of the present application, the apparatus further includes:
the data extraction unit is used for extracting a sample image and a labeling result of a feasible region in the sample image from a training data set in each round of training of the prediction model to serve as training data of the round;
a sample prediction unit, configured to predict, based on the current round of training data, a sample feasible region in the sample image and a sample distance transformation map corresponding to the sample image, where the sample distance transformation map represents a prediction distance between each sample pixel point in the sample image and a feasible region boundary;
and the parameter adjusting unit is used for adjusting the current parameters of the prediction model according to the sample feasible region and the sample distance transformation graph.
In an implementation manner of the embodiment of the present application, the sample prediction unit is specifically configured to:
predicting a sample feasible region in the sample image by using a dedicated prediction parameter for predicting the feasible region; predicting a sample distance transformation map corresponding to the sample image by using the exclusive prediction parameter for predicting the distance transformation map;
alternatively,
predicting a sample distance transform map in the sample image using the shared prediction parameters and the proprietary prediction parameters used to predict the distance transform map; and predicting the feasible sample area corresponding to the sample image by using the shared prediction parameter, the exclusive prediction parameter for predicting the feasible area and the prediction result of the sample distance transformation map.
In an implementation manner of the embodiment of the present application, the loss function used by the prediction model in the training phase includes: a first loss function for distance transformation prediction;
wherein the first loss function is obtained according to a Euclidean distance loss function and an adversarial loss function.
In an implementation manner of the embodiment of the present application, the loss function used by the prediction model in the training phase includes: a second loss function for feasible region prediction;
the second loss function is obtained according to a standard cross entropy loss function and a soft cross entropy loss function, wherein the soft cross entropy loss function is a cross entropy loss function constructed with a learning target based on the distance transformation map.
The implementation process of the functions and actions of each unit in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the application. One of ordinary skill in the art can understand and implement it without inventive effort.
An embodiment of the present application further provides an electronic device, a schematic structural diagram of the electronic device is shown in fig. 5, where the electronic device 5000 includes at least one processor 5001, a memory 5002, and a bus 5003, and the at least one processor 5001 is electrically connected to the memory 5002; the memory 5002 is configured to store at least one computer-executable instruction, and the processor 5001 is configured to execute the at least one computer-executable instruction to perform the steps of any of the location identification methods as provided in any of the embodiments or any alternative embodiments herein.
Further, the processor 5001 may be an FPGA (Field-Programmable Gate Array) or another device with logic processing capability, such as an MCU (Microcontroller Unit) or a CPU (Central Processing Unit).
By applying the embodiment of the application, the first distance transformation map represents the predicted distance between each target pixel point in the target image and the feasible region boundary, so the positional relationship between a visually impaired person and the target feasible region can be accurately identified based on the first distance transformation map and the target feasible region, and the safe walking of visually impaired people can be effectively guaranteed.
The embodiments of the present application further provide another computer-readable storage medium, which stores a computer program, and the computer program is used for implementing the steps of any one of the location identification methods provided in any one of the embodiments or any one of the alternative embodiments of the present application when the computer program is executed by a processor.
The computer-readable storage medium provided by the embodiments of the present application includes, but is not limited to, any type of disk including floppy disks, hard disks, optical disks, CD-ROMs, and magneto-optical disks, ROMs (Read-Only memories), RAMs (Random Access memories), EPROMs (Erasable Programmable Read-Only memories), EEPROMs (Electrically Erasable Programmable Read-Only memories), flash memories, magnetic cards, or optical cards. That is, a readable storage medium includes any medium that stores or transmits information in a form readable by a device (e.g., a computer).
By applying the embodiment of the application, the first distance transformation map represents the predicted distance between each target pixel point in the target image and the feasible region boundary, so the positional relationship between a visually impaired person and the target feasible region can be accurately identified based on the first distance transformation map and the target feasible region, and the safe walking of visually impaired people can be effectively guaranteed.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the scope of protection of the present application.

Claims (9)

1. A method of location identification, comprising:
acquiring a target image to be identified by using a position identification device;
predicting the target image by using a pre-trained prediction model to predict a target feasible region in the target image and a first distance transformation map corresponding to the target image, wherein the prediction model is a multi-task deep learning model for simultaneously predicting a feasible region range and a distance transformation map, and is provided with two prediction heads which are respectively used for segmentation prediction of the feasible region and regression prediction of the distance transformation map, and the first distance transformation map represents the prediction distance between each target pixel point in the target image and a feasible region boundary;
calculating a second distance transformation graph corresponding to the target image according to the target feasible region; calculating a third distance transformation graph corresponding to the target image according to the first distance transformation graph and the second distance transformation graph; wherein the second and third distance transformation maps characterize a calculated distance between each target pixel point in the target image and a feasible region boundary; and determining the position of the target person wearing the position recognition device in the third distance conversion map according to the third distance conversion map.
2. The method of claim 1, further comprising:
and carrying out alarm prompt on the target personnel based on the position of the target personnel in the third distance transformation diagram.
3. The method of claim 1, wherein the predictive model is trained in the following manner:
extracting a sample image and a labeling result of a feasible region in the sample image from a training data set in each round of training of the prediction model to serve as training data of the round;
predicting a sample feasible region in the sample image and a sample distance transformation map corresponding to the sample image based on the training data of the current round, wherein the sample distance transformation map represents the prediction distance between each sample pixel point in the sample image and a feasible region boundary;
and adjusting the current parameters of the prediction model according to the sample feasible region and the sample distance transformation graph.
4. The method of claim 3, wherein predicting the sample feasible region in the sample image and the corresponding sample distance transform map for the sample image comprises:
predicting a sample feasible region in the sample image by using an exclusive prediction parameter for predicting the feasible region; predicting a sample distance transformation map corresponding to the sample image by using the exclusive prediction parameter for predicting the distance transformation map;
alternatively,
predicting a sample distance transform map in the sample image using the shared prediction parameters and the proprietary prediction parameters used to predict the distance transform map; and predicting the feasible sample area corresponding to the sample image by using the shared prediction parameter, the exclusive prediction parameter for predicting the feasible area and the prediction result of the sample distance transformation map.
5. The method of claim 3 or 4, wherein the loss function used by the prediction model in the training phase comprises: a first loss function for distance transformation prediction;
wherein the first loss function is obtained according to a Euclidean distance loss function and an adversarial loss function.
6. The method of claim 3 or 4, wherein the loss function used by the prediction model in the training phase comprises: a second loss function for feasible region prediction;
the second loss function is obtained according to a standard cross entropy loss function and a soft cross entropy loss function, wherein the soft cross entropy loss function is a cross entropy loss function constructed with a learning target based on the distance transformation map.
7. A position recognition apparatus, comprising:
an image acquisition unit for acquiring a target image to be recognized by using the position recognition device;
the image prediction unit is used for predicting the target image by using a pre-trained prediction model to predict a target feasible region in the target image and a first distance transformation graph corresponding to the target image, the prediction model is a multi-task deep learning model for simultaneously predicting a feasible region range and the distance transformation graph, the image prediction unit is provided with two prediction heads which are respectively used for performing segmentation prediction of the feasible region and regression prediction of the distance transformation graph, and the first distance transformation graph represents the prediction distance between each target pixel point in the target image and a feasible region boundary;
the position identification unit is used for calculating a second distance transformation graph corresponding to the target image according to the target feasible region; calculating a third distance transformation graph corresponding to the target image according to the first distance transformation graph and the second distance transformation graph; wherein the second and third distance transformation maps characterize a calculated distance between each target pixel point in the target image and a feasible region boundary; and determining the position of the target person wearing the position recognition device in the third distance conversion map according to the third distance conversion map.
8. An electronic device, comprising: a processor, a memory;
the memory for storing a computer program;
the processor, configured to execute the location identification method according to any one of claims 1 to 6 by calling the computer program.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the position recognition method of any one of claims 1 to 6.
CN202110342856.7A 2021-03-30 2021-03-30 Position identification method, device, equipment and computer readable storage medium Active CN113108794B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110342856.7A CN113108794B (en) 2021-03-30 2021-03-30 Position identification method, device, equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110342856.7A CN113108794B (en) 2021-03-30 2021-03-30 Position identification method, device, equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN113108794A CN113108794A (en) 2021-07-13
CN113108794B true CN113108794B (en) 2022-09-16

Family

ID=76712875

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110342856.7A Active CN113108794B (en) 2021-03-30 2021-03-30 Position identification method, device, equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN113108794B (en)

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106726377A (en) * 2016-12-08 2017-05-31 上海电力学院 Road surface Feasible degree indicator based on artificial intelligence
CN107990902B (en) * 2017-12-29 2019-08-16 达闼科技(北京)有限公司 Air navigation aid, navigation system based on cloud, electronic equipment
CN108345875B (en) * 2018-04-08 2020-08-18 北京初速度科技有限公司 Driving region detection model training method, detection method and device
DE102018208143A1 (en) * 2018-05-24 2019-11-28 Zf Friedrichshafen Ag Method and device for detecting a trafficable area off the roads
CN111209779A (en) * 2018-11-21 2020-05-29 北京市商汤科技开发有限公司 Method, device and system for detecting drivable area and controlling intelligent driving
CN111352430B (en) * 2020-05-25 2020-09-25 北京云迹科技有限公司 Path planning method and device and robot
CN112419154A (en) * 2020-11-26 2021-02-26 三一专用汽车有限责任公司 Method, device, equipment and computer readable storage medium for detecting travelable area
CN112541408B (en) * 2020-11-30 2022-02-25 北京深睿博联科技有限责任公司 Feasible region identification method, device, equipment and computer readable storage medium

Also Published As

Publication number Publication date
CN113108794A (en) 2021-07-13

Similar Documents

Publication Publication Date Title
CN110443969B (en) Fire detection method and device, electronic equipment and storage medium
CN112417953B (en) Road condition detection and map data updating method, device, system and equipment
CN111209832B (en) Auxiliary obstacle avoidance training method, equipment and medium for substation inspection robot
CN112990211A (en) Neural network training method, image processing method and device
CN113177968A (en) Target tracking method and device, electronic equipment and storage medium
WO2021134357A1 (en) Perception information processing method and apparatus, computer device and storage medium
CN112435469A (en) Vehicle early warning control method and device, computer readable medium and electronic equipment
CN112101114B (en) Video target detection method, device, equipment and storage medium
CN114926791A (en) Method and device for detecting abnormal lane change of vehicles at intersection, storage medium and electronic equipment
CN111161545A (en) Intersection region traffic parameter statistical method based on video
CN114359618A (en) Training method of neural network model, electronic equipment and computer program product
CN113108794B (en) Position identification method, device, equipment and computer readable storage medium
CN112633074A (en) Pedestrian information detection method and device, storage medium and electronic equipment
CN116935356A (en) Weak supervision-based automatic driving multi-mode picture and point cloud instance segmentation method
CN115019218B (en) Image processing method and processor
CN110738208A (en) efficient scale-normalized target detection training method
CN116704574A (en) Fatigue driving detection method and system based on yolov7 end-to-end multitask learning
CN115953668A (en) Method and system for detecting camouflage target based on YOLOv5 algorithm
CN113643529B (en) Parking lot lane congestion prediction method and system based on big data analysis
CN111062311B (en) Pedestrian gesture recognition and interaction method based on depth-level separable convolution network
WO2019228654A1 (en) Method for training a prediction system and system for sequence prediction
CN115249269A (en) Object detection method, computer program product, storage medium, and electronic device
CN111695404A (en) Pedestrian falling detection method and device, electronic equipment and storage medium
CN114970654B (en) Data processing method and device and terminal
CN111950507B (en) Data processing and model training method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant