CN110610191A - Elevator floor identification method and device and terminal equipment - Google Patents

Elevator floor identification method and device and terminal equipment

Info

Publication number
CN110610191A
CN110610191A (application CN201910718684.1A)
Authority
CN
China
Prior art keywords
image
floor
image sample
pair
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910718684.1A
Other languages
Chinese (zh)
Inventor
朱诚
白刚
夏舸
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Uditech Co Ltd
Original Assignee
Uditech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Uditech Co Ltd filed Critical Uditech Co Ltd
Priority to CN201910718684.1A priority Critical patent/CN110610191A/en
Publication of CN110610191A publication Critical patent/CN110610191A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/22 - Matching criteria, e.g. proximity measures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/048 - Activation functions
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Abstract

The application provides an elevator floor identification method, an elevator floor identification device, and terminal equipment, applicable to the technical field of data processing. The method comprises the following steps: obtaining a floor image to be identified and a plurality of preset first floor image samples corresponding to n floors, wherein each floor corresponds to at least one first floor image sample, and both the floor image to be identified and the floor image samples contain a display screen that shows the floor; combining the floor image to be identified with each first floor image sample to obtain a plurality of corresponding image pairs; inputting the image pairs respectively into a pre-trained twin neural network model for processing to obtain a plurality of corresponding image similarities; and identifying the floor corresponding to the first floor image sample in the image pair with the highest image similarity as the floor corresponding to the floor image to be identified. The embodiment of the application ensures the accuracy of floor identification, improves the speed of floor identification, and meets the robot's need to take the elevator immediately.

Description

Elevator floor identification method and device and terminal equipment
Technical Field
The application belongs to the technical field of data processing, and particularly relates to an elevator floor identification method and terminal equipment.
Background
When a prior-art robot identifies elevator floors, it usually performs image recognition directly on the floor display screen. Existing image recognition algorithms, however, are either too simple to cope with complex environmental interference, so that recognition accuracy is hard to guarantee, or too bloated, so that recognition is too slow and inefficient to respond immediately to the robot's need to take the elevator.
Disclosure of Invention
In view of this, the embodiments of the present application provide an elevator floor identification method and terminal device, which can solve the problems that elevator floor identification is inaccurate and slow and cannot meet the robot's immediate need to take an elevator.
A first aspect of an embodiment of the present application provides an elevator floor identification method, including:
obtaining a floor image to be identified and a plurality of preset first floor image samples corresponding to n floors, wherein n is an integer greater than 1, each floor corresponds to at least one first floor image sample, and both the floor image to be identified and the floor image samples contain a display screen for displaying the floor;
combining the floor image to be identified with each first floor image sample respectively to obtain a plurality of corresponding image pairs;
respectively inputting the plurality of image pairs into a pre-trained twin neural network model for processing to obtain a plurality of corresponding image similarities, wherein the twin neural network model is used to identify the image similarity of two floor images;
and identifying the floor corresponding to the first floor image sample in the image pair with the highest image similarity as the floor corresponding to the floor image to be identified.
With reference to the first aspect, in a first possible implementation manner of the first aspect, before the combining the floor image to be identified with each of the first floor image samples, the method further includes:
and carrying out image smoothing treatment on the floor image to be identified.
With reference to the first aspect, in a second possible implementation manner of the first aspect, before the combining the floor image to be identified with each of the first floor image samples, the method further includes:
and cutting the image of the floor to be identified to obtain the floor image to be identified only comprising the display screen.
With reference to the first aspect, in a third possible implementation manner of the first aspect, the training process for the twin neural network model includes:
acquiring n preset first image sample sets and n preset second image sample sets, wherein each first image sample set uniquely corresponds to one floor and comprises h second floor image samples of that floor, each second image sample set uniquely corresponds to one floor and comprises one second floor image sample of that floor and r second floor image samples of other floors, and h and r are positive integers with h > r;
combining the image samples within each first image sample set and each second image sample set to obtain n × (h−1+r) image sample pairs, each containing two second floor image samples, and adding a corresponding matching label to each image sample pair, wherein each image sample pair derived from a second image sample set contains the second floor image sample of the floor corresponding to that set, the matching labels of the image sample pairs derived from the first image sample sets are all "matched", and the matching labels of the image sample pairs derived from the second image sample sets are all "not matched";
inputting each image sample pair into a preset twin neural network model for training to obtain the corresponding n × (h−1+r) image similarities;
and identifying the corresponding n × (h−1+r) prediction labels based on the n × (h−1+r) image similarities, and iteratively training the twin neural network model based on the prediction labels and matching labels of the n × (h−1+r) image sample pairs until the twin neural network model meets a preset convergence condition, whereupon the training of the twin neural network model is complete.
With reference to the third possible implementation manner of the first aspect, in a fourth possible implementation manner of the first aspect, the inputting each image sample pair into a preset twin neural network model for training to obtain the corresponding n × (h−1+r) image similarities includes:
selecting an image sample pair to be processed from the image sample pairs;
inputting the two second floor image samples of the image sample pair to be processed respectively into the upper half branch network and the lower half branch network of the twin neural network model for processing, to obtain the corresponding output vectors of the upper half branch network and the lower half branch network;
calculating the corresponding image similarity based on the vector distance between the output vector of the upper half branch network and the output vector of the lower half branch network;
and returning to the operation step of selecting an image sample pair to be processed from the image sample pairs until the n × (h−1+r) image similarities corresponding to the n × (h−1+r) image sample pairs are obtained.
With reference to the third possible implementation manner of the first aspect, in a fifth possible implementation manner of the first aspect, the network structure of the upper half-branch network and the lower half-branch network includes:
the device comprises an input layer, a first convolution layer, a first inactivation layer, a first pooling layer, a second convolution layer, a second inactivation layer, a second pooling layer, a single-dimension layer and a full-connection layer.
With reference to the third possible implementation manner of the first aspect, in a sixth possible implementation manner of the first aspect, the network structure parameters of the upper half branch network and the lower half branch network include:
the input data of the input layer is a 28 × 28 × 1 image;
the first convolution layer and the second convolution layer each use 5 × 5 convolution kernels, six per layer;
the neuron drop probabilities of the first inactivation layer and the second inactivation layer are both 0.1%;
the window sizes of the first pooling layer and the second pooling layer are both 2 × 2.
With reference to any one of the first to sixth possible implementation manners of the first aspect, in a seventh possible implementation manner of the first aspect, the inputting the plurality of image pairs into a pre-trained twin neural network model for processing, respectively, to obtain a plurality of corresponding image similarities includes:
selecting an image pair to be processed from the image pairs;
inputting two images in the pair of images to be processed into an upper half branch network and a lower half branch network in a twin neural network model respectively for processing to obtain corresponding output vectors of the upper half branch network and the lower half branch network;
carrying out vector distance operation on the output vector of the upper half branch network and the output vector of the lower half branch network to obtain corresponding image similarity;
and returning to the operation step of selecting an image pair to be processed from the image pairs until the image similarities corresponding to all the image pairs are obtained.
A second aspect of the embodiments of the present application provides an elevator floor recognition apparatus, including:
the image acquisition module is used for acquiring a floor image to be identified and a plurality of preset first floor image samples corresponding to n floors, wherein n is an integer greater than 1, each floor corresponds to at least one first floor image sample, and both the floor image to be identified and the floor image samples contain a display screen for displaying the floor.
And the image combination module is used for respectively combining the floor image to be identified with each first floor image sample to obtain a plurality of corresponding image pairs.
And the image processing module is used for respectively inputting the plurality of image pairs into a pre-trained twin neural network model for processing to obtain a plurality of corresponding image similarities, wherein the twin neural network model is used to identify the image similarity of two floor images.
And the floor identification module is used for identifying the floor corresponding to the first floor image sample in the image pair with the highest image similarity as the floor corresponding to the floor image to be identified.
A third aspect of the embodiments of the present application provides a terminal device, which includes a memory and a processor, where the memory stores a computer program that is executable on the processor, and the processor implements the steps of the elevator floor identification method according to any one of the first aspect when executing the computer program.
A fourth aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the elevator floor identification method according to any one of the first aspect.
In a fifth aspect, the present application provides a computer program product, which when run on a terminal device, causes the terminal device to perform the steps of the elevator floor identification method according to any one of the first aspect.
It is understood that the beneficial effects of the second aspect to the fifth aspect can be referred to the related description of the first aspect, and are not described herein again.
Compared with the prior art, the embodiments of the application have the following advantages: the floor image to be recognized is combined with floor image samples obtained by actually photographing the floors, and the resulting image pairs are processed by a twin neural network model, which extracts the image features of the floor image to be recognized and of the floor image samples together with how well those features match (that is, the image similarity). Image matching in the embodiment of the application is therefore highly accurate: even under complex environmental conditions such as varying illumination, insufficient sharpness, and noise, a good matching result can still be obtained, ensuring recognition accuracy. At the same time, only the floor image to be recognized and the floor image samples of each floor need to pass through the twin neural network model, so the amount of data processed is extremely small, which raises the speed of floor recognition and meets the robot's need to take the elevator immediately.
Drawings
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed for the embodiments or the prior-art descriptions are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of an implementation of an elevator floor identification method according to an embodiment of the present application;
fig. 2 is a schematic flow chart of an implementation of an elevator floor identification method provided in the second embodiment of the present application;
fig. 3 is a schematic flow chart of an implementation of an elevator floor identification method provided in the third embodiment of the present application;
fig. 4 is a schematic flow chart of an implementation of an elevator floor identification method provided in the fourth embodiment of the present application;
fig. 5 is a schematic structural diagram of an elevator floor recognition device provided in the fifth embodiment of the present application;
fig. 6 is a schematic diagram of a terminal device provided in a sixth embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
In order to explain the technical solution described in the present application, the following description will be given by way of specific examples.
To facilitate understanding, the embodiments of the application are briefly introduced here. When a robot takes an elevator, it needs to identify the floor the elevator is on quickly and accurately to ensure that it subsequently enters and exits the elevator safely and normally. Prior-art elevator floor identification methods recognize the image of the floor display screen directly, and are either too simple to cope with the interference of complex environmental factors, so that recognition accuracy cannot be effectively guaranteed, or too bulky, so that recognition is too slow and inefficient to respond immediately to the robot's need to take the elevator.
To identify the elevator floor quickly and accurately, the embodiment of the application combines the floor image to be identified with floor image samples obtained by actually photographing the floors, and processes the resulting image pairs with a twin neural network model. The image features of the floor image to be identified and of the floor image samples, together with how well those features match (that is, the image similarity), can thus be extracted, so image matching in the embodiment of the application is highly accurate: even under complex environmental conditions such as varying illumination, insufficient sharpness, and noise, a good matching result can still be obtained, ensuring recognition accuracy. At the same time, only the floor image to be identified and the floor image samples of each floor need to pass through the twin neural network model, so the amount of data processed is extremely small, which raises the speed of floor recognition and meets the robot's need to take the elevator immediately. The details are as follows:
fig. 1 shows a flowchart of an implementation of an elevator floor identification method according to an embodiment of the present application, which is detailed as follows:
s101, acquiring a floor image to be identified and a plurality of preset first floor image samples corresponding to n floors, wherein n is an integer larger than 1, each floor corresponds to at least one first floor image sample, and the floor image to be identified and the floor image samples all comprise display screens used for displaying floors.
Firstly, it should be noted that the purpose of the embodiment of the present application is to identify the floor the elevator is on so as to help the robot take the elevator safely; that is, the final purpose of the application is to provide the robot with timely and accurate elevator floor information. On this basis, the execution subject of the embodiment of the application may be the robot itself, or another device that can acquire floor images from inside the elevator and process them for identification. When the execution subject is the robot, the robot photographs with its own camera, or acquires floor images taken by other cameras (for example, receiving by wireless transmission a floor image captured by the elevator's monitoring camera), and then processes the acquired floor image to identify the floor the elevator is on. When the execution subject is another device, that device likewise first acquires the elevator floor image, then processes and identifies it, and finally sends the identified floor to the robot. The ways of acquiring the floor image include, but are not limited to, having the robot or elevator monitoring equipment photograph the floor image and then transmit it to the other device acting as the execution subject.
Meanwhile, it should be noted that, in the embodiment of the present application, the floor image to be recognized refers to a floor image that needs to be currently recognized, and may refer to a floor image obtained by real-time shooting, or a floor image stored in advance, which needs to be determined by actual application scene requirements.
To enable floor image identification, in the embodiment of the present application at least one first floor image sample is preset for each floor to serve as the object of image comparison. The number of first floor image samples selected for each floor can be set by the technician; preferably the number is the same for every floor, and a preferred value is 2 or 3. To improve the accuracy of image identification, it is also preferable to photograph several floor image samples of each floor's display screen in advance under different shooting angles, light levels, and similar conditions, and then select the first floor image samples of better image quality from among them.
And S102, combining the floor image to be identified with each first floor image sample respectively to obtain a plurality of corresponding image pairs.
And S103, respectively inputting the plurality of image pairs into a pre-trained twin neural network model for processing to obtain a plurality of corresponding image similarities, wherein the twin neural network model is used for identifying the image similarities of the two floor images.
To process the floor image and identify the floor, a twin neural network model that can identify the image similarity of two images is trained in advance in the embodiment of the application. The first half of this similarity-based twin convolutional neural network consists of two branch networks used for feature learning; the two branches have identical structures and parameters, which is why the network is called a "twin". The second half is a similarity-learning network comprising two fully-connected layers and an output layer; the output layer has a single neuron and outputs the predicted similarity of the two input images.
With the required twin neural network model trained in advance, the floor image to be recognized is combined pairwise with the first floor image samples of each floor to obtain a plurality of corresponding image pairs; the image pairs are input into the twin neural network model one by one, and the image similarity of each pair is calculated to provide data for the subsequent floor identification.
And S104, identifying the floor corresponding to the first floor image sample in the image pair with the highest image similarity as the floor corresponding to the floor image to be identified.
After the image similarity of each image pair has been calculated, the embodiment of the application selects the image pair with the highest image similarity and identifies the floor corresponding to its first floor image sample as the floor of the image to be identified, so that the elevator floor is identified accurately and quickly.
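To make the flow of steps S101 to S104 concrete, the following is a minimal Python sketch. The names identify_floor and similarity_fn are illustrative, not from the patent; similarity_fn stands in for the trained twin neural network model described later in this description.

```python
from typing import Callable, Dict, List, Tuple

import numpy as np

def identify_floor(
    query: np.ndarray,
    floor_samples: Dict[int, List[np.ndarray]],
    similarity_fn: Callable[[np.ndarray, np.ndarray], float],
) -> Tuple[int, float]:
    """S102-S104: pair the query image with every first floor image sample,
    score each pair with the twin network, and return the floor whose sample
    achieves the highest image similarity."""
    best_floor, best_sim = -1, -1.0
    for floor, samples in floor_samples.items():   # n floors, >= 1 sample each
        for sample in samples:
            sim = similarity_fn(query, sample)     # twin-network similarity
            if sim > best_sim:
                best_floor, best_sim = floor, sim
    return best_floor, best_sim
```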
In the embodiment of the application, the floor image to be identified is combined with floor image samples obtained by actually photographing the floors, and the resulting image pairs are processed by the twin neural network model, so that the image features of the floor image to be identified and of the floor image samples, together with the corresponding image similarity, can be extracted. Image matching in the embodiment of the application is therefore highly accurate, and a good matching result can be obtained even in complex environments with varying illumination, insufficient sharpness, noise, and the like. By providing at least one first floor image sample per floor for comparison, recognition accuracy is maintained even when the photographed image exhibits some deformation, rotation, or displacement. Meanwhile, only the floor image to be identified and the floor image samples of each floor need to pass through the twin neural network model, so the amount of data processed is extremely small; compared with a neural network model with a huge, unwieldy structure, processing is more efficient, the speed of floor recognition is raised, and the robot's need to take the elevator immediately is met.
As an embodiment of the present application, in order to improve the accuracy of processing and identifying floor images as much as possible, before performing image pair combination on the floor images to be identified, the method further includes:
and carrying out image smoothing on the floor image to be identified.
When the elevator display screen is photographed on site, the captured floor image is often too dark or too bright because of ambient light and similar problems, and such an image is not suitable for direct matching. To improve the effectiveness of subsequent image matching, the floor image to be recognized can therefore be smoothed after it is acquired, evening out the brightness of the image. The specific smoothing method is not limited here and can be set by the technician according to actual requirements.
As a specific implementation of image smoothing, the embodiment of the application adopts a Gaussian filter operator. A Gaussian template gives the greatest weight to its central point, with the weights falling off rapidly with distance from the center, ensuring that points nearer the center contribute more. The Gaussian template is simply a discretized representation of the continuous two-dimensional Gaussian, so a Gaussian template of arbitrary size can be obtained by constructing a (2k+1) × (2k+1) matrix M whose element at position (i, j) is determined as shown in the following formula (1):

M(i, j) = (1 / (2πσ²)) · exp(−((i − k − 1)² + (j − k − 1)²) / (2σ²))  (1)

wherein k is a constant preset by the technician and σ is the standard deviation of the Gaussian.
As another specific implementation of image smoothing in the present application, the average pixel brightness of the image may be computed first, and that average then subtracted from every pixel's brightness value, thereby smoothing the brightness.
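As a hedged illustration of the two smoothing variants just described, the sketch below uses OpenCV's GaussianBlur for the Gaussian-template variant (the kernel size 2k+1 follows formula (1); the value of σ is an assumption) and plain mean subtraction for the brightness variant (clipping back to [0, 255] is an assumption, since the text does not say how negative values are handled).

```python
import cv2
import numpy as np

def gaussian_smooth(img: np.ndarray, k: int = 2, sigma: float = 1.0) -> np.ndarray:
    # (2k+1) x (2k+1) Gaussian template as in formula (1); sigma is assumed
    size = 2 * k + 1
    return cv2.GaussianBlur(img, (size, size), sigma)

def mean_subtract(gray: np.ndarray) -> np.ndarray:
    # subtract the average pixel brightness from every pixel;
    # clipping to [0, 255] is an assumption not stated in the text
    g = gray.astype(np.float32)
    return np.clip(g - g.mean(), 0, 255).astype(np.uint8)
```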
As an embodiment of the present application, corresponding to the above smoothing processing on the floor image to be recognized, in this embodiment of the present application, the same smoothing processing may also be performed on the first floor image samples of each floor acquired in advance, so as to improve the accuracy and reliability of matching between the first floor image sample and the floor image to be recognized.
As an embodiment of the present application, in order to improve the accuracy of processing and identifying floor images as much as possible, before performing image pair combination on the floor images to be identified, the method further includes:
and cutting the image of the floor to be identified to obtain the floor image to be identified only comprising the display screen.
Considering that only the display screen area of the floor image actually contains floor information, while the other, non-screen areas are noise for floor recognition, directly matching the photographed floor image as a whole would, on the one hand, increase the workload of the entire process and, on the other hand, introduce excessive noise that degrades the accuracy and reliability of image matching. The image is therefore cropped so that only the display screen region is retained.
The specific image cropping method is not limited here and can be set by the technician according to actual requirements. One option, for example, is to screen pixels by brightness: pixels whose brightness exceeds a brightness threshold are selected, and a connected region of such pixels whose pixel count exceeds a number threshold and whose shape matches a preset display-screen shape is identified as the display screen region and used for cropping. The brightness threshold may be a preset fixed value, or it may be computed as the image's average brightness multiplied by a preset coefficient, so as to pick out the brighter display-screen pixels. The number threshold is set by the technician and serves to eliminate small high-brightness noise sources such as mobile phone screens. There may be one or more preset display-screen shapes; the technician sets them according to the display screens found in the actual application scene.
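A minimal sketch of the brightness-based cropping just described, assuming OpenCV. The coefficient and pixel-count threshold are illustrative values, and the shape test is reduced to "largest sufficiently bright connected region" for brevity.

```python
import cv2
import numpy as np

def crop_display_screen(img: np.ndarray,
                        coeff: float = 1.5,    # brightness threshold = mean x coeff
                        min_pixels: int = 500  # number threshold, rejects small bright noise
                        ) -> np.ndarray:
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, gray.mean() * coeff, 255, cv2.THRESH_BINARY)
    n, _, stats, _ = cv2.connectedComponentsWithStats(mask)
    best = None
    for i in range(1, n):                      # label 0 is the background
        x, y, w, h, area = stats[i]
        if area >= min_pixels and (best is None or area > best[4]):
            best = stats[i]                    # keep the largest bright region
    if best is None:
        return img                             # no candidate region: leave uncropped
    x, y, w, h, _ = best
    return img[y:y + h, x:x + w]
```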
As a specific implementation manner of pre-training the twin neural network model in the present application, as shown in fig. 2, an embodiment of the present application includes:
s201, n preset first image sample sets and n preset second image sample sets are obtained, wherein each first image sample set uniquely corresponds to one floor, each floor image sample set comprises h second floor image samples corresponding to the floors, each second image sample set uniquely corresponds to one floor, each second image sample set comprises one second floor image sample corresponding to the floor and r second floor image samples not corresponding to the floors, h and r are positive integers, and h is greater than r.
To train the twin neural network model, training samples must first be prepared. The final goal of the embodiment of the application is a twin neural network model that can identify the similarity of any two floor images, and since a building with elevators has several different floors, corresponding samples are prepared for each floor in the embodiment of the application.
Specifically, in the embodiment of the present application, one first image sample set and one second image sample set are created for each floor, yielding the n first image sample sets and n second image sample sets required. A first image sample set stores h second floor image samples of a given floor: h images of that floor's display screen are photographed in advance and stored together as the set, and since all of them show the same floor, their floor numbers are identical. A second image sample set stores only one second floor image sample of its own floor, together with r second floor image samples of floors other than that one. The specific values of h and r can be set by the technician according to actual requirements; to keep the trained model as effective as possible, r is preferably equal to n−1, one less than the total number of floors. In addition, the h second floor image samples are preferably photographed under as many different environmental conditions as possible, for example with the elevator in different running states, at different shooting angles, and under different lighting, so that the samples are strengthened and the model can withstand different environmental interference, improving the accuracy and reliability of image matching. Concretely, the environmental factors that can affect photographed image quality can be listed and their possible states enumerated. Suppose, for example, the relevant factors are: elevator running state (stopped or running), shooting angle (60°, 90°, or 120°), and light intensity (weak, moderate, or strong). Combining these states gives 2 × 3 × 3 = 18 possible scenes, as shown in the sketch after this paragraph, and at least one second floor image sample is then photographed in each scene, so that h equals the total number of possible scenes. Meanwhile, the r second floor image samples preferably cover all the other floors, that is, at least one second floor image sample is preset for every floor other than the set's own floor, to handle every floor-image contrast that may actually occur; in that case r is at least n−1.
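The scene count in the example above is a plain Cartesian product over the listed factor states; the small sketch below reproduces it with the illustrative factor values from the text.

```python
import itertools

elevator_state = ["stopped", "running"]           # 2 states
shooting_angle = [60, 90, 120]                    # 3 angles, in degrees
light_intensity = ["weak", "moderate", "strong"]  # 3 levels

scenes = list(itertools.product(elevator_state, shooting_angle, light_intensity))
print(len(scenes))  # 18 possible scenes; at least one sample per scene, so h >= 18
```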
As this analysis shows, in the embodiment of the application the second floor image samples within each first image sample set all point to the same floor, providing same-floor training data for the twin neural network model, and because those samples are photographed under varied environmental factors, the trained model can withstand the interference of many different environmental conditions, improving its robustness and making it more stable and reliable. The second floor image samples within each second image sample set point to different floors, providing different-floor training data, which improves the accuracy of the trained model when comparing images of different floors.
S202, combining the image samples within each first image sample set and each second image sample set to obtain n × (h−1+r) image sample pairs, each containing two second floor image samples, and adding a corresponding matching label to each image sample pair, wherein each image sample pair derived from a second image sample set contains the second floor image sample of the floor corresponding to that set, the matching labels of the image sample pairs derived from the first image sample sets are all "matched", and the matching labels of those derived from the second image sample sets are all "not matched".
After the two types of required image sample sets are obtained, the image samples are further combined to obtain an image sample pair serving as actual model training data in the embodiment of the present application, specifically:
for each first image sample set, because the second floor image samples contained in the first image sample set are all the same floor, when two sets of images are combined, corresponding n x (h-1) image sample pairs can be obtained through random combination, or one second floor image sample can be selected from the two sets of image samples and then is respectively combined with other second floor image samples to obtain corresponding n x (h-1) image sample pairs, so that various possible scene combination responses can be realized. For each second image sample set, since only one second image sample of the corresponding floor is included, in the embodiment of the present application, the matching is performed based on the one second image sample of the corresponding floor, so as to obtain corresponding n × r image sample pairs, so as to implement the comparison and matching for various possible different floor combinations.
S203, inputting each image sample pair into a preset twin neural network model for training to obtain corresponding n x (h-1+ r) image similarities.
In the embodiment of the present application, the way the initial parameter values of the twin neural network model are set is not limited here; they may be set by the technician, generated randomly, or initialized from a normal distribution, among other options. With the initial parameter values set, the image sample pairs are input into the twin neural network model one by one to obtain the corresponding n × (h−1+r) image similarities.
And S204, identifying the corresponding n × (h−1+r) prediction labels based on the n × (h−1+r) image similarities, and iteratively training the twin neural network model based on the prediction labels and matching labels of the n × (h−1+r) image sample pairs until the twin neural network model meets a preset convergence condition, whereupon the training of the twin neural network model is complete.
In the embodiment of the present application, a model convergence condition is preset, which can be set by the technician according to actual requirements, for example one or a combination of a maximum number of training iterations, a loss-function threshold, and a label-accuracy threshold. When the training process satisfies a convergence condition, for example when the number of iterations reaches the maximum, training stops and the corresponding trained twin neural network model is obtained.
Step S203 yields an image similarity from the not-yet-trained twin neural network model for each image sample pair. The embodiment of the present application then adds a corresponding prediction label to each image sample pair based on these similarities. Specifically, a similarity threshold is preset; for each image sample pair, if its image similarity is greater than or equal to the threshold, its prediction label is set to "matched", and otherwise, if its image similarity is below the threshold, its prediction label is set to "not matched", yielding the n × (h−1+r) prediction labels.
After the prediction labels are obtained, the embodiment of the application evaluates the recognition ability of the current twin neural network model from the prediction labels and matching labels of all the image sample pairs, for example by computing the prediction accuracy or the model loss function. When the recognition ability satisfies the convergence condition, the twin neural network model is considered to meet the requirement and training is complete; if the convergence condition is not met, model parameters are adjusted and the image sample pairs are processed again starting from the current model, that is, the twin neural network model is trained iteratively until the convergence condition is satisfied.
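A small sketch of the threshold-based prediction labels and an accuracy-based convergence check, one of the options named above; the threshold and accuracy target are illustrative values, not taken from the patent. The loss-based alternative is detailed in the next embodiment.

```python
from typing import List

def predict_labels(similarities: List[float], threshold: float = 0.5) -> List[int]:
    # matched (1) if the image similarity reaches the preset threshold, else not matched (0)
    return [1 if s >= threshold else 0 for s in similarities]

def converged(pred: List[int], truth: List[int], target_accuracy: float = 0.99) -> bool:
    correct = sum(p == t for p, t in zip(pred, truth))
    return correct / len(truth) >= target_accuracy  # one possible convergence condition
```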
As a specific implementation of judging whether the convergence condition is satisfied in the second embodiment of the present application, after the prediction labels are obtained, the loss function of the current twin neural network model is calculated, and convergence is judged from the loss function value, as detailed below:
first loss functions Lw1 corresponding to the n first image sample sets and second loss functions Lw2 corresponding to the n second image sample sets are calculated.
The first loss function and the second loss function are then combined according to formula (2) to calculate the third loss function Lw3 of the twin neural network model.
Lw3 = (1 − y) · Lw1 + y · Lw2  (2)
Wherein y indicates the type of matching label of the image sample pair: y = 1 if the label is "matched" and y = 0 if it is "not matched". The specific loss functions used for Lw1 and Lw2 are not limited here and can be set by the technician according to actual requirements.
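The patent leaves the concrete forms of Lw1 and Lw2 open, and the assignment of the two terms in formula (2) is ambiguous as translated, so the sketch below fills them with the standard contrastive pair of terms under the usual convention: a squared-distance term that pulls matched pairs together and a squared-hinge term that pushes unmatched pairs apart. The margin value is an assumption.

```python
import torch
import torch.nn.functional as F

def pairwise_loss(dist: torch.Tensor, y: torch.Tensor, margin: float = 1.0) -> torch.Tensor:
    """dist: vector distance between the two branch outputs, shape (batch,);
    y: 1.0 for matched pairs, 0.0 for unmatched, per the label convention above.
    Standard contrastive terms, one per label type, combined per pair as in formula (2)."""
    loss_matched = dist.pow(2)                     # pull matched pairs together
    loss_unmatched = F.relu(margin - dist).pow(2)  # push unmatched pairs apart
    return (y * loss_matched + (1 - y) * loss_unmatched).mean()
```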
As a specific implementation manner of calculating the image similarity of the image sample pair in the second embodiment of the present application, as shown in fig. 3, the third embodiment of the present application includes:
s301, selecting one to-be-processed image sample pair from the image sample pairs.
S302, respectively inputting two second floor image samples in the image sample pair to be processed into an upper half branch network and a lower half branch network in the twin neural network model for processing to obtain corresponding output vectors of the upper half branch network and the lower half branch network.
Since the upper half branch network and the lower half branch network of the twin neural network model in the embodiment of the present application have exactly the same parameters, either second floor image sample of the image sample pair may be assigned to either branch.
In the embodiment of the application, the branch network performs convolutional feature extraction and vectorization on the floor images, so that the corresponding image features are extracted from each input floor image and output as a feature vector. The specific branch network structure is not limited here and can be set by the technician according to actual requirements.
And S303, calculating the corresponding image similarity based on the vector distance between the output vector of the upper half branch network and the output vector of the lower half branch network.
The shorter the vector distance, the higher the image similarity. The specific vector distance calculation is not limited here; one option, for example, is to compute the Manhattan distance between the two vectors and then apply a fully-connected operation with an activation to that distance to obtain the corresponding image similarity (a small sketch follows step S304).
S304, returning to the operation step of selecting an image sample pair to be processed from the image sample pairs until the n × (h−1+r) image similarities corresponding to the n × (h−1+r) image sample pairs are obtained.
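As one concrete reading of "shorter vector distance means higher similarity", a tiny sketch follows. The monotone mapping 1/(1+d) is an illustrative choice; the trained model instead feeds the distance through a fully-connected layer, as in the network sketch later in this description.

```python
import numpy as np

def manhattan_similarity(va: np.ndarray, vb: np.ndarray) -> float:
    d = float(np.abs(va - vb).sum())  # Manhattan (L1) distance between branch outputs
    return 1.0 / (1.0 + d)            # shorter distance -> similarity closer to 1
```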
As a specific implementation manner of the network structure of the upper half branch network and the lower half branch network in the third embodiment of the present application, the implementation manner includes:
the device comprises an input layer, a first convolution layer, a first inactivation layer, a first pooling layer, a second convolution layer, a second inactivation layer, a second pooling layer, a single-dimension layer and a full-connection layer.
The function of each network structure layer is described as follows:
and (3) rolling layers: the method is mainly used for feature extraction (that is, mapping original data to a hidden layer feature space) of an input image (such as a training sample or an image to be identified), where the size of a convolution kernel may be determined according to an actual application, and optionally, in order to improve the expressive ability of a model, a non-linear factor may be added by adding an activation function, and in this embodiment, the activation function may be set as a relu function.
Deactivation layer: the method is used for randomly discarding the neurons, ensures the generalization capability of the framework and avoids overfitting.
A pooling layer: the method is used for further reducing the feature matrix, reducing the size of the matrix, improving the computing power and reducing the training time.
Single dimension layer: for unfolding the feature matrix into a one-dimensional vector matrix.
Fully-connected layer: maps the learned "distributed feature representation" to the sample label space, acting mainly as the "classifier" of the whole convolutional neural network. Each node of the fully-connected layer is connected to all output nodes of the previous layer; one such node is called a neuron of the fully-connected layer, and the number of neurons may be chosen according to the requirements of the practical application; for example, in the upper and lower half branch networks of the twin neural network model, the fully-connected layer may have 512 neurons each, or 128 each, and so on. As with the convolutional layer, a non-linear factor may optionally be added through an activation function, for example the sigmoid function.
In the embodiment of the application, because the network has few layers, it converges quickly; that is, the twin neural network model provided by the embodiment of the application not only occupies few computing resources (it is lightweight) but also recognizes quickly and efficiently at run time, and can meet the robot's real-time requirement for taking the elevator.
As a specific implementation manner of setting network structure parameters of the upper half branch network and the lower half branch network in the embodiment of the present application, the implementation manner includes:
the input data of the input layer is a 28 × 28 × 1 image.
The first convolution layer and the second convolution layer each use 5 × 5 convolution kernels, six per layer.
The neuron discarding probability was 0.1% for both the first and second inactive layers.
The window sizes of the first pooling layer and the second pooling layer are both 2 × 2.
In order to realize rapid and effective feature extraction and vectorization of the floor images, in the embodiment of the application, parameters of each layer of the network structure are specifically set correspondingly, and on the basis, the processing process of the branch network on the floor images is as follows in sequence:
in order to meet the requirement of the input layer on the image size, in the embodiment of the present application, the sample size of the second floor image needs to be adjusted first, so that the image of the final input branch network input layer is a 28 × 28 × 1 image. Preferably, the image of the input layer may be grayed out to improve the processing efficiency of the image.
The first convolution layer performs convolution calculation on the output image of the input layer by using 6 convolution kernels of 5 × 5 to obtain a corresponding feature matrix of 23 × 23 × 6.
Then, through the treatment of a first inactivation layer, 0.1% of neurons are randomly discarded, and a characteristic matrix of 23 multiplied by 6 is obtained, so that the generalization capability of the model is ensured not to be over-fitted.
Inputting the obtained 23 × 23 × 6 feature matrix into the first pooling layer, extracting a corresponding 12 × 12 × 6 feature matrix by using a 2 × 2 pooling layer window, reducing the feature matrix, reducing the matrix size, improving the computing power, and reducing the training time.
The feature matrix of 12 × 12 × 6 size is input to the second convolution layer and processed by 6 convolution kernels of 5 × 5 size to obtain a corresponding feature matrix of 7 × 7 × 16.
And (3) treating by a second inactivation layer, and randomly discarding 0.1% of neurons to obtain a 7 × 7 × 16 characteristic matrix, so as to ensure that the generalization capability of the model is not over-fitted.
The obtained 7 × 7 × 16 feature matrix is input to the second pooling layer, and a corresponding 4 × 4 × 16 feature matrix is extracted using a 2 × 2 pooling window.
The single-dimension layer unfolds the 4 × 4 × 16 feature matrix and arranges it into a single column, giving a corresponding 256 × 1 feature matrix, which the fully-connected layer then processes into a corresponding 128 × 1 feature vector.
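Putting the layer walk-through together, below is a minimal PyTorch sketch of one branch plus the distance-based similarity head, as a hedged illustration rather than a definitive implementation. The 16 kernels in the second convolution layer are inferred from the stated 7 × 7 × 16 output; with unpadded ("valid") convolutions the intermediate sizes come out 24 × 24 and 8 × 8 rather than the 23 × 23 and 7 × 7 stated above, but the flattened size (256) and the 128-dimensional output match the text.

```python
import torch
import torch.nn as nn

class Branch(nn.Module):
    """One of the two weight-sharing branch networks described above."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5), nn.ReLU(),   # first convolution layer (six 5x5 kernels)
            nn.Dropout(p=0.001),                         # first deactivation layer (0.1%)
            nn.MaxPool2d(2),                             # first pooling layer (2x2 window)
            nn.Conv2d(6, 16, kernel_size=5), nn.ReLU(),  # second convolution layer (16 kernels inferred)
            nn.Dropout(p=0.001),                         # second deactivation layer (0.1%)
            nn.MaxPool2d(2),                             # second pooling layer (2x2 window)
            nn.Flatten(),                                # single-dimension layer -> 256 values
            nn.Linear(256, 128),                         # fully-connected layer -> 128-d vector
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, 1, 28, 28) grayscale
        return self.net(x)

class TwinNetwork(nn.Module):
    """Shared branch applied to both images, then a similarity head on the L1 distance."""
    def __init__(self):
        super().__init__()
        self.branch = Branch()         # one module, so both inputs share identical weights
        self.head = nn.Linear(128, 1)  # single output neuron, as described

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        va, vb = self.branch(a), self.branch(b)
        dist = torch.abs(va - vb)      # element-wise Manhattan distance
        return torch.sigmoid(self.head(dist)).squeeze(1)  # similarity in (0, 1)
```

After training against the pair labels and loss above, model = TwinNetwork() can serve as the similarity_fn in the recognition sketch earlier in this description.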
The embodiment of the application gives the twin neural network model a simple, clear network structure, which prevents overfitting under the small data volumes encountered when a robot takes an elevator while retaining strong generalization, and which yields compact, short depth features without manually annotating samples. The embodiment of the application can therefore compute image similarity quickly and accurately, ensuring that the image similarity calculation is efficient and reliable.
As a fourth embodiment of the present application, building on the above-described embodiments, as shown in fig. 4, the present application includes:
s401, selecting a pair of images to be processed from the pair of images.
S402, respectively inputting two images in the pair of images to be processed into an upper half branch network and a lower half branch network in the twin neural network model for processing to obtain corresponding output vectors of the upper half branch network and the lower half branch network.
And S403, performing vector distance operation on the output vector of the upper half branch network and the output vector of the lower half branch network to obtain corresponding image similarity.
S404, returning to the operation step of selecting one image pair to be processed from the image pair until the image similarity corresponding to all the image pairs is obtained.
The principle of the fourth embodiment of the present application is the same as that of the third embodiment of the present application and other related embodiments, and specific reference may be made to other related embodiments in this specification, which are not repeated herein.
Fig. 5 shows a block diagram of the structure of the elevator floor recognition device provided in the embodiment of the present application, corresponding to the method of the above embodiment, and only the parts related to the embodiment of the present application are shown for convenience of explanation. The elevator floor recognition device illustrated in fig. 5 may be an execution subject of the elevator floor recognition method provided in the first embodiment.
Referring to fig. 5, the elevator floor recognition apparatus includes:
the image obtaining module 51 is configured to obtain a floor image to be identified and a plurality of preset first floor image samples corresponding to n floors, where n is an integer greater than 1, each floor corresponds to at least one first floor image sample, and the floor image to be identified and the floor image samples all include a display screen for displaying floors.
And the image combination module 52 is configured to combine the floor image to be identified with each first floor image sample respectively to obtain a plurality of corresponding image pairs.
And the image processing module 53 is configured to input the plurality of image pairs to a pre-trained twin neural network model respectively for processing, so as to obtain a plurality of corresponding image similarities, where the twin neural network model is used to identify the image similarities of two floor images.
And the floor identification module 54 is configured to identify a floor corresponding to the first floor image sample in the image pair with the highest image similarity as a floor corresponding to the floor image to be identified.
Further, this elevator floor recognition device still includes:
and the smoothing module is used for carrying out image smoothing treatment on the floor image to be identified.
Further, this elevator floor recognition device still includes:
and the judging module is used for cutting the image of the floor image to be identified to obtain the floor image to be identified only comprising the display screen.
Further, this elevator floor recognition device still includes:
the system comprises a sample acquisition module, a storage module and a display module, wherein the sample acquisition module is used for acquiring n preset first image sample sets and n preset second image sample sets, each first image sample set uniquely corresponds to one floor, each floor image sample set comprises h second floor image samples corresponding to the floors, each second image sample set uniquely corresponds to one floor, each second image sample set comprises one second floor image sample corresponding to the floor and r second floor image samples not corresponding to the floors, h and r are positive integers, and h > r.
And the sample combination module is used for combining the image samples in the set for each first image sample set and each second image sample set to obtain n x (h-1+ r) image sample pairs containing two second floor image samples, and adding corresponding matching labels to each image sample pair, wherein the image sample pair corresponding to each second image sample set contains the second floor image sample of the floor corresponding to the second image sample set, the matching labels of the image sample pairs corresponding to all the first image sample sets are matched, and the matching labels of the image sample pairs corresponding to all the second image sample sets are not matched.
And the model training module is used for inputting each image sample pair into a preset twin neural network model for training to obtain corresponding n x (h-1+ r) image similarities.
And the model convergence module is used for identifying corresponding n x (h-1+ r) prediction labels based on the n x (h-1+ r) image similarities, carrying out iterative training on the twin neural network model based on the corresponding prediction labels and matching labels of the n x (h-1+ r) image samples until the twin neural network model meets a preset convergence condition, and finishing the training of the twin neural network model.
Further, the model training module is specifically configured to:
select one image sample pair to be processed from the image sample pairs;
respectively input the two second floor image samples of the image sample pair to be processed into the upper half branch network and the lower half branch network of the twin neural network model for processing, to obtain the corresponding output vectors of the upper half branch network and the lower half branch network;
calculate the corresponding image similarity based on the vector distance between the output vector of the upper half branch network and the output vector of the lower half branch network; and
return to the step of selecting one image sample pair to be processed from the image sample pairs until the n × (h − 1 + r) image similarities corresponding to the n × (h − 1 + r) image sample pairs are obtained.
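One way the per-pair forward pass could be realized is sketched below. Both half branch networks share the same weights, so a single branch function serves as both; the exp(−d) mapping from vector distance to similarity is an assumption, since the patent only states that the similarity is calculated from the distance.

```python
import numpy as np

def pair_similarity(branch, img_a, img_b):
    """Score one image sample pair with the shared twin branches."""
    vec_a = np.asarray(branch(img_a[np.newaxis]))  # upper half branch output
    vec_b = np.asarray(branch(img_b[np.newaxis]))  # lower half branch output
    distance = np.linalg.norm(vec_a - vec_b)       # vector distance
    # Map distance to similarity; exp(-d) is one common monotone choice.
    return float(np.exp(-distance))

def all_similarities(branch, sample_pairs):
    # The "return and repeat" step: score every one of the
    # n * (h - 1 + r) image sample pairs in turn.
    return [pair_similarity(branch, a, b) for a, b in sample_pairs]
```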
Further, the network structure of the upper half branch network and the lower half branch network includes:
the network comprises an input layer, a first convolution layer, a first inactivation (dropout) layer, a first pooling layer, a second convolution layer, a second inactivation layer, a second pooling layer, a single-dimension (flatten) layer, and a fully connected layer.
Further, the network structure of the upper half branch network and the lower half branch network further includes:
the input data of the input layer is a 28 × 28 × 1 image.
The first convolution layer and the second convolution layer each have 6 convolution kernels of size 5 × 5.
The neuron discarding probabilities of the first and second inactivation layers are both 0.1%.
The window sizes of the first pooling layer and the second pooling layer are both 2 × 2.
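Under the stated hyper-parameters, each half branch network could be sketched in Keras as follows. The ReLU activations and the 128-unit width of the fully connected layer are assumptions, since the patent leaves both open; the framework choice is likewise illustrative.

```python
from tensorflow import keras
from tensorflow.keras import layers

def make_branch():
    """One half branch network; the upper and lower halves share weights."""
    return keras.Sequential([
        keras.Input(shape=(28, 28, 1)),               # input layer
        layers.Conv2D(6, (5, 5), activation="relu"),  # first convolution
        layers.Dropout(0.001),                        # first inactivation, 0.1%
        layers.MaxPooling2D((2, 2)),                  # first pooling, 2 x 2
        layers.Conv2D(6, (5, 5), activation="relu"),  # second convolution
        layers.Dropout(0.001),                        # second inactivation
        layers.MaxPooling2D((2, 2)),                  # second pooling, 2 x 2
        layers.Flatten(),                             # single-dimension layer
        layers.Dense(128, activation="relu"),         # fully connected layer
    ])
```

With valid 5 × 5 convolutions and 2 × 2 pooling, a 28 × 28 input shrinks to 24 × 24, then 12 × 12, then 8 × 8, and finally 4 × 4 × 6 = 96 features before the fully connected layer.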
Further, the image processing module 53 is specifically configured to:
select one image pair to be processed from the plurality of image pairs;
respectively input the two images of the image pair to be processed into the upper half branch network and the lower half branch network of the twin neural network model for processing, to obtain the corresponding output vectors of the upper half branch network and the lower half branch network;
perform a vector distance operation on the output vector of the upper half branch network and the output vector of the lower half branch network to obtain the corresponding image similarity; and
return to the step of selecting one image pair to be processed from the plurality of image pairs until the image similarities corresponding to all the image pairs are obtained.
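The same per-pair processing can be wired as a single two-input model; a sketch under the same assumptions follows, where the Euclidean-distance Lambda and the sigmoid head that turns the distance into a similarity score are illustrative choices rather than anything the patent fixes.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def make_twin_model(branch):
    """Wire one shared branch into a two-input similarity model."""
    in_a = layers.Input(shape=(28, 28, 1))  # image fed to the upper half
    in_b = layers.Input(shape=(28, 28, 1))  # image fed to the lower half
    vec_a, vec_b = branch(in_a), branch(in_b)
    # Vector distance operation on the two branch output vectors.
    dist = layers.Lambda(
        lambda t: tf.sqrt(tf.reduce_sum(tf.square(t[0] - t[1]),
                                        axis=1, keepdims=True))
    )([vec_a, vec_b])
    # Map the distance to an image similarity score.
    sim = layers.Dense(1, activation="sigmoid")(dist)
    return models.Model([in_a, in_b], sim)

# Usage sketch: model = make_twin_model(make_branch()), then train with
# binary cross-entropy against the matched / not-matched labels.
```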
For the process by which each module of the elevator floor identification device provided in this embodiment of the present application implements its functions, reference may be made to the description of the first embodiment shown in fig. 1; details are not repeated here.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon", "in response to determining", or "in response to detecting". Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted contextually to mean "upon determining", "in response to determining", "upon detecting [the described condition or event]", or "in response to detecting [the described condition or event]".
Furthermore, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance. It will also be understood that, although the terms first, second, etc. may be used herein to describe various elements in some embodiments of the application, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first table may be named a second table, and similarly, a second table may be named a first table, without departing from the scope of various described embodiments. The first table and the second table are both tables, but they are not the same table.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
The elevator floor identification method provided by the embodiment of the application can be applied to terminal devices such as mobile phones, tablet computers, wearable devices, vehicle-mounted devices, Augmented Reality (AR)/Virtual Reality (VR) devices, notebook computers, ultra-mobile personal computers (UMPCs), netbooks, Personal Digital Assistants (PDAs), and the like, and the embodiment of the application does not limit the specific types of the terminal devices at all.
For example, the terminal device may be a Station (ST) in a WLAN, a cellular phone, a cordless phone, a Session Initiation Protocol (SIP) phone, a Wireless Local Loop (WLL) station, a Personal Digital Assistant (PDA) device, a handheld device with wireless communication capability, a computing device or other processing device connected to a wireless modem, a vehicle-mounted device, a vehicle-mounted networking terminal, a computer, a laptop, a handheld communication device, a handheld computing device, a satellite wireless device, a wireless modem card, a television set-top box (STB), a Customer Premises Equipment (CPE), and/or another device for communicating over a wireless system or a next-generation communication system, for example a mobile terminal in a 5G network or in a future evolved Public Land Mobile Network (PLMN).
By way of example and not limitation, when the terminal device is a wearable device, the wearable device may be any device that applies wearable technology to the intelligent design of everyday wear, such as glasses, gloves, watches, clothing, and shoes. A wearable device is a portable device that is worn directly on the body or integrated into the user's clothing or accessories. A wearable device is not merely a hardware device; it also delivers powerful functions through software support, data interaction, and cloud interaction. Broadly, wearable smart devices include full-featured, large-sized devices that can implement complete or partial functions without relying on a smartphone, such as smart watches or smart glasses, as well as devices that focus on a single class of application function and must be used together with other devices such as a smartphone, for example various smart bracelets and smart jewelry for monitoring physical signs.
Fig. 6 is a schematic structural diagram of a terminal device according to an embodiment of the present application. As shown in fig. 6, the terminal device 6 of this embodiment includes: at least one processor 60 (only one shown in fig. 6), a memory 61, said memory 61 having stored therein a computer program 62 executable on said processor 60. The processor 60, when executing the computer program 62, implements the steps in the various elevator floor identification method embodiments described above, such as the steps 101-104 shown in fig. 1. Alternatively, the processor 60, when executing the computer program 62, implements the functions of the modules/units in the above-mentioned device embodiments, such as the functions of the modules 51 to 54 shown in fig. 5.
The terminal device 6 may be a desktop computer, a notebook computer, a palmtop computer, a cloud server, or another computing device. The terminal device may include, but is not limited to, the processor 60 and the memory 61. Those skilled in the art will appreciate that fig. 6 is merely an example of the terminal device 6 and does not constitute a limitation of the terminal device 6, which may include more or fewer components than those shown, or combine some components, or have different components; for example, the terminal device may further include input and output devices, a network access device, a bus, and the like.
The processor 60 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 61 may in some embodiments be an internal storage unit of the terminal device 6, such as a hard disk or a memory of the terminal device 6. The memory 61 may also be an external storage device of the terminal device 6, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the terminal device 6. Further, the memory 61 may also include both an internal storage unit and an external storage device of the terminal device 6. The memory 61 is used for storing an operating system, an application program, a BootLoader (BootLoader), data, and other programs, such as program codes of the computer program. The memory 61 may also be used to temporarily store data that has been transmitted or is to be transmitted.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements the steps in the above-mentioned method embodiments.
The embodiments of the present application further provide a computer program product which, when run on a mobile terminal, causes the mobile terminal to implement the steps of the above method embodiments.
The integrated modules/units, if implemented in the form of software functional units and sold or used as independent products, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the flow in the methods of the above embodiments may be implemented by a computer program, which may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the above method embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application, and are intended to be included within the scope of the present application.

Claims (10)

1. An elevator floor identification method, comprising:
obtaining a floor image to be identified and a plurality of preset first floor image samples corresponding to n floors, wherein n is an integer greater than 1, each floor corresponds to at least one first floor image sample, and the floor image to be identified and the floor image samples both contain a display screen for displaying the floor;
combining the floor image to be identified with each first floor image sample respectively to obtain a plurality of corresponding image pairs;
respectively inputting the plurality of image pairs into a pre-trained twin neural network model for processing to obtain a plurality of corresponding image similarities, wherein the twin neural network model is used to identify the image similarity of two floor images;
and identifying the floor corresponding to the first floor image sample in the image pair with the highest image similarity as the floor corresponding to the floor image to be identified.
2. The elevator floor identification method of claim 1, prior to said combining said floor image to be identified with each of said first floor image samples, further comprising:
performing image smoothing on the floor image to be identified.
3. The elevator floor identification method of claim 1, prior to said combining said floor image to be identified with each of said first floor image samples, further comprising:
cropping the floor image to be identified to obtain a floor image to be identified that comprises only the display screen.
4. The elevator floor identification method of claim 1 wherein the training process for the twin neural network model comprises:
acquiring n preset first image sample sets and n preset second image sample sets, wherein each first image sample set uniquely corresponds to one floor and comprises h second floor image samples corresponding to that floor, each second image sample set uniquely corresponds to one floor and comprises one second floor image sample corresponding to that floor and r second floor image samples not corresponding to that floor, h and r are positive integers, and h is greater than r;
combining, for each first image sample set and each second image sample set, the image samples within the set to obtain n × (h − 1 + r) image sample pairs each containing two second floor image samples, and adding a corresponding matching label to each image sample pair, wherein the image sample pair corresponding to each second image sample set contains the second floor image sample of the floor corresponding to that set, the matching labels of the image sample pairs corresponding to all the first image sample sets are "matched", and the matching labels of the image sample pairs corresponding to all the second image sample sets are "not matched";
inputting each image sample pair into a preset twin neural network model for training to obtain the corresponding n × (h − 1 + r) image similarities;
and identifying the corresponding n × (h − 1 + r) prediction labels based on the n × (h − 1 + r) image similarities, and iteratively training the twin neural network model based on the prediction labels and matching labels corresponding to the n × (h − 1 + r) image sample pairs until the twin neural network model meets a preset convergence condition, completing the training of the twin neural network model.
5. The elevator floor identification method of claim 4, wherein the inputting each image sample pair into a preset twin neural network model for training to obtain the corresponding n × (h − 1 + r) image similarities comprises:
selecting one image sample pair to be processed from the image sample pairs;
inputting two second floor image samples in the to-be-processed image sample pair into an upper half branch network and a lower half branch network in the twin neural network model respectively for processing to obtain corresponding output vectors of the upper half branch network and the lower half branch network;
calculating corresponding image similarity based on the vector distance between the output vector of the upper half branch network and the output vector of the lower half branch network;
and returning to the step of selecting one image sample pair to be processed from the image sample pairs until the n × (h − 1 + r) image similarities corresponding to the n × (h − 1 + r) image sample pairs are obtained.
6. The elevator floor identification method of claim 5 wherein the network structure of the upper half branch network and the lower half branch network comprises:
the device comprises an input layer, a first convolution layer, a first inactivation layer, a first pooling layer, a second convolution layer, a second inactivation layer, a second pooling layer, a single-dimension layer and a full-connection layer.
7. The elevator floor identification method of claim 6, wherein:
the input data of the input layer is a 28 × 28 × 1 image;
the first convolution layer and the second convolution layer each have 6 convolution kernels of size 5 × 5;
the neuron discarding probabilities of the first and second inactivation layers are both 0.1%;
the window sizes of the first pooling layer and the second pooling layer are both 2 × 2.
8. The elevator floor identification method according to any one of claims 1 to 7, wherein the respectively inputting the plurality of image pairs into a pre-trained twin neural network model for processing to obtain a plurality of corresponding image similarities comprises:
selecting one image pair to be processed from the plurality of image pairs;
inputting two images in the pair of images to be processed into an upper half branch network and a lower half branch network in a twin neural network model respectively for processing to obtain corresponding output vectors of the upper half branch network and the lower half branch network;
carrying out vector distance operation on the output vector of the upper half branch network and the output vector of the lower half branch network to obtain corresponding image similarity;
and returning to the step of selecting one image pair to be processed from the plurality of image pairs until the image similarities corresponding to all the image pairs are obtained.
9. A terminal device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the method according to any one of claims 1 to 8 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 8.
CN201910718684.1A 2019-08-05 2019-08-05 Elevator floor identification method and device and terminal equipment Pending CN110610191A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910718684.1A CN110610191A (en) 2019-08-05 2019-08-05 Elevator floor identification method and device and terminal equipment

Publications (1)

Publication Number Publication Date
CN110610191A true CN110610191A (en) 2019-12-24

Family

ID=68890995

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910718684.1A Pending CN110610191A (en) 2019-08-05 2019-08-05 Elevator floor identification method and device and terminal equipment


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9940534B1 (en) * 2016-10-10 2018-04-10 Gyrfalcon Technology, Inc. Digital integrated circuit for extracting features out of an input image based on cellular neural networks
CN106903700A (en) * 2017-03-31 2017-06-30 华北科技学院 A kind of elevator meal delivery robot
CN107729993A (en) * 2017-10-30 2018-02-23 国家新闻出版广电总局广播科学研究院 Utilize training sample and the 3D convolutional neural networks construction methods of compromise measurement
CN108596277A (en) * 2018-05-10 2018-09-28 腾讯科技(深圳)有限公司 A kind of testing vehicle register identification method, apparatus and storage medium
CN109409263A (en) * 2018-10-12 2019-03-01 武汉大学 A kind of remote sensing image city feature variation detection method based on Siamese convolutional network
CN109508655A (en) * 2018-10-28 2019-03-22 北京化工大学 The SAR target identification method of incomplete training set based on twin network

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110980458A (en) * 2019-12-31 2020-04-10 深圳优地科技有限公司 Method for controlling robot to go out of elevator and robot
CN110980458B (en) * 2019-12-31 2022-01-11 深圳优地科技有限公司 Method for controlling robot to go out of elevator and robot
WO2021151343A1 (en) * 2020-09-09 2021-08-05 平安科技(深圳)有限公司 Test sample category determination method and apparatus for siamese network, and terminal device
CN114851191A (en) * 2022-04-25 2022-08-05 北京云迹科技股份有限公司 Distribution robot control method and related equipment
CN114873390A (en) * 2022-04-25 2022-08-09 北京云迹科技股份有限公司 Method for predicting floor where robot is located and related equipment
CN114873390B (en) * 2022-04-25 2024-03-26 北京云迹科技股份有限公司 Robot floor prediction method and related equipment
CN114851191B (en) * 2022-04-25 2024-03-26 北京云迹科技股份有限公司 Distribution robot control method and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20191224)