CN112509011B

CN112509011B - Static commodity statistical method, terminal equipment and storage medium thereof

Info

Publication number: CN112509011B
Application number: CN202110169851.9A
Authority: CN
Inventors: 丁明; 李海荣; 陈永辉
Original assignee: Guangzhou Xuanwu Wireless Technology Co Ltd
Current assignee: Guangzhou Xuanwu Wireless Technology Co Ltd
Priority date: 2021-02-08
Filing date: 2021-02-08
Publication date: 2021-05-25
Anticipated expiration: 2041-02-08
Also published as: CN112509011A

Abstract

The invention discloses a static commodity statistical method, which comprises the following steps: performing target detection and feature extraction on video data according to a target detector to obtain a detection target list and a tracking target list, and calculating a presumed position of a tracking target according to the tracking target list; carrying out weight optimal matching according to the detection target list and the tracking target list; if the detection target is matched with the tracking target, updating the tracking target list, if the detection target is not matched with the tracking target, establishing new tracking information for the detection target, adding the new tracking information into the tracking target list, and updating the current position for the tracking target; and deleting the tracking targets matched and identified for many times in the tracking list, and counting the types and the number of the tracking targets appearing in the tracking process after the video data is processed. Therefore, by dynamic tracking and identification of videos, the types and the number of commodities can be accurately checked in real time, and the positions of the commodities can be confirmed.

Description

Static commodity statistical method, terminal equipment and storage medium thereof

Technical Field

The invention relates to the technical field of image recognition, in particular to a static commodity statistical method, terminal equipment and a storage medium thereof.

Background

With the development of retail industry, more and more commodities are sold through off-line stores, supermarkets, markets, large stores and the like, and commodity shelves are often adopted for displaying and displaying the commodities for the convenience of customers to browse or purchase the commodities. Therefore, the detection, identification and number statistics of the static commodity position targets on the shelf display have wide application value, for example, a salesperson needs to count various commodities when laying the commodities so as to know the shortage situation and the commodity arrangement position situation of each commodity.

Most of the commonly used scene target detection and number statistics are carried out based on multi-picture splicing, the commodity display state cannot be tracked in real time, and then the number of commodities is effectively counted.

Disclosure of Invention

The invention aims to provide a static commodity counting method, which can accurately count the types and the number of commodities and confirm the positions of the commodities in real time.

In order to achieve the above object, an embodiment of the present invention provides a static commodity statistics method, including:

performing target detection and feature extraction on video data according to a target detector to obtain a detection target list and a tracking target list, wherein the detection target list comprises a detection frame, a classification category and feature expression;

calculating a speed vector of the movement of the tracking target according to the tracking target list, and calculating the presumed position of the tracking target according to the speed vector;

calculating the feature similarity between a detection target and the tracking target according to the detection target list and the tracking target list, and performing weight optimal matching according to the feature similarity;

if the detection target is matched with the tracking target, updating the tracking target list according to the detection frame and the feature expression of the current detection target, if the detection target is not matched with the tracking target, establishing new tracking information for the detection target, adding the new tracking information into the tracking target list, and updating the current position for the tracking target;

and deleting the tracking targets matched and identified for many times in the tracking list, and counting the types and the number of the tracking targets appearing in the tracking process after the video data is processed.

In one embodiment, the method further comprises training a target detection model according to a target detection algorithm and a data set of the commodity to obtain the target detector.

In one embodiment, the target detection algorithm includes Faster Rcnn, SSD, and RefineDet.

In one embodiment, the calculating a velocity vector of the movement of the tracking target according to the tracking target list and the calculating the estimated position of the tracking target according to the velocity vector specifically include:

calculating the velocity vector of the motion of the tracking target, wherein the calculation formula is as follows:

wherein, S [0]]For the speed of movement of the tracked object in the X-axis direction, S1]For the velocity of the tracked target moving in the Y-axis direction, (x)₁,y₁) Previous target position, n, updated in real time for the tracked target₁For its corresponding video frame number, (x)₂,y₂) First and second target positions, n, updated in real time for the tracked target₂The corresponding video frame number is;

calculating the tracked target current inference position P (x)₀,y₀) The formula is as follows:

the next frame position of each target in the tracking target list is marked as the current presumed position (x)₀,y₀)。

In one embodiment, the position of the tracking target is the central point of a target rectangular frame, and the width and height of the rectangular frame are kept unchanged.

In one embodiment, the calculating a feature similarity between the detection target and the tracking target according to the detection target list and the tracking target list, and performing weight optimal matching according to the feature similarity specifically includes:

calculating the characteristic cosine distance between the tracking target and the detection target, wherein the formula is as follows:

wherein d is_nFor the nth detected target in the detected target list, t_mFor the mth tracking target in the tracking target list, A is d_nB is t_mA, B are all one-dimensional vector features;

calculating the distance similarity between the tracking target and the detection target, wherein the formula is as follows:

representing an object d_nAnd t_mValue of smaller area, p₁Is d_nTarget rectangular position, p₂Is t_mThe position of the rectangle is inferred and,

representing a rectangle intersection area value;

calculating the similarity of fusion characteristics matching the tracking target and the detection target, wherein the formula is as follows:

wherein k is an empirical value of 0.4;

and calculating the optimal matching of the weight according to the fusion feature similarity and the optimal matching algorithm.

In one embodiment, the best match algorithm comprises a KM algorithm.

In a certain embodiment, the updating the current position for the tracking target specifically includes:

searching a target set in which the previous frame and the current frame in the tracking target list are successfully matched;

finding a target with the center of the target of the previous frame closest to the tracking target in the target set, and calculating the relative position of the target and the tracking target, wherein the formula is as follows:

wherein M is_mDetecting a frame position, P, for a previous frame of the tracked object_nDetecting the position of a frame in the last frame of the target;

updating the current position of the tracking target, wherein the formula is as follows:

wherein Q is_mIs the current position of the target.

The embodiment of the invention also provides a static commodity counting device, which is applied to the static commodity counting method in any one of the embodiments. The method comprises the following steps:

the video data extraction module is used for carrying out target detection and feature extraction on video data according to a target detector to obtain a detection target list and a tracking target list, wherein the detection target list comprises a detection frame, classification categories and feature expressions;

the position presumption module is used for calculating a speed vector of the movement of the tracking target according to the tracking target list and calculating a presumed position of the tracking target according to the speed vector;

the optimal weight matching module is used for calculating the feature similarity between the detection target and the tracking target according to the detection target list and the tracking target list and performing optimal weight matching according to the feature similarity;

an information updating module, configured to update the tracking target list according to the detection frame and the feature expression of the current detection target if the detection target is matched with the tracking target, and establish new tracking information for the detection target and add the new tracking information to the tracking target list if the detection target is not matched with the tracking target, and update the current position for the tracking target;

and the counting module is used for deleting the tracking targets which are matched and identified for many times in the tracking list, and counting the types and the number of the tracking targets appearing in the tracking process after the video data is processed.

The embodiment of the invention also provides computer terminal equipment which comprises one or more processors and a memory. A memory coupled to the processor for storing one or more programs; when executed by the one or more processors, cause the one or more processors to implement a static merchandise statistics method as described in any of the embodiments above.

An embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements a static commodity statistics method according to any one of the above embodiments.

In the static commodity statistical method provided by the embodiment of the invention, the types and the number of commodities are accurately counted in real time and the positions of the commodities are confirmed through video dynamic tracking and identification. Due to the fact that the texture feature and distance similarity fusion judgment rule is used, the target tracking effect is better for targets with similar or identical appearances, and the speed is higher than that of a common deep learning tracking method. Still can track to the static target leaving the video picture, judge the target classification with the current multiframe, fault-tolerant rate is high to the classifier classification; and the number of the statistical targets of the single picture in comparison shooting is superior in that the shot scene is large, the shooting is not influenced by a single angle, the comparison splicing recognition shooting freedom degree is high, and the recognition accuracy rate is high.

Drawings

In order to more clearly illustrate the technical solution of the present invention, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.

FIG. 1 is a flow chart of a static merchandise statistics method according to an embodiment of the invention;

FIG. 2 is a flow chart illustrating a static merchandise statistics method according to an embodiment of the invention;

fig. 3 is a schematic structural diagram of a computer terminal device according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be understood that the step numbers used herein are for convenience of description only and are not intended as limitations on the order in which the steps are performed.

It is to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

The terms "comprises" and "comprising" indicate the presence of the described features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

The term "and/or" refers to and includes any and all possible combinations of one or more of the associated listed items.

Referring to fig. 1, an embodiment of the invention provides a static merchandise statistics method, including:

s10, carrying out target detection and feature extraction on the video data according to a target detector to obtain a detection target list and a tracking target list, wherein the detection target list comprises a detection frame, a classification category and a feature expression;

s20, calculating a speed vector of the movement of the tracking target according to the tracking target list, and calculating the presumed position of the tracking target according to the speed vector;

s30, calculating the feature similarity between the detection target and the tracking target according to the detection target list and the tracking target list, and performing weight optimal matching according to the feature similarity;

s40, if the detection target is matched with the tracking target, updating the tracking target list according to the detection frame and the feature expression of the current detection target, if the detection target is not matched with the tracking target, establishing new tracking information for the detection target, adding the new tracking information into the tracking target list, and updating the current position for the tracking target;

and S50, deleting the tracking targets matched and identified for many times in the tracking list, and counting the types and the number of the tracking targets appearing in the tracking process after the video data is processed.

Referring to fig. 2, in the present embodiment, first, stationary object detection and feature extraction are performed by using a detector. The detector is typically implemented using a deep learning model, such as fast Rcnn, SSD, RefineDet, etc. detection models. In the tracking process, besides detecting the position and the identification category of the target, the detector also needs to extract the feature expression layer data of the detection model for matching the target. The preparation of a typical detector requires the following steps:

(1) data acquisition and detection model training of the detection and identification commodities;

(2) and carrying out target detection and feature extraction on each video frame.

Secondly, predicting a tracking target position, and calculating the target position of the current frame by using the tracking historical target position:

(1) calculating a velocity vector of the movement of the target by using the historical position of the tracking target;

(2) and calculating the presumed position of the current tracking target by using the calculated velocity vector and the position of the previous frame.

Then, calculating the feature similarity between the detection target and the tracking target:

(1) calculating the guessed position of the tracking target and the IOU of the current frame detection target pairwise to obtain distance similarity measurement;

(2) calculating the characteristics of a current frame detection target and the cosine distance of a tracking target to obtain appearance similarity measurement;

(3) linearly fusing the two similarity measurement values to obtain a final similarity matching score;

(4) and carrying out target matching by using an optimal weight matching algorithm.

Further, an update of the tracking list is performed:

(1) the location of the matching target in the tracking list is updated. Updating the matched tracking target by using the position and the characteristic value of the current detection target;

(2) and updating the positions of the commodities which are not matched and are included in the departure picture. And tracking on the unmatched position by using the relative position of the last frame and the position of the current existing target to follow the new position.

And finally, deleting the tracking target which is subjected to multiple matching identification, if the identification frequency of the tracking target which is not matched is more than 30 times, deleting the target, and if the identification frequency of the tracking target which is not matched is less than 30 times, updating the position of the tracking target which is not matched. And continuously tracking until the video processing is finished, and counting the types and the number of the tracking targets in the tracking process.

In the embodiment, training data of the target to be detected is collected, a target detector is trained, the detector generally uses a deep learning model, and the precision is high. And acquiring target data to be classified by using RefineDet, training a classifier, detecting by using SENet, and outputting the data F and the class data of the feature expression layer by using a classification network in the reasoning process. And firstly, detecting and classifying the target to be tracked by using a detector for each frame to obtain a detection target list D which comprises a detection frame, a classification category and feature expression F information of the target.

the next frame position of each target in the tracking target list is marked as the current presumed position (x)₀,y₀). In this embodiment, the frame center position of the next frame of each target in the inference tracking list T is recorded as the current inference position (x)₀,y₀). Tracking the position information of the target 2 times before updating in real time, and setting the position of the target of the previous time as (x)₁,y₁) Number of frames n₁The position of the previous secondary target is (x)₂,y₂) Number of frames n₂Then, the velocity vector of the target motion is calculated as:

s0 is the speed of the tracked target moving in the X-axis direction, i.e. the speed in the horizontal direction, and S1 is the speed of the tracked target moving in the Y-axis direction, i.e. the speed in the vertical direction.

Knowing the current frame number n₀Calculating the tracked target current inference position P (x)₀,y₀)：

The tracking algorithm uses a texture feature and distance similarity fusion judgment rule, and has a better tracking effect on targets with similar or identical appearances.

In this embodiment, the positions of the tracked target expressions are all the center points of the target rectangular frame, and the width and height of the rectangle are kept unchanged in the inference process.

representing a rectangle intersection area value;

wherein k is an empirical value of 0.4;

In the present embodiment, the detection target list D (D)_nN target, N =0,1_mTargets in the mth target, M =0,1.., M) match two by two.

（1）d_nIs characterized by the expression of A, t_mWherein A, B is a one-dimensional vector feature. Then the characteristic cosine distances FDist of the tracking target and the detection target can be calculated using the following formula:

（2）d_ntarget rectangular position P₁，t_mIs deduced for the rectangular position P₂Calculating the distance similarity DDist:

wherein

Representing an object d_nAnd t_mThe value of the smaller area is such that,

representing the rectangle intersection area value.

(3) Calculating the similarity Like (d) of the matching fusion features_n, t_m):

Where k is an empirical value taken to be 0.4.

The optimal matching of the weight is carried out by utilizing the similarity of pairwise matching,

in one embodiment, the best match algorithm comprises a KM algorithm.

In this embodiment, the KM algorithm is used to perform weight optimal matching on pairwise matching similarities,

wherein M is_mDetecting a frame position, P, for a previous frame of the tracked object_nThe last frame of the target detects the frame positionPlacing;

wherein Q is_mIs the current position of the target.

In the present embodiment, for the tracking target T on the unmatched_m. Setting the position of the detection frame of the previous frame as M_mThe location of the target may be updated using the following method:

(1) searching a target D set in which the previous frame and the current frame in the tracking target list are successfully matched;

(2) finding the distance between the center of the target and the target T in the previous frame in D_mRecent target K_nRecord K_nThe position of the previous frame is P_nThe current position is Q_mObtaining the relative position Diff = M_m-P_n；

(3) Update current Position = Q_m+Diff；

As shown in table 1, by performing the method of the embodiment, a time-consuming statistic for static merchandise identification statistics is obtained,

TABLE 1 time-consuming statistics of the methods of the examples

Therefore, the commodity statistical method is relatively quick.

The embodiment of the invention provides a static commodity counting device which is applied to the static commodity counting device in any one embodiment.

For the specific definition of the static commodity counting device, reference may be made to the definition above, and details are not repeated here. The modules in the static commodity counting device can be wholly or partially realized by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.

Referring to fig. 3, an embodiment of the invention provides a computer terminal device, which includes one or more processors and a memory. The memory is coupled to the processor for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the static merchandise statistics method as in any one of the embodiments above.

The processor is used for controlling the overall operation of the computer terminal equipment so as to complete all or part of the steps of the static commodity counting method. The memory is used to store various types of data to support the operation at the computer terminal device, which data may include, for example, instructions for any application or method operating on the computer terminal device, as well as application-related data. The Memory may be implemented by any type of volatile or non-volatile Memory device or combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic Memory, flash Memory, magnetic disk, or optical disk.

In an exemplary embodiment, the computer terminal Device may be implemented by one or more Application Specific 1 integrated circuits (AS 1C), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a controller, a microcontroller, a microprocessor or other electronic components, for performing the above static merchandise statistics method and achieving technical effects consistent with the above method.

In another exemplary embodiment, there is also provided a computer readable storage medium comprising program instructions which, when executed by a processor, implement the steps of the static merchandise statistics method in any one of the above embodiments. For example, the computer readable storage medium may be the above-mentioned memory including program instructions executable by the processor of the computer terminal device to perform the above-mentioned static commodity counting method, and achieve the technical effects consistent with the above-mentioned method.

In summary, in the static commodity statistical method provided by the invention, the types and the number of commodities are accurately counted in real time and the positions of the commodities are confirmed by video dynamic tracking and identification. Due to the fact that the texture feature and distance similarity fusion judgment rule is used, the target tracking effect is better for targets with similar or identical appearances, and the speed is higher than that of a common deep learning tracking method. Still can track to the static target leaving the video picture, judge the target classification with the current multiframe, fault-tolerant rate is high to the classifier classification; and the number of the statistical targets of the single picture in comparison shooting is superior in that the shot scene is large, the shooting is not influenced by a single angle, the comparison splicing recognition shooting freedom degree is high, and the recognition accuracy rate is high.

While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims

1. A static commodity statistical method, comprising:

2. The static merchandise statistic method according to claim 1, further comprising training a target detection model according to a target detection algorithm and a data set of merchandise to obtain said target detector.

3. The static merchandise statistic method according to claim 2, wherein said object detection algorithm includes Faster Rcnn, SSD and RefineDet.

4. The static commodity counting method according to claim 1, wherein the calculating a velocity vector of the movement of the tracking target according to the tracking target list and the calculating the estimated position of the tracking target according to the velocity vector are specifically:

calculating the velocity vector of the movement of the tracking target, wherein the calculation formula is as follows:

wherein, S [0]]For the speed of movement of the tracked object in the X-axis direction, S1]For the velocity of the tracked target moving in the Y-axis direction, (x)₁,y₁) Previous target position, n, updated in real time for the tracked target₁(x) the number of video frames corresponding to the previous target position₂,y₂) First and second target positions, n, updated in real time for the tracked target₂The video frame number corresponds to the previous and secondary target positions;

calculating the current estimated position P (x) of the tracked target₀,y₀) The formula is as follows:

the next frame position of each target in the tracking target list is marked as the current presumed position (x)₀,y₀)，n₀The number of video frames corresponding to the current presumed location.

5. The static commodity counting method according to claim 4, wherein the position of the tracking target is the center point of a target rectangular frame, and the width and the height of the rectangular frame are kept unchanged.

6. The static commodity statistical method according to claim 1, wherein the calculating of the feature similarity between the detection target and the tracking target according to the detection target list and the tracking target list and the performing of weight optimal matching according to the feature similarity specifically comprises:

representing an object d_nAnd t_mValue of smaller area，p₁Is d_nTarget rectangular position, p₂Is t_mPresuming a rectangular position at the presumed position P (x)₀，y₀) The position of the rectangular frame as the center,

representing a rectangle intersection area value;

wherein k is an empirical value of 0.4;

7. The static merchandise statistics method of claim 6, wherein the best match algorithm comprises a KM algorithm.

8. The static commodity counting method according to claim 6, wherein the updating of the current position for the tracking target specifically comprises:

searching a target with the center of the target of the previous frame closest to the tracking target in the target set, taking the target as a nearby target, and calculating the relative position of the nearby target and the tracking target, wherein the formula is as follows:

wherein M is_mDetecting a frame position, P, for a previous frame of the tracked object_nDetecting a frame position for a previous frame of the adjacent target;

wherein Q is_mIs the current position of the nearby object.

9. A static merchandise statistic device, comprising:

10. A computer terminal device, comprising:

one or more processors;

a memory coupled to the processor for storing one or more programs;

when executed by the one or more processors, cause the one or more processors to implement the static merchandise statistics method of any of claims 1-8.