CN118470746A - Method, device, equipment and storage medium for detecting by using mobile phone - Google Patents
- Publication number
- CN118470746A (application CN202410636481.9A)
- Authority
- CN
- China
- Prior art keywords
- model
- mobile phone
- detection model
- detection
- human body
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Telephone Function (AREA)
Abstract
The invention discloses a method, a device, equipment and a storage medium for detecting mobile phone use. The method comprises the following steps: acquiring a human body image dataset, and constructing a detection model based on the human body image dataset, wherein the detection model comprises a first detection model and a second detection model; acquiring image data to be detected, and determining node position information according to the image data to be detected and the detection model; and constructing a relationship inference model, and detecting mobile phone use according to the node position information and the relationship inference model. Two detection models with different detection precision are constructed from the human body image dataset and combined with multi-target tracking, which, according to the invention, greatly improves the recognition precision and real-time performance for mobile phone use in complex scenes. The detection models determine the node position information of the image to be detected, and the relationship inference model then determines whether a mobile phone is being used. The method is therefore valuable wherever the mobile phone use of personnel in a specific scene needs to be monitored and managed.
Description
Technical Field
The present invention relates to the field of machine vision, and in particular to a method, an apparatus, a device, and a storage medium for detecting mobile phone use.
Background
With the growing popularity and continuously expanding functionality of mobile phones, they play an increasingly important role in people's life and work. However, in certain settings, such as schools, examination rooms and workplaces, unauthorized mobile phone use can cause a series of problems, such as reduced learning and work efficiency, leakage of confidential information, and disruption of normal order.
In the prior art, mobile phone detection is performed with dedicated electronic equipment and software, which detect the presence and use of a mobile phone from characteristics such as its wireless signals and sensors. For example, whether a mobile phone is being used nearby is judged by detecting the wireless signals it emits, such as Bluetooth. However, such signal-based detection is susceptible to interference and has a high false alarm rate.
Disclosure of Invention
The invention provides a method, a device, equipment and a storage medium for detecting mobile phone use, which use an interaction model to judge whether a person is making a phone call or operating a mobile phone, and provide technical support for fields such as intelligent monitoring and human-computer interaction research.
According to an aspect of the present invention, there is provided a method for detecting mobile phone use, the method comprising:
acquiring a human body image dataset, and constructing a detection model based on the human body image dataset, wherein the detection model comprises a first detection model and a second detection model;
acquiring image data to be detected, and determining node position information according to the image data to be detected and the detection model;
and constructing a relationship inference model, and detecting mobile phone use according to the node position information and the relationship inference model.
Optionally, acquiring the human body image dataset includes: acquiring historical monitoring data, and obtaining user annotation data based on the historical monitoring data, wherein the user annotation data comprises a human body target position, a human head target position, a hand target position and a mobile phone target position; performing data augmentation on the user annotation data to generate augmented user annotation data; taking the augmented human body target positions as a first data set, and taking the augmented human head, hand and mobile phone target positions as a second data set; the first data set and the second data set together form the human body image dataset.
Optionally, constructing the detection model based on the human body image dataset includes: constructing an initial detection model, wherein the initial detection model comprises a self-adaptive attention module; training the initial detection model based on the first data set, and generating a first detection model when a preset training standard is met; training the initial detection model based on a second data set, and generating a second detection model when a preset training standard is met; the first detection model and the second detection model are combined to generate a detection model.
Optionally, determining node location information according to the image data to be detected and the detection model includes: inputting image data to be detected into a first detection model to obtain each human body identification area output by the first detection model; sequentially taking each human body identification area as a target human body area; performing target tracking on the target human body area through a preset multi-target tracking algorithm to determine a target user identifier corresponding to the target human body area; and determining node position information corresponding to the target user identifier according to the target human body area and the second detection model.
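The preset multi-target tracking algorithm is not specified in the source. As a minimal sketch, assuming a simple greedy IoU (intersection-over-union) association stands in for it, the step that assigns a target user identifier to each human body identification area could look like the following; `IouTracker`, `iou` and the 0.3 threshold are hypothetical. A deployed system would more likely use a dedicated tracker with motion prediction.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


@dataclass
class IouTracker:
    """Hypothetical greedy IoU tracker standing in for the unspecified
    'preset multi-target tracking algorithm'."""
    iou_threshold: float = 0.3
    next_id: int = 1
    tracks: Dict[int, Box] = field(default_factory=dict)

    def update(self, detections: List[Box]) -> Dict[int, Box]:
        """Map this frame's body boxes to stable target user identifiers."""
        assigned: Dict[int, Box] = {}
        unmatched = dict(self.tracks)
        for det in detections:
            # Pick the still-unmatched previous track that overlaps most.
            best_id: Optional[int] = None
            best_iou = self.iou_threshold
            for tid, prev in unmatched.items():
                score = iou(det, prev)
                if score >= best_iou:
                    best_id, best_iou = tid, score
            if best_id is None:
                best_id = self.next_id  # a new person entered the scene
                self.next_id += 1
            else:
                del unmatched[best_id]
            assigned[best_id] = det
        self.tracks = assigned  # tracks not seen in this frame are dropped
        return assigned
```

Each call to `update` takes the human body boxes output by the first detection model for one frame and returns them keyed by a user identifier that persists while the person keeps overlapping their previous box.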
Optionally, determining node location information corresponding to the target user identifier according to the target human body area and the second detection model includes: inputting the target human body area into a second detection model to obtain a human head position coordinate, a hand position coordinate and a mobile phone position coordinate which are output by the second detection model; and taking the head position coordinates, the hand position coordinates and the mobile phone position coordinates as node position information corresponding to the target user identification.
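A minimal sketch of assembling the node position information for one tracked person, assuming the second detection model runs on the cropped target human body area without resizing, so its box coordinates only need to be shifted by the crop origin to return to full-frame coordinates. The names `to_frame_coords` and `build_node_info` and the class keys `"head"`, `"hand"` and `"phone"` are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def to_frame_coords(part_box: Box, body_box: Box) -> Box:
    """Shift a box predicted inside the body crop back into full-frame
    coordinates (assumes the crop was fed to the model unscaled)."""
    ox, oy = body_box[0], body_box[1]
    return (part_box[0] + ox, part_box[1] + oy,
            part_box[2] + ox, part_box[3] + oy)


def build_node_info(user_id: int, body_box: Box,
                    parts: Dict[str, List[Box]]) -> Dict[str, object]:
    """Node position information for one target user identifier.

    `parts` is the (hypothetical) output of the second detection model,
    keyed by class name, with boxes relative to the body crop."""
    return {
        "user_id": user_id,
        "nodes": {
            cls: [to_frame_coords(b, body_box) for b in boxes]
            for cls, boxes in parts.items()
        },
    }
```

If the crop is resized before inference, each coordinate would additionally need to be multiplied by the crop-to-input scale factor before the shift.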
Optionally, constructing the relationship inference model includes: acquiring sample position information; setting up an initial network structure of the relationship inference model, and determining initial model parameters corresponding to the initial network structure; inputting the sample position information into the initial network structure to obtain an output sample position relationship; determining the real position relationship of the sample position information, and determining a loss function according to the real position relationship and the sample position relationship; judging whether the loss function converges; if so, taking the network structure with the initial model parameters as the relationship inference model; otherwise, adjusting the initial model parameters based on the loss function to obtain adjusted model parameters, and taking the network structure with the adjusted model parameters as the relationship inference model.
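The train-until-convergence loop described above can be illustrated with a deliberately simplified stand-in: a logistic-regression classifier over two hypothetical features (normalized distances from the phone to the head and to the hand), trained with binary cross-entropy and plain gradient descent. This is not the GNN relationship inference model of the embodiment; it only shows the structure of initializing parameters, computing the loss, adjusting parameters, and stopping once the loss change falls below a tolerance. All names and thresholds are assumptions.

```python
import math
from typing import List, Tuple

Sample = Tuple[List[float], int]  # (features, label: 1 = using phone, 0 = not)


def predict(w: List[float], b: float, x: List[float]) -> float:
    """Probability of the 'using a mobile phone' relationship."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def bce(samples: List[Sample], w: List[float], b: float) -> float:
    """Mean binary cross-entropy between labels and predictions."""
    eps = 1e-12
    total = 0.0
    for x, y in samples:
        p = predict(w, b, x)
        total += y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return -total / len(samples)


def train_until_converged(samples: List[Sample], lr: float = 0.5,
                          tol: float = 1e-6, max_steps: int = 10000):
    dim = len(samples[0][0])
    w, b = [0.0] * dim, 0.0          # initial model parameters
    loss = prev = bce(samples, w, b)
    n = len(samples)
    for _ in range(max_steps):
        # Gradient of the mean cross-entropy w.r.t. w and b.
        gw, gb = [0.0] * dim, 0.0
        for x, y in samples:
            err = predict(w, b, x) - y
            gw = [g + err * xi for g, xi in zip(gw, x)]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]  # adjust parameters
        b -= lr * gb / n
        loss = bce(samples, w, b)
        if abs(prev - loss) < tol:   # loss has converged: stop
            break
        prev = loss
    return w, b, loss
```

Swapping in the GNN of the embodiment would change only the model and its gradient computation; the convergence check around it keeps the same shape.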
Optionally, detecting mobile phone use according to the node position information and the relationship inference model includes: inputting the node position information into the relationship inference model to obtain a target position relationship output by the relationship inference model, wherein the target position relationship is either "using a mobile phone" or "not using a mobile phone"; judging whether the target position relationship meets a preset condition; if so, determining that the detection result is that a mobile phone is being used, and generating alarm prompt information according to the position relationship; otherwise, determining that the detection result is that no mobile phone is being used.
According to another aspect of the present invention, there is provided an apparatus for detecting mobile phone use, the apparatus comprising:
a detection model construction module, configured to acquire a human body image dataset and construct a detection model based on the human body image dataset, wherein the detection model comprises a first detection model and a second detection model;
a node position information determining module, configured to acquire image data to be detected and determine node position information according to the image data to be detected and the detection model;
and a mobile phone detection result determining module, configured to construct a relationship inference model and detect mobile phone use according to the node position information and the relationship inference model.
According to another aspect of the present invention, there is provided an electronic apparatus including:
At least one processor; and
A memory communicatively coupled to the at least one processor; wherein,
The memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the method for detecting mobile phone use according to any one of the embodiments of the present invention.
According to another aspect of the present invention, there is provided a computer readable storage medium storing computer instructions for causing a processor to execute a method for detecting use of a mobile phone according to any one of the embodiments of the present invention.
According to the technical solution provided by the embodiments of the invention, two detection models with different detection precision are constructed from the human body image dataset and combined with multi-target tracking, which greatly improves the recognition precision and real-time performance for mobile phone use in complex scenes. The detection models determine the node position information of the image to be detected, and the relationship inference model accurately determines whether a mobile phone is being used. The method is therefore of considerable value wherever the mobile phone use of personnel in a specific scene needs to be monitored and managed.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a method for detecting mobile phone use according to a first embodiment of the present invention;
FIG. 2 is a flowchart of another method for detecting mobile phone use according to an embodiment of the present invention;
FIG. 3 is a flowchart of another method for detecting mobile phone use according to a second embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an apparatus for detecting mobile phone use according to a third embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an electronic device implementing the method for detecting mobile phone use according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art may better understand the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
FIG. 1 is a flowchart of a method for detecting mobile phone use according to an embodiment of the present invention. The method may be performed by a mobile phone use detection device, which may be implemented in hardware and/or software and configured in a computer controller. As shown in FIG. 1, the method includes:
S110, acquiring a human body image dataset, and constructing a detection model based on the human body image dataset, wherein the detection model comprises a first detection model and a second detection model.
Wherein, the human body image dataset refers to a set consisting of a plurality of images containing human bodies, which are used for training and constructing a detection model. The detection model refers to a model capable of analyzing and recognizing input image data to detect a specific object or feature. The detection model comprises a first detection model and a second detection model, wherein the first detection model is used for detecting a human body, and the second detection model is used for further detecting the head, the hand and the mobile phone in the human body.
Specifically, the human body image dataset may be acquired in a variety of ways, for example collected from an existing image database, or captured and curated in-house. The human body image dataset provides the basic material for constructing the subsequent models, and the controller constructs the detection model based on it, where the controller refers to the computer controller that performs the mobile phone use detection.
Optionally, acquiring the human body image dataset includes: acquiring historical monitoring data, and obtaining user annotation data based on the historical monitoring data, wherein the user annotation data comprises a human body target position, a human head target position, a hand target position and a mobile phone target position; performing data augmentation on the user annotation data to generate augmented user annotation data; taking the augmented human body target positions as a first data set, and taking the augmented human head, hand and mobile phone target positions as a second data set; the first data set and the second data set together form the human body image dataset.
The historical monitoring data refers to historical image or video data collected by monitoring equipment. The user annotation data refers to data with which a user labels and annotates the images; it specifically includes annotations of the human body target position, the human head target position, the hand target position and the mobile phone target position. The human body target position is a target frame (bounding box) enclosing the approximate region where a human body is located in an image. The human head target position is a target frame marking the specific position of a human head in the image. The hand target position is a target frame marking the specific position of a hand in the image. The mobile phone target position is a target frame marking where a mobile phone appears in the image. The first data set is composed solely of the augmented human body target positions. The second data set combines the augmented human head, hand and mobile phone target positions.
Specifically, the historical monitoring data contains a large amount of human body image information from different scenes, times and environments. For example, in the monitoring scenes of interest, data of people playing with a mobile phone and of people making a phone call can be collected across a variety of scenes. The historical monitoring data is then presented to a user to obtain user annotation data: the user labels each target frame, such as human bodies, heads, hands and mobile phones, together with a user identifier such as an ID, and may further annotate the relationship between a head target frame and a mobile phone target frame. For example, the relationship between head ID_1 and mobile phone ID_1 is labelled as a phone-playing relationship. In this way the target frames and relationships in each image are obtained.
Further, the controller performs data augmentation on the obtained user annotation data. The data augmentation includes: 1. Multi-scale cropping: a cropping scale is randomly selected from several scales, the specific crop start position, width and height are calculated, and a fixed region is cut out of the original image. 2. Random flipping: images in the image sequence are flipped at random angles. 3. Mixup: two or more images are spliced into a new image, which effectively improves the network's robustness to spatial and temporal interference. These techniques generate richer and more diverse augmented user annotation data, providing better data support for training and optimizing the subsequent models.
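A minimal sketch of how the first two augmentation operations must also update the annotated target frames, assuming boxes in `(x1, y1, x2, y2)` pixel coordinates and a horizontal flip. The pixel operations themselves (cropping and mirroring the image array) are omitted, and the function names and the scale set `(0.6, 0.8, 1.0)` are hypothetical.

```python
import random
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def multi_scale_crop(img_w: int, img_h: int, boxes: List[Box],
                     scales=(0.6, 0.8, 1.0), rng=random):
    """Pick a crop scale at random, choose a random start position, and
    re-express the boxes in the crop's coordinate frame.

    Boxes are clipped to the crop; boxes entirely outside are dropped.
    Returns ((x0, y0, crop_w, crop_h), adjusted_boxes)."""
    s = rng.choice(scales)
    cw, ch = int(img_w * s), int(img_h * s)
    x0 = rng.randint(0, img_w - cw)
    y0 = rng.randint(0, img_h - ch)
    out = []
    for x1, y1, x2, y2 in boxes:
        nx1, ny1 = max(x1 - x0, 0), max(y1 - y0, 0)
        nx2, ny2 = min(x2 - x0, cw), min(y2 - y0, ch)
        if nx2 > nx1 and ny2 > ny1:  # keep only boxes that survive the crop
            out.append((nx1, ny1, nx2, ny2))
    return (x0, y0, cw, ch), out


def hflip(img_w: int, boxes: List[Box]) -> List[Box]:
    """Mirror boxes horizontally: x' = W - x, swapping the left/right edges."""
    return [(img_w - x2, y1, img_w - x1, y2) for x1, y1, x2, y2 in boxes]
```

Passing a seeded `random.Random` instance as `rng` makes the crops reproducible, which is convenient when checking augmented labels.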
Specifically, the controller uses the enhanced target position of the human body as a first data set for analyzing the general form and the position relation of the human body. The controller takes the enhanced head target position, hand target position and mobile phone target position as a second data set, wherein the second data set contains more detail information for researching head, hand and mobile phone related behaviors.
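A minimal sketch of splitting the annotations into the two data sets, assuming each annotation record carries an image ID, a class label, a box and the user-assigned ID. The `Annotation` record and `split_datasets` function are hypothetical, and keeping the second data set in full-image coordinates is an assumption; the source does not say whether it is stored relative to the body crop.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

PART_LABELS = {"head", "hand", "phone"}  # classes of the second data set


@dataclass
class Annotation:
    image_id: str
    label: str     # "body", "head", "hand" or "phone"
    box: Box
    track_id: int  # user-assigned identifier, e.g. ID_1


def split_datasets(anns: List[Annotation]):
    """Return (first, second): per-image annotation lists for the body
    detector and for the head/hand/phone detector respectively."""
    first: Dict[str, List[Annotation]] = {}
    second: Dict[str, List[Annotation]] = {}
    for a in anns:
        if a.label == "body":
            target = first
        elif a.label in PART_LABELS:
            target = second
        else:
            raise ValueError(f"unknown label: {a.label}")
        target.setdefault(a.image_id, []).append(a)
    return first, second
```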
Optionally, constructing the detection model based on the human body image dataset includes: constructing an initial detection model, wherein the initial detection model comprises a self-adaptive attention module; training the initial detection model based on the first data set, and generating a first detection model when a preset training standard is met; training the initial detection model based on a second data set, and generating a second detection model when a preset training standard is met; the first detection model and the second detection model are combined to generate a detection model.
The detection model may be a yolov detection model. Adding an SELayer (squeeze-and-excitation) attention module to the model strengthens its ability to extract key features: the module dynamically adjusts the feature extraction strength for key regions (such as the contact area between a hand and a mobile phone), reduces interference from background noise and increases the model's sensitivity to subtle gesture changes, which makes it particularly suitable for distinguishing the nuances between making a call and playing with a mobile phone. According to the embodiment, this markedly improves detection precision in complex scenes and improves the model's ability to detect target frames such as mobile phones.
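For illustration, the squeeze-and-excitation computation that an SELayer performs can be written out in plain Python on a single `C x H x W` feature map: global average pooling per channel (squeeze), a two-layer bottleneck with ReLU and sigmoid (excitation), then per-channel rescaling. This is a sketch of the generic SE block with biases omitted, not the embodiment's exact module or its placement inside the yolov network; `se_layer` and the weight layout are assumptions. In practice the block would be implemented in a deep learning framework.

```python
import math
from typing import List

Matrix = List[List[float]]  # one H x W feature map, or a weight matrix


def se_layer(features: List[Matrix], w1: Matrix, w2: Matrix) -> List[Matrix]:
    """Squeeze-and-excitation on a C x H x W feature map (no batch dim).

    w1: (C // r) x C weights of the reduction layer
    w2: C x (C // r) weights of the expansion layer (biases omitted)."""
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(map(sum, fm)) / (len(fm) * len(fm[0])) for fm in features]
    # Excitation: FC -> ReLU -> FC -> sigmoid gives channel weights in (0, 1).
    hidden = [max(0.0, sum(w * zi for w, zi in zip(row, z))) for row in w1]
    scale = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Scale: reweight each channel so informative channels are emphasised.
    return [[[v * s for v in row] for row in fm]
            for fm, s in zip(features, scale)]
```

Because every channel weight lies strictly between 0 and 1, the block can only attenuate channels relative to one another; which channels survive is what the network learns.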
Specifically, the controller trains the initial detection model on the first data set. During training, the model continually learns and adjusts its parameters to fit the human body target position information in the first data set. The preset training standard means that the number of iterations reaches a preset value or the loss between the real labels and the predicted values falls to a sufficiently small value; model training is complete when the test precision also meets the requirement.
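One possible reading of the preset training standard, written as a hypothetical helper: training stops once the iteration budget is reached or the loss is small enough, provided the test precision also meets the requirement. The function name and all three default thresholds are illustrative assumptions, not values from the source.

```python
def meets_training_standard(iteration: int, loss: float, test_precision: float,
                            max_iters: int = 300, loss_threshold: float = 0.05,
                            required_precision: float = 0.9) -> bool:
    """Hypothetical stopping rule for detector training.

    (iteration budget reached OR loss small enough) AND precision good enough."""
    budget_or_loss_ok = iteration >= max_iters or loss <= loss_threshold
    return budget_or_loss_ok and test_precision >= required_precision
```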
It should be noted that the second detection model is consistent with the training steps and processes of the first detection model, but the second detection model is based on the second data set, that is, the training is mainly aimed at the information such as the target position of the head, the target position of the hand, the target position of the mobile phone, and the like.
In addition, the detection model is trained with a dynamically adjusted learning rate. Specifically, the learning rate is adjusted with a cosine annealing strategy, so that it decreases gradually from an initial value to a minimum value following the periodic shape of a cosine function. This lets the model learn quickly in the early stage of training, slows learning down in the later stage, and helps the model converge better at the end of training.
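A sketch of the standard single-cycle cosine annealing schedule, assuming the learning rate is updated once per step over `T` total steps: `lr(t) = lr_min + 0.5 * (lr_max - lr_min) * (1 + cos(pi * t / T))`. The function name and the example values are hypothetical; the source gives neither the initial nor the minimum learning rate, and a periodic (warm-restart) variant would reset `t` at the start of each cycle.

```python
import math


def cosine_annealing_lr(step: int, total_steps: int,
                        lr_max: float, lr_min: float = 0.0) -> float:
    """Learning rate at `step`, decaying from lr_max (step 0) to lr_min (step T)."""
    step = min(max(step, 0), total_steps)  # clamp to [0, T]
    return lr_min + 0.5 * (lr_max - lr_min) * (
        1 + math.cos(math.pi * step / total_steps))
```

The schedule is flat near both ends and steepest in the middle, which matches the described behaviour: fast learning early, gradual slowing later.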
S120, obtaining image data to be detected, and determining node position information according to the image data to be detected and the detection model.
Specifically, the node position information refers to the position coordinates, determined by the detection model, of points with specific meaning on or near the human body, namely the head, the hands and the mobile phone. The image data to be detected may be acquired in real time by a target monitoring device or obtained from a specific source, and the node position information is determined by inputting the image data to be detected into the constructed detection model and analyzing its output.
S130, constructing a relationship inference model, and detecting mobile phone use according to the node position information and the relationship inference model.
The relationship inference model refers to a model for inferring and understanding relationships between different nodes, so as to perform further analysis and judgment according to the relationships.
FIG. 2 is a flowchart of a method for detecting mobile phone use according to an embodiment of the invention. Step S130 mainly includes the following steps S131 to S135:
S131, constructing a relationship inference model.
Optionally, constructing the relationship inference model includes: acquiring sample position information; setting up an initial network structure of the relationship inference model, and determining initial model parameters corresponding to the initial network structure; inputting the sample position information into the initial network structure to obtain an output sample position relationship; determining the real position relationship of the sample position information, and determining a loss function according to the real position relationship and the sample position relationship; judging whether the loss function converges; if so, taking the network structure with the initial model parameters as the relationship inference model; otherwise, adjusting the initial model parameters based on the loss function to obtain adjusted model parameters, and taking the network structure with the adjusted model parameters as the relationship inference model.
Specifically, the controller first prepares the data. The sample position information is graph-structured data constructed according to the application scene, in which nodes represent human body parts (head and hands) and mobile phones, and edges represent the relations between these entities, namely playing with a mobile phone and making a phone call. For model training, the controller builds the initial network structure of the relationship inference model, such as a graph neural network (GNN) model, and determines the corresponding initial model parameters. A loss function is defined to measure the gap between the model's predictions and the real labels; for example, cross-entropy loss may be used. Hyperparameters, including the learning rate, batch size and number of training epochs, are set, and an optimizer such as Adam is selected to update the model parameters during training.
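A minimal sketch of turning one image's node position information into graph-structured sample data, assuming node features are the normalized box centre and size, candidate edges connect every head and hand to every mobile phone, and edge labels use a hypothetical encoding (0 = no relation, 1 = playing with the phone, 2 = making a call). `GraphSample`, `build_graph` and the label encoding are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class GraphSample:
    node_types: List[str] = field(default_factory=list)         # head/hand/phone
    node_feats: List[List[float]] = field(default_factory=list)
    edges: List[Tuple[int, int]] = field(default_factory=list)  # (part, phone)
    edge_labels: List[int] = field(default_factory=list)        # 0 none, 1 play, 2 call


def box_feature(box: Box, img_w: float, img_h: float) -> List[float]:
    """Normalized centre x, centre y, width and height."""
    x1, y1, x2, y2 = box
    return [(x1 + x2) / 2 / img_w, (y1 + y2) / 2 / img_h,
            (x2 - x1) / img_w, (y2 - y1) / img_h]


def build_graph(nodes: Dict[str, List[Box]], img_w: float, img_h: float,
                relations: Optional[Dict[Tuple[str, int, int], int]] = None
                ) -> GraphSample:
    """`relations` maps (part type, part index, phone index) -> label."""
    g = GraphSample()
    index: Dict[Tuple[str, int], int] = {}
    for ntype in ("head", "hand", "phone"):
        for i, box in enumerate(nodes.get(ntype, [])):
            index[(ntype, i)] = len(g.node_types)
            g.node_types.append(ntype)
            g.node_feats.append(box_feature(box, img_w, img_h))
    relations = relations or {}
    # Candidate edges: every head and every hand paired with every phone.
    for ptype in ("head", "hand"):
        for i in range(len(nodes.get(ptype, []))):
            for j in range(len(nodes.get("phone", []))):
                g.edges.append((index[(ptype, i)], index[("phone", j)]))
                g.edge_labels.append(relations.get((ptype, i, j), 0))
    return g
```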
In a specific embodiment, the training process of the model includes: 1. Forward propagation: graph data is input, and the node features of the GNN model are updated iteratively through multiple layers until the preset number of layers is reached. 2. Relationship inference: using the updated node features, a relationship inference module infers the relationships or behaviors between nodes, for example by judging the relative positions of the mobile phone and the human body parts. 3. Back propagation and parameter update: gradients are calculated from the loss function, and the model parameters are updated through back propagation to improve the model's predictive ability. 4. Validation and testing: model performance is evaluated periodically on the validation set, and the model parameters are adjusted or training is stopped (early stopping) according to the validation results; finally, the generalization ability of the model is evaluated on the test set to ensure that the model not only performs well on the training data but is also effective on unseen data. 5. Model tuning: depending on the validation and test results, the model structure, hyperparameters or training strategy may be adjusted to further improve model performance.
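The forward propagation and relationship inference steps can be sketched without a framework, assuming mean-aggregation message passing (each layer sets a node's feature to `ReLU(W_k (h_v + mean of neighbour features))`) and a logistic edge scorer on the concatenated features of the two endpoints. This is a generic GNN pattern, not the embodiment's specific architecture; `message_passing`, `edge_scores` and the weight shapes are hypothetical, and training (back propagation with an optimizer such as Adam) is left to a framework.

```python
import math
from typing import List, Tuple

Vec = List[float]


def message_passing(feats: List[Vec], edges: List[Tuple[int, int]],
                    weights: List[List[Vec]]) -> List[Vec]:
    """Run len(weights) rounds of mean-aggregation message passing.

    weights[k] is a d_out x d_in matrix for layer k; edges are undirected."""
    n = len(feats)
    nbrs: List[List[int]] = [[] for _ in range(n)]
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    h = [list(f) for f in feats]
    for W in weights:
        new_h = []
        for v in range(n):
            agg = list(h[v])  # self feature plus mean of neighbour features
            for u in nbrs[v]:
                agg = [a + x / len(nbrs[v]) for a, x in zip(agg, h[u])]
            new_h.append([max(0.0, sum(w * a for w, a in zip(row, agg)))
                          for row in W])
        h = new_h  # synchronous update: all nodes use the previous layer
    return h


def edge_scores(h: List[Vec], edges: List[Tuple[int, int]],
                w_edge: Vec) -> List[float]:
    """Probability that each (part, phone) edge is a 'using phone' relation."""
    return [1.0 / (1.0 + math.exp(-sum(w * x for w, x in zip(w_edge, h[u] + h[v]))))
            for u, v in edges]
```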
S132, inputting the node position information into the relationship inference model to obtain the target position relationship output by the relationship inference model, wherein the target position relationship is either "using a mobile phone" or "not using a mobile phone".
S133, judging whether the target position relation meets the preset condition, if so, executing S134, otherwise, executing S135.
S134, determining that the detection result is that the mobile phone is used, and generating alarm prompt information according to the position relation.
S135, determining that the detection result is that the mobile phone is not used.
Specifically, during detection the controller passes the node position information to the relationship inference model as input, and the relationship inference model outputs the corresponding target position relationship. The controller then judges whether the target position relationship meets the preset condition. If it does, the detection result is that a mobile phone is being used, and alarm prompt information can be generated according to the position relationship. The alarm prompt may take the form of sound, light, a text message and the like, so that the relevant personnel are informed of the mobile phone use in time. If the target position relationship does not meet the preset condition, the detection result is that no mobile phone is being used, that is, no mobile phone use is detected in the current monitoring scene.
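A minimal sketch of steps S133 to S135, assuming the relationship inference model yields a (relation label, probability) pair for each part-to-phone edge of a tracked person, and that the preset condition is "some non-zero relation reaches a probability threshold". `decide`, `RELATION_NAMES`, the label encoding and the 0.6 threshold are all hypothetical; only a text alarm is shown.

```python
from typing import List, Optional, Tuple

# Hypothetical label encoding: 0 = no relation, 1 = playing, 2 = calling.
RELATION_NAMES = {1: "playing with a phone", 2: "making a phone call"}


def decide(user_id: int, relation_probs: List[Tuple[int, float]],
           threshold: float = 0.6) -> Tuple[str, Optional[str]]:
    """Return (detection result, alarm text or None) for one person."""
    # Most confident relation other than 'no relation', if any.
    best = max((p for p in relation_probs if p[0] != 0),
               key=lambda p: p[1], default=None)
    if best is not None and best[1] >= threshold:   # preset condition met (S134)
        name = RELATION_NAMES.get(best[0], "using a phone")
        alarm = f"ALERT user {user_id}: {name} (p={best[1]:.2f})"
        return "phone in use", alarm
    return "phone not in use", None                  # S135
```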
According to the technical solution provided by this embodiment of the invention, two detection models with different detection precision are constructed from the human body image dataset and combined with multi-target tracking, which greatly improves the recognition precision and real-time performance for mobile phone use in complex scenes. The detection models determine the node position information of the image to be detected, and the relationship inference model accurately determines whether a mobile phone is being used. The method is therefore of considerable value wherever the mobile phone use of personnel in a specific scene needs to be monitored and managed.
Example II
Fig. 3 is a flowchart of a mobile phone use detection method according to the second embodiment of the present invention. On the basis of the first embodiment, this embodiment adds the specific process of determining node position information from the image data to be detected and the detection model. Steps S210 and S260 are substantially the same as steps S110 and S130 of the first embodiment and are not described in detail again. As shown in Fig. 3, the method includes:
S210, acquiring a human body image dataset, and constructing a detection model based on the human body image dataset, wherein the detection model comprises a first detection model and a second detection model.
Optionally, acquiring the human body image dataset includes: acquiring historical monitoring data, and acquiring user annotation data based on the historical monitoring data, wherein the user annotation data comprises a human body target position, a human head target position, a hand target position and a mobile phone target position; performing data enhancement on the user annotation data to generate enhanced user annotation data; taking the enhanced human body target position as a first data set, and taking the enhanced human head target position, the enhanced hand target position and the enhanced mobile phone target position as a second data set; the first data set and the second data set are used as human body image data sets.
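As a rough illustration of the enhancement and splitting step, the sketch below assumes a hypothetical annotation record (image size plus `(class_name, box)` pairs) and uses horizontal flipping as the only enhancement; the patent does not say which enhancement operations are used.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
Annotation = Dict[str, object]            # hypothetical: {"width", "height", "objects"}

FIRST_SET_CLASSES = {"person"}                  # human body targets
SECOND_SET_CLASSES = {"head", "hand", "phone"}  # head, hand and phone targets


def hflip(ann: Annotation) -> Annotation:
    """Horizontal-flip enhancement: mirror every box about the image's vertical axis."""
    w = ann["width"]
    flipped = [(cls, (w - x2, y1, w - x1, y2)) for cls, (x1, y1, x2, y2) in ann["objects"]]
    return {"width": w, "height": ann["height"], "objects": flipped}


def augment(anns: List[Annotation]) -> List[Annotation]:
    """Keep the originals and add one flipped copy of each annotation."""
    return anns + [hflip(a) for a in anns]


def split_datasets(anns: List[Annotation]):
    """Split enhanced annotations into the first (body) and second (head/hand/phone) sets."""
    first, second = [], []
    for a in anns:
        objs = a["objects"]
        first.append({**a, "objects": [o for o in objs if o[0] in FIRST_SET_CLASSES]})
        second.append({**a, "objects": [o for o in objs if o[0] in SECOND_SET_CLASSES]})
    return first, second
```

In practice the enhancement would also transform the image pixels and would likely include more operations (scaling, colour jitter, and so on); only the box bookkeeping is shown.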
Optionally, constructing the detection model based on the human body image dataset includes: constructing an initial detection model; training the initial detection model based on the first data set, and generating a first detection model when a preset training standard is met; training the initial detection model based on a second data set, and generating a second detection model when a preset training standard is met; the first detection model and the second detection model are combined to generate a detection model.
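The two-stage training can be sketched generically. The "preset training standard" is not defined in the text; here it is assumed to be a validation score reaching `target_score`, with `max_epochs` as a fallback cap. `train_one_epoch` and `evaluate` are hypothetical callables standing in for a real detector's training and evaluation routines, and the adaptive attention module mentioned in claim 3 is not modelled.

```python
import copy
from typing import Any, Callable, Tuple


def train_until_standard(initial_model: Any, dataset: Any,
                         train_one_epoch: Callable[[Any, Any], None],
                         evaluate: Callable[[Any, Any], float],
                         target_score: float, max_epochs: int) -> Tuple[Any, int]:
    """Train a fresh copy of the initial model until the assumed standard is met
    (score >= target_score) or max_epochs is reached; return (model, epochs used)."""
    model = copy.deepcopy(initial_model)  # both detectors start from the same initial model
    for epoch in range(1, max_epochs + 1):
        train_one_epoch(model, dataset)
        if evaluate(model, dataset) >= target_score:
            return model, epoch
    return model, max_epochs  # assumption: return the last model if the cap is hit


def build_detection_model(initial_model, first_set, second_set, train_one_epoch, evaluate,
                          target_score: float = 0.5, max_epochs: int = 100):
    first, _ = train_until_standard(initial_model, first_set, train_one_epoch,
                                    evaluate, target_score, max_epochs)
    second, _ = train_until_standard(initial_model, second_set, train_one_epoch,
                                     evaluate, target_score, max_epochs)
    # "Combining" is interpreted here simply as using the two models as a pair.
    return first, second
```

Deep-copying the initial model keeps the two training runs independent, which matches the description of training the same initial model separately on each dataset.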
S220, obtaining image data to be detected, and inputting the image data to be detected into the first detection model to obtain each human body identification area output by the first detection model.
S230, sequentially taking each human body identification area as a target human body area.
S240, performing target tracking on the target human body area through a preset multi-target tracking algorithm to determine a target user identifier corresponding to the target human body area.
Specifically, the controller first inputs the image data to be detected into the first detection model, which outputs the human body identification areas, each covering the image region of one detected human body. The controller then takes each human body identification area in turn as the target human body area and tracks each target human body area using a preset multi-target tracking algorithm. Through this tracking, a target user identifier corresponding to each target human body area is determined. The target user identifier is a unique label for each person, used to distinguish different people.
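The patent does not name its multi-target tracking algorithm. The sketch below swaps in the simplest possible stand-in, greedy IoU matching between consecutive frames, just to show how each human body region acquires a persistent user identifier. Practical trackers such as SORT or ByteTrack add motion prediction and track aging on top of this; the class name and the 0.3 threshold are hypothetical.

```python
from itertools import count
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


class GreedyIouTracker:
    """Match each new body box to the previous-frame track with the highest IoU;
    unmatched boxes receive new user IDs."""

    def __init__(self, iou_threshold: float = 0.3):
        self.iou_threshold = iou_threshold
        self.tracks: Dict[int, Box] = {}
        self._ids = count(1)

    def update(self, boxes: List[Box]) -> List[int]:
        # Score every (track, box) pair and accept the best pairs first.
        pairs = sorted(((iou(tb, b), tid, i) for tid, tb in self.tracks.items()
                        for i, b in enumerate(boxes)), reverse=True)
        assigned: Dict[int, int] = {}
        used_tracks = set()
        for score, tid, i in pairs:
            if score < self.iou_threshold:
                break
            if tid in used_tracks or i in assigned:
                continue
            assigned[i] = tid
            used_tracks.add(tid)
        ids = [assigned[i] if i in assigned else next(self._ids) for i in range(len(boxes))]
        # Tracks missing from this frame are dropped (no re-identification).
        self.tracks = {tid: boxes[i] for i, tid in enumerate(ids)}
        return ids
```

For example, two people detected in one frame receive IDs 1 and 2, and keep them in the next frame even if the detector lists them in a different order.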
S250, determining node position information corresponding to the target user identification according to the target human body area and the second detection model.
Optionally, determining node location information corresponding to the target user identifier according to the target human body area and the second detection model includes: inputting the target human body area into a second detection model to obtain a human head position coordinate, a hand position coordinate and a mobile phone position coordinate which are output by the second detection model; and taking the head position coordinates, the hand position coordinates and the mobile phone position coordinates as node position information corresponding to the target user identification.
Specifically, the controller inputs the target human body region into the second detection model, which outputs the head position coordinates, hand position coordinates and mobile phone position coordinates. These coordinates are then combined into the node position information corresponding to the target user identifier. From this coordinate information, the positions of the target user's relevant body parts and of the held mobile phone in the scene can be determined, enabling further analysis of the target user's phone use behavior.
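Because the second detection model runs on the cropped human region, its boxes are presumably relative to the crop. A minimal sketch, assuming that and assuming a hypothetical dict layout for the node position information, shows how the crop offset could be undone and the coordinates grouped per user identifier:

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def to_frame_coords(box: Box, crop_origin: Tuple[float, float]) -> Box:
    """Shift a box predicted inside a cropped human region back to full-frame pixels."""
    ox, oy = crop_origin
    x1, y1, x2, y2 = box
    return (x1 + ox, y1 + oy, x2 + ox, y2 + oy)


def build_node_info(user_id: int, body_box: Box, parts: List[Tuple[str, Box]]) -> Dict:
    """Assemble node position information for one tracked person.

    `parts` is the (hypothetical) second-model output: (class_name, crop-relative box).
    """
    origin = (body_box[0], body_box[1])  # top-left corner of the crop in the frame
    nodes: Dict[str, List[Box]] = {"head": [], "hand": [], "phone": []}
    for cls, box in parts:
        if cls in nodes:
            nodes[cls].append(to_frame_coords(box, origin))
    return {"user_id": user_id, "body": body_box, "nodes": nodes}
```

Keeping lists per class allows for two detected hands, or for no phone at all in a given crop.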
A specific application scenario is as follows. The first detection model detects human body regions. Each human body region is cropped out, and a multi-target tracking model is added to the inference process to assign a unique user identifier to each detected human body. The cropped human body image is input into the second detection model, which detects the bounding boxes of the head, hands and mobile phone; the coordinates and categories of these boxes are input into a GNN (graph neural network) relation reasoning model, which infers the relationship between the mobile phone and the human body, i.e., whether the person is playing with the phone. Finally, a decision is made for each human body ID over consecutive frames: for example, if more than 6 out of 10 consecutive frames for one human body ID are classified as playing with the phone, an alarm is raised.
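The frame-voting rule in this example (alarm when more than 6 of 10 consecutive frames for one ID are positive) can be sketched with a per-ID sliding window. The class name and the choice to require a full window before alarming are assumptions.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class TemporalVoter:
    """Keep the last `window` frame-level decisions per user ID and alarm when the
    window is full and more than `min_positive` of them say 'using phone'."""

    def __init__(self, window: int = 10, min_positive: int = 6):
        self.window = window
        self.min_positive = min_positive
        # deque(maxlen=...) silently discards the oldest decision once full.
        self.history: Dict[int, Deque[bool]] = defaultdict(
            lambda: deque(maxlen=self.window))

    def update(self, user_id: int, using_phone: bool) -> bool:
        h = self.history[user_id]
        h.append(using_phone)
        return len(h) == self.window and sum(h) > self.min_positive
```

Voting over a window smooths out single-frame errors from the detectors and the relation model, at the cost of a delay of up to `window` frames before the first alarm.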
S260, constructing a relation reasoning model, and performing mobile phone use detection according to the node position information and the relation reasoning model.
Optionally, constructing the relation reasoning model includes: acquiring sample position information; setting up an initial network structure of the relation reasoning model, and determining initial model parameters corresponding to the initial network structure; inputting the sample position information into the initial network structure to obtain an output sample position relationship; determining the real position relationship of the sample position information, and determining a loss function according to the real position relationship and the sample position relationship; judging whether the loss function converges, and if so, taking the network structure corresponding to the initial model parameters as the relation reasoning model; otherwise, adjusting the initial model parameters based on the loss function to obtain adjusted model parameters, and taking the network structure corresponding to the adjusted model parameters as the relation reasoning model.
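The construct-evaluate-adjust loop above can be illustrated with a deliberately small stand-in: a logistic-regression model trained by gradient descent on binary cross-entropy, stopping when the change in loss falls below a tolerance. It is not a GNN; the feature vectors, learning rate and tolerance are hypothetical, and the sketch only mirrors the control flow of the described procedure.

```python
import math
from typing import List, Sequence


def bce(p: float, y: int, eps: float = 1e-12) -> float:
    """Binary cross-entropy for one prediction p against label y (1 = using phone)."""
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))


def train_relation_model(X: List[Sequence[float]], y: List[int], lr: float = 0.5,
                         tol: float = 1e-6, max_iters: int = 10_000):
    """Predict relations, compare with the real relations via the loss, stop once the
    loss has converged, otherwise adjust the parameters by gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0  # initial model parameters
    prev_loss = float("inf")
    for _ in range(max_iters):
        preds = [1 / (1 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
                 for x in X]
        loss = sum(bce(p, t) for p, t in zip(preds, y)) / n
        if abs(prev_loss - loss) < tol:  # convergence check on the loss
            break
        prev_loss = loss
        # Gradient of the mean BCE: mean((p - y) * x) for w, mean(p - y) for b.
        errs = [p - t for p, t in zip(preds, y)]
        w = [wi - lr * sum(e * x[j] for e, x in zip(errs, X)) / n
             for j, wi in enumerate(w)]
        b -= lr * sum(errs) / n
    return w, b, loss
```

Here each sample might be, for instance, a normalised phone-to-hand distance, labelled 1 when the annotated relationship is "using a phone".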
Optionally, performing mobile phone use detection according to the node position information and the relation reasoning model includes: inputting the node position information into the relation reasoning model to obtain the target position relationship output by the relation reasoning model, wherein the target position relationship is either "using a mobile phone" or "not using a mobile phone"; judging whether the target position relationship satisfies the preset condition, and if so, determining that the detection result is that a mobile phone is being used, and generating alarm prompt information according to the position relationship; otherwise, determining that the detection result is that no mobile phone is being used.
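Before the GNN relation reasoning model mentioned in the application scenario can consume node position information, that information has to be turned into a graph. One possible encoding is sketched below under assumed feature choices (class one-hot plus box centre and size normalised to the body box, fully connected edges), followed by a single mean-aggregation message-passing step with no learned weights. None of these choices come from the patent.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
CLASSES = ["head", "hand", "phone"]


def node_features(cls: str, box: Box, body: Box) -> List[float]:
    """One-hot class followed by box centre and size normalised to the body box."""
    bx1, by1, bx2, by2 = body
    bw, bh = bx2 - bx1, by2 - by1
    x1, y1, x2, y2 = box
    onehot = [1.0 if c == cls else 0.0 for c in CLASSES]
    return onehot + [((x1 + x2) / 2 - bx1) / bw, ((y1 + y2) / 2 - by1) / bh,
                     (x2 - x1) / bw, (y2 - y1) / bh]


def build_graph(info: Dict):
    """Turn node position information into (features, labels, directed edges).
    Every pair of distinct nodes is connected in both directions."""
    feats, labels = [], []
    for cls in CLASSES:
        for box in info["nodes"].get(cls, []):
            feats.append(node_features(cls, box, info["body"]))
            labels.append(cls)
    edges = [(i, j) for i in range(len(feats)) for j in range(len(feats)) if i != j]
    return feats, labels, edges


def mean_aggregate(feats: List[List[float]], edges: List[Tuple[int, int]]):
    """One message-passing step: concatenate each node's features with the mean
    of the features of the nodes that send an edge to it."""
    d = len(feats[0])
    out = []
    for i, f in enumerate(feats):
        nbrs = [feats[s] for s, t in edges if t == i]
        mean = ([sum(v[k] for v in nbrs) / len(nbrs) for k in range(d)]
                if nbrs else [0.0] * d)
        out.append(f + mean)
    return out
```

A trained GNN would apply learned transformations at each step and end with a classifier over the phone node or the whole graph to produce the "using" or "not using" relationship.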
According to the technical solution provided by this embodiment of the invention, two detection models with different detection precision are constructed from the human body image dataset and combined with multi-target tracking, which improves the recognition accuracy and real-time performance for mobile phone use behavior in complex scenes. The detection models determine the node position information of the image to be detected, and the relation reasoning model then determines whether a mobile phone is being used. This is valuable wherever the phone use behavior of personnel in a specific scene needs to be monitored and managed.
Example III
Fig. 4 is a schematic structural diagram of a mobile phone use detection device according to the third embodiment of the present invention. As shown in Fig. 4, the apparatus includes: a detection model construction module 310, configured to acquire a human body image dataset and construct a detection model based on the human body image dataset, where the detection model includes a first detection model and a second detection model; a node position information determining module 320, configured to obtain image data to be detected and determine node position information according to the image data to be detected and the detection model; and a mobile phone detection result determining module 330, configured to construct a relation reasoning model and perform mobile phone use detection according to the node position information and the relation reasoning model.
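The module structure can be sketched as a simple composition. Each module is injected as a callable, so the sketch says nothing about their internals; the class and parameter names are hypothetical, and module 330 is reduced to its inference part (an already-constructed relation reasoning model).

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class PhoneUseDetectionDevice:
    """Hypothetical composition of modules 310, 320 and 330."""
    build_detection_model: Callable[[Any], Any]                # module 310
    determine_node_positions: Callable[[Any, Any], List[Any]]  # module 320
    determine_result: Callable[[Any], str]                     # module 330 (inference only)

    def run(self, dataset: Any, frames: List[Any]) -> List[List[str]]:
        """Build the detection model once, then return per-person results per frame."""
        model = self.build_detection_model(dataset)
        results = []
        for frame in frames:
            nodes = self.determine_node_positions(frame, model)
            results.append([self.determine_result(n) for n in nodes])
        return results
```

Injecting the modules keeps each one independently replaceable, which mirrors the modular split described for the device.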
Optionally, the detection model construction module 310 specifically includes: a human body image dataset acquisition unit configured to: acquiring historical monitoring data, and acquiring user annotation data based on the historical monitoring data, wherein the user annotation data comprises a human body target position, a human head target position, a hand target position and a mobile phone target position; performing data enhancement on the user annotation data to generate enhanced user annotation data; taking the enhanced human body target position as a first data set, and taking the enhanced human head target position, the enhanced hand target position and the enhanced mobile phone target position as a second data set; the first data set and the second data set are used as human body image data sets.
Optionally, the detection model construction module 310 specifically includes: the detection model construction unit is used for: constructing an initial detection model; training the initial detection model based on the first data set, and generating a first detection model when a preset training standard is met; training the initial detection model based on a second data set, and generating a second detection model when a preset training standard is met; the first detection model and the second detection model are combined to generate a detection model.
Optionally, the node location information determining module 320 specifically includes: a human body recognition area determining unit for: inputting image data to be detected into a first detection model to obtain each human body identification area output by the first detection model; a target human body region determining unit for: sequentially taking each human body identification area as a target human body area; a target user identification determining unit, configured to: performing target tracking on the target human body area through a preset multi-target tracking algorithm to determine a target user identifier corresponding to the target human body area; a node position information determination unit configured to: and determining node position information corresponding to the target user identifier according to the target human body area and the second detection model.
Optionally, the node location information determining unit is specifically configured to: inputting the target human body area into a second detection model to obtain a human head position coordinate, a hand position coordinate and a mobile phone position coordinate which are output by the second detection model; and taking the head position coordinates, the hand position coordinates and the mobile phone position coordinates as node position information corresponding to the target user identification.
Optionally, the mobile phone detection result determining module 330 specifically includes a relation reasoning model building unit, configured to: acquire sample position information; set up an initial network structure of the relation reasoning model, and determine initial model parameters corresponding to the initial network structure; input the sample position information into the initial network structure to obtain an output sample position relationship; determine the real position relationship of the sample position information, and determine a loss function according to the real position relationship and the sample position relationship; judge whether the loss function converges, and if so, take the network structure corresponding to the initial model parameters as the relation reasoning model; otherwise, adjust the initial model parameters based on the loss function to obtain adjusted model parameters, and take the network structure corresponding to the adjusted model parameters as the relation reasoning model.
Optionally, the mobile phone detection result determining module 330 specifically includes a detection result determining unit, configured to: input the node position information into the relation reasoning model to obtain the target position relationship output by the relation reasoning model, wherein the target position relationship is either "using a mobile phone" or "not using a mobile phone"; judge whether the target position relationship satisfies the preset condition, and if so, determine that the detection result is that a mobile phone is being used, and generate alarm prompt information according to the position relationship; otherwise, determine that the detection result is that no mobile phone is being used.
According to the technical solution provided by this embodiment of the invention, two detection models with different detection precision are constructed from the human body image dataset and combined with multi-target tracking, which improves the recognition accuracy and real-time performance for mobile phone use behavior in complex scenes. The detection models determine the node position information of the image to be detected, and the relation reasoning model then determines whether a mobile phone is being used. This is valuable wherever the phone use behavior of personnel in a specific scene needs to be monitored and managed.
The mobile phone use detection device provided by this embodiment can execute the mobile phone use detection method provided by any embodiment of the invention, and has the functional modules and beneficial effects corresponding to executing that method.
Example IV
Fig. 5 shows a schematic diagram of the structure of an electronic device 10 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital processors, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 5, the electronic device 10 includes at least one processor 11, and a memory, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, etc., communicatively connected to the at least one processor 11, in which the memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from the storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data required for the operation of the electronic device 10 may also be stored. The processor 11, the ROM 12 and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
Various components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, etc.; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 11 may be any of a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the processor 11 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various specialized artificial intelligence (AI) computing chips, various processors running machine learning model algorithms, digital signal processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 11 performs the various methods and processes described above, such as the mobile phone use detection method.
In some embodiments, the mobile phone use detection method may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as the storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into the RAM 13 and executed by the processor 11, one or more steps of the mobile phone use detection method described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the mobile phone use detection method in any other suitable way (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), blockchain networks, and the Internet.
The computing system may include clients and servers. A client and a server are typically remote from each other and typically interact through a communication network. The client-server relationship arises from computer programs running on the respective computers and having a client-server relationship with each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system that overcomes the difficult management and weak service scalability of traditional physical hosts and virtual private server (VPS) services.
It should be understood that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, as long as the desired results of the technical solution of the present invention can be achieved; no limitation is imposed herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.
Claims (10)
1. A mobile phone use detection method, comprising:
Acquiring a human body image dataset, and constructing a detection model based on the human body image dataset, wherein the detection model comprises a first detection model and a second detection model;
acquiring image data to be detected, and determining node position information according to the image data to be detected and the detection model;
and constructing a relation reasoning model, and performing mobile phone use detection according to the node position information and the relation reasoning model.
2. The method of claim 1, wherein the acquiring a human image dataset comprises:
Acquiring historical monitoring data, and acquiring user annotation data based on the historical monitoring data, wherein the user annotation data comprises a human body target position, a human head target position, a hand target position and a mobile phone target position;
Performing data enhancement on the user annotation data to generate enhanced user annotation data;
taking the enhanced human body target position as a first data set, and taking the enhanced human head target position, the enhanced hand target position and the enhanced mobile phone target position as a second data set;
The first data set and the second data set are used as the human body image data set.
3. The method of claim 2, wherein the constructing a detection model based on the human image dataset comprises:
Constructing an initial detection model, wherein the initial detection model comprises a self-adaptive attention module;
training the initial detection model based on the first data set, and generating a first detection model when a preset training standard is met;
training the initial detection model based on the second data set, and generating a second detection model when a preset training standard is met;
combining the first detection model and the second detection model to generate the detection model.
4. The method of claim 1, wherein said determining node location information from said image data to be detected and said detection model comprises:
inputting the image data to be detected into the first detection model to obtain each human body identification area output by the first detection model;
sequentially taking each human body identification area as a target human body area;
Performing target tracking on the target human body area through a preset multi-target tracking algorithm to determine a target user identifier corresponding to the target human body area;
And determining node position information corresponding to the target user identifier according to the target human body area and the second detection model.
5. The method of claim 4, wherein the determining node location information corresponding to the target user identification based on the target human body region and the second detection model comprises:
Inputting the target human body area into the second detection model to obtain a human head position coordinate, a hand position coordinate and a mobile phone position coordinate which are output by the second detection model;
And taking the head position coordinate, the hand position coordinate and the mobile phone position coordinate as node position information corresponding to the target user identifier.
6. The method of claim 1, wherein said constructing a relational inference model comprises:
Acquiring sample position information;
setting up an initial network structure of a relation reasoning model, and determining initial model parameters corresponding to the initial network structure;
Inputting the sample position information into the initial network structure to obtain an output sample position relation;
determining a real position relation of the sample position information, and determining a loss function according to the real position relation and the sample position relation;
Judging whether the loss function has converged, and if so, taking the network structure corresponding to the initial model parameters as the relation reasoning model;
Otherwise, adjusting the initial model parameters based on the loss function to obtain adjusted model parameters, and taking the network structure corresponding to the adjusted model parameters as the relation reasoning model.
7. The method of claim 6, wherein said performing mobile phone use detection according to said node location information and said relation reasoning model comprises:
Inputting the node position information into the relation inference model to acquire a target position relation output by the relation inference model, wherein the target position relation comprises a used mobile phone and an unused mobile phone;
Judging whether the target position relation meets a preset condition, if so, determining that the detection result is that a mobile phone is used, and generating alarm prompt information according to the position relation;
otherwise, determining that the detection result is that the mobile phone is not used.
8. A mobile phone use detection device, comprising:
The detection model construction module is used for acquiring a human body image data set and constructing a detection model based on the human body image data set, wherein the detection model comprises a first detection model and a second detection model;
the node position information determining module is used for acquiring image data to be detected and determining node position information according to the image data to be detected and the detection model;
and the mobile phone detection result determining module is used for constructing a relation reasoning model and detecting the mobile phone according to the node position information and the relation reasoning model.
9. An electronic device, the electronic device comprising:
At least one processor; and
A memory communicatively coupled to the at least one processor; wherein,
The memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.
10. A computer storage medium storing computer instructions for causing a processor to perform the method of any one of claims 1-7 when executed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410636481.9A CN118470746A (en) | 2024-05-22 | 2024-05-22 | Method, device, equipment and storage medium for detecting by using mobile phone |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410636481.9A CN118470746A (en) | 2024-05-22 | 2024-05-22 | Method, device, equipment and storage medium for detecting by using mobile phone |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118470746A | 2024-08-09 |
Family
ID=92163227
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410636481.9A Pending CN118470746A (en) | 2024-05-22 | 2024-05-22 | Method, device, equipment and storage medium for detecting by using mobile phone |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118470746A (en) |
- 2024-05-22 CN CN202410636481.9A patent/CN118470746A/en active Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112857268B (en) | Object area measuring method, device, electronic equipment and storage medium | |
CN113642431B (en) | Training method and device of target detection model, electronic equipment and storage medium | |
CN114402356A (en) | Network model training method, image processing method and device and electronic equipment | |
CN109345553A (en) | A kind of palm and its critical point detection method, apparatus and terminal device | |
CN113705628B (en) | Determination method and device of pre-training model, electronic equipment and storage medium | |
CN116071608B (en) | Target detection method, device, equipment and storage medium | |
US20230252295A1 (en) | Method of generating multimodal set of samples for intelligent inspection, and training method | |
CN114445667A (en) | Image detection method and method for training image detection model | |
CN115454466A (en) | Method, apparatus, device and medium for automatic updating of machine learning model | |
CN118210670A (en) | Log abnormality detection method and device, electronic equipment and storage medium | |
CN116824609B (en) | Document format detection method and device and electronic equipment | |
CN113111139A (en) | Alarm detection method and device based on Internet of things sensor | |
CN114445711B (en) | Image detection method, image detection device, electronic equipment and storage medium | |
CN118470746A (en) | Method, device, equipment and storage medium for detecting by using mobile phone | |
CN115600607A (en) | Log detection method and device, electronic equipment and medium | |
CN112905743B (en) | Text object detection method, device, electronic equipment and storage medium | |
CN116259083A (en) | Image quality recognition model determining method and related device | |
CN113807391A (en) | Task model training method and device, electronic equipment and storage medium | |
CN114265757A (en) | Equipment anomaly detection method and device, storage medium and equipment | |
CN118432952B (en) | Abnormality detection method under zero trust environment, electronic equipment and storage medium | |
CN114140851B (en) | Image detection method and method for training image detection model | |
CN115357461B (en) | Abnormality detection method, abnormality detection device, electronic device, and computer-readable storage medium | |
CN117112445B (en) | Machine learning model stability detection method, device, equipment and medium | |
CN116910682B (en) | Event detection method and device, electronic equipment and storage medium | |
CN117854152A (en) | Climbing behavior identification method, device, equipment and storage medium |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||