CN116704244A - Course domain schematic diagram object detection method, system, equipment and storage medium - Google Patents


Info

Publication number
CN116704244A
CN116704244A (application number CN202310593041.5A)
Authority
CN
China
Prior art keywords
feature information
key value
memory
object detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310593041.5A
Other languages
Chinese (zh)
Inventor
张玲玲
任铭
张新宇
武亚强
王璐妍
刘均
魏笔凡
郑庆华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University
Priority to CN202310593041.5A
Publication of CN116704244A


Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 — Arrangements for image or video recognition or understanding
    • G06V10/40 — Extraction of image or video features
    • G06V10/42 — Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • G06V10/44 — Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
    • G06V10/70 — Recognition or understanding using pattern recognition or machine learning
    • G06V10/74 — Image or video pattern matching; proximity measures in feature spaces
    • G06V10/761 — Proximity, similarity or dissimilarity measures
    • G06V10/764 — Classification, e.g. of video objects
    • G06V10/77 — Processing image or video features in feature spaces; data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; blind source separation
    • G06V10/80 — Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 — Fusion of extracted features
    • G06V10/82 — Recognition or understanding using neural networks
    • G06V2201/00 — Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 — Target detection
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/04 — Architecture, e.g. interconnection topology
    • G06N3/045 — Combinations of networks
    • G06N3/0464 — Convolutional networks [CNN, ConvNet]
    • G06N3/08 — Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a course-domain schematic diagram object detection method, system, device and storage medium, belonging to the technical field of image object detection. The detection method comprises: extracting global feature information of the image and local feature information of the objects to be detected, and fusing them to obtain fused feature information; obtaining the prediction probability corresponding to the fused feature information, and forming key-value pairs from the local feature information of the objects to be detected and the prediction probabilities; writing the key-value pairs into a memory network, updating them, training an object detection network model through a transfer loss function, inputting a course-domain schematic diagram into the trained model, and outputting the objects it contains. The method addresses the large variation in visual feature information and the high scarcity of samples of course-domain schematic diagrams, enriches the feature information, strengthens the visual representation of local feature information, and improves detection performance on course-domain schematic diagram objects.

Description

Course domain schematic diagram object detection method, system, equipment and storage medium
Technical Field
The application belongs to the technical field of image object detection, and particularly relates to a method, system, device and storage medium for detecting objects in course-domain schematic diagrams.
Background
With the development of the information age, knowledge resources on the network have grown explosively, and learning courseware, technical documents and teaching videos, which contain human-created course-domain schematic diagrams, are increasingly abundant. The object detection task on course-domain schematic diagrams determines the position information of the various targets in a diagram; it is an important basis for cross-media knowledge-intensive tasks such as knowledge fusion and intelligent question answering, and a necessary path toward the further development of intelligent education.
However, current research on schematic diagrams concentrates mainly on building drawings, line drawings, hand-drawn pictures and similar fields, and research on course-domain schematic diagrams is lacking. Moreover, current mainstream object detection models struggle to adapt to the course-domain schematic diagram detection task because of the large variation in visual feature information and the high scarcity of samples of such diagrams.
Disclosure of Invention
The application aims to solve the above problems in the prior art by providing a method, system, device and storage medium for detecting objects in course-domain schematic diagrams that reliably detect the objects a diagram contains, overcoming the large variation in visual feature information and the high scarcity of samples of course-domain schematic diagrams.
To achieve the above purpose, the present application adopts the following technical solution:
a course domain schematic diagram object detection method comprises the following steps:
extracting global feature information of the image and local feature information of an object to be detected, and fusing to obtain fused feature information;
obtaining the prediction probability corresponding to the fused feature information, and forming key-value pairs from the local feature information of the object to be detected and the prediction probability;
writing the key-value pairs into a memory network, updating them, training an object detection network model through a transfer loss function, inputting a course-domain schematic diagram into the trained object detection network model, and outputting the objects it contains.
As a preferred solution, the object detection network model adopts the Faster R-CNN network model.
As a preferred solution, the step of extracting global feature information of the image and local feature information of the object to be detected and fusing to obtain fused feature information includes:
obtaining image global feature information s by using a pre-trained Faster R-CNN network model;
detecting the objects present in the course-domain image by using the pre-trained Faster R-CNN network model to obtain the local feature information {o_1, o_2, …, o_n} of n objects;
fusing the image global feature information s with the local feature information {o_1, o_2, …, o_n} of the n objects to obtain fused feature information {v_1, v_2, …, v_n}.
As a preferred solution, in the step of obtaining the prediction probability corresponding to the fused feature information, the obtained fused feature information {v_1, v_2, …, v_n} is input into the pre-trained Faster R-CNN network model to obtain the corresponding prediction probabilities {p_1, p_2, …, p_n}.
As a preferred solution, in the step of writing the key-value pairs into the memory network, a Memory Bank module based on feature information and prediction probability is provided in the memory network, with the key-value expression (K_L, V_L), wherein K and V store feature information and prediction probabilities respectively, and L represents the number of object categories;
in the step of forming key-value pairs from the local feature information of the object to be detected and the prediction probability, corresponding elements of the object local feature information {o_1, o_2, …, o_n} and the prediction probabilities {p_1, p_2, …, p_n} form key-value pairs {(k_1, l_1), (k_2, l_2), …, (k_n, l_n)}, and the generated key-value pairs are passed to the Memory Bank module as input to the memory network.
As a preferred solution, in the step of writing the key-value pairs into the memory network, each time new feature information and a prediction probability are acquired, it is judged whether the Memory Bank module of the current class is full; if not, the memory writing stage is executed, and if so, the memory updating stage is executed. The memory writing stage directly appends the newly acquired feature-probability key-value pair (k_i, l_i) to the end of the Memory Bank module.
As a preferred solution, in the step of updating the key-value pairs and training the object detection network model through the transfer loss function, the feature information with the highest similarity in the current Memory Bank module is selected for fusion, the similarity calculation method and the feature fusion method being given by expressions (3) and (4) of the detailed description;
here, o represents the local feature information of the object, k represents the feature information stored in the current Memory Bank module, δ ensures that the denominator is not 0, k_new represents the fused new feature information, and k_maxsimilarity represents the feature information in the current Memory Bank module with the highest similarity to the newly acquired feature information;
each time new feature information is learned, the corresponding Memory Bank module also generates a probability prediction for the current feature information, as given by expression (6);
in the ideal case this prediction equals 1, and the loss function of the memory network, expression (7), measures the difference from that ideal.
a curriculum domain schematic object detection system, comprising:
the feature fusion module is used for extracting image global feature information and local feature information of an object to be detected and fusing the image global feature information and the local feature information to obtain fused feature information;
the key-value pair acquisition module is used for obtaining the prediction probability corresponding to the fused feature information, and forming key-value pairs from the local feature information of the object to be detected and the prediction probability;
the model training module is used for writing the key value pairs into the memory network, updating the key value pairs and training the object detection network model through the transfer loss function;
and the object output module is used for inputting the course domain diagram to the trained object detection network model and outputting the objects contained in the course domain diagram.
An electronic device, comprising:
a memory storing at least one instruction; and
a processor that executes the instructions stored in the memory to implement the course-domain schematic diagram object detection method.
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the course-domain schematic diagram object detection method.
Compared with the prior art, the application has at least the following beneficial effects:
aiming at the problems of large difference of visual characteristic information and scarcity of sample height of the schematic diagram in the course field, the method respectively solves two main differences of the schematic diagram in the course field and a natural scene image from the characteristic information level and the sample number level, and enriches the characteristic information and enhances the visual information representation of the local characteristic information by fusing the global characteristic information of the schematic diagram image with the local characteristic information of an object to be detected. The learned characteristic information and the prediction probability form key value pairs to be memorized, the memory network is used for continuously updating in the subsequent learning process, and the training of the object detection network model is guided by a transmission loss function mode, so that the characteristic information of the schematic diagram is stored, and the detection performance of the schematic diagram object in the course field is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present application, and that other related drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of the course-domain schematic diagram object detection method of an embodiment of the present application;
FIG. 2 is a model structure diagram of the course-domain schematic diagram object detection method according to an embodiment of the present application;
FIG. 3 is a block diagram of a Memory Bank according to an embodiment of the present application;
FIG. 4 is a probability computation graph of a Memory Bank module according to an embodiment of the present application;
FIG. 5 is a schematic diagram of experimental results of a model in the course domain after adding an information fusion module according to the embodiment of the application;
FIG. 6 is a schematic diagram of the experimental results of the model in the course domain after adding the memory network module according to the embodiment of the application;
FIG. 7 is a schematic diagram of the experimental results of the model in the course domain after adding the information fusion and memory network according to the embodiment of the application;
FIG. 8 is a graph illustrating the results of a target detection task on a course domain schematic dataset for different models in accordance with an embodiment of the present application;
FIG. 9 is a schematic diagram of the experimental results of the target detection task on the course domain data set for different memory slot parameters in the memory network according to the embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. Based on the embodiments of the present application, one of ordinary skill in the art may also obtain other embodiments without undue burden.
The method for detecting the schematic diagram object in the course field mainly comprises three stages: 1) an image feature extraction and fusion stage, 2) a feature information classification stage, and 3) a writing, updating and transferring stage of a memory network.
Fig. 1 shows the flow of the course-domain schematic diagram object detection method according to an embodiment of the present application, and fig. 2 shows its model structure; the method specifically includes the following steps:
1) In the image feature extraction and fusion stage: first, the feature extraction module of a pre-trained Faster R-CNN object detection network is used to obtain the image global feature information s; second, the pre-trained Faster R-CNN detects the objects present in the course-domain image, obtaining the positions of n objects in the image and their local feature information {o_1, o_2, …, o_n}; finally, the feature information from the previous two steps is fused into new feature information {v_1, v_2, …, v_n} as shown in expression (1), and {v_1, v_2, …, v_n} is sent to the feature information classification stage.
v_i = Concat(o_i, s)   (1)
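Expression (1) is plain feature concatenation. As an illustrative sketch (plain Python lists stand in for CNN feature tensors; all names here are ours, not the patent's):

```python
def fuse_features(local_feats, global_feat):
    """Return v_i = Concat(o_i, s) for every detected object (expression 1)."""
    return [o + global_feat for o in local_feats]

local = [[0.1, 0.2], [0.3, 0.4]]   # toy local features o_1, o_2
s = [0.9, 0.8]                     # toy global feature of the whole image
fused = fuse_features(local, s)    # each v_i has dim(o_i) + dim(s)
```

Every fused vector thus carries both object-level and image-level context before classification.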
2) In the feature information classification stage, the feature information {v_1, v_2, …, v_n} obtained in the previous stage is first sent to the classifier module of the pre-trained Faster R-CNN model to obtain the corresponding prediction probabilities {p_1, p_2, …, p_n}; the local feature information and the corresponding prediction probabilities are then stored as key-value pairs. To prevent feature information of different categories from interfering with each other when key-value pairs are generated, this embodiment designs a Memory Bank mechanism based on feature information and prediction probability, denoted (K_L, V_L), as shown in fig. 3, where K and V store feature information and prediction probabilities respectively and L represents the number of object categories. Corresponding elements of the object local feature information {o_1, o_2, …, o_n} and the prediction probabilities {p_1, p_2, …, p_n} form key-value pairs {(k_1, l_1), (k_2, l_2), …, (k_n, l_n)}, and the generated key-value pairs are passed to the Memory Bank module as input to the memory network.
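The pairing of local features with their prediction probabilities can be sketched as below (a toy illustration; in the real pipeline the keys and values would be feature tensors and classifier outputs):

```python
def make_key_value_pairs(local_feats, pred_probs):
    """Zip {o_1, ..., o_n} with {p_1, ..., p_n} into the key-value pairs
    {(k_1, l_1), ..., (k_n, l_n)} that are fed to the Memory Bank."""
    return list(zip(local_feats, pred_probs))

pairs = make_key_value_pairs([[0.1], [0.2]], [0.9, 0.8])
```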
3) In the writing, updating and transfer stages of the memory network, the write mode is first selected according to whether the corresponding Memory Bank module is full. Each time new feature information and a prediction probability are acquired, it must be judged whether the Memory Bank module of the current category is full; if not, the memory writing stage is executed, and if so, the memory updating stage is executed. The memory writing stage directly appends the newly acquired feature-probability key-value pair (k_i, l_i) to the end of the Memory Bank module, as shown in expression (2).
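The full-or-not dispatch and the memory writing stage can be sketched as follows; the per-class bank with a fixed slot count follows the text above, while the concrete data layout is our assumption for illustration:

```python
class MemoryBank:
    """Per-class Memory Bank sketch: a fixed number of slots holding
    (feature, probability) pairs. The internal layout is illustrative;
    the patent describes the structure via its figures."""

    def __init__(self, num_slots=5):
        self.num_slots = num_slots
        self.keys = []    # K: stored feature vectors
        self.values = []  # V: stored prediction probabilities

    def is_full(self):
        """Decides whether the writing or the updating stage runs next."""
        return len(self.keys) >= self.num_slots

    def write(self, key, value):
        """Memory writing stage (expression 2): append (k_i, l_i) at the end."""
        self.keys.append(key)
        self.values.append(value)
```

A new pair is written only while `is_full()` is false; once the bank fills up, control passes to the updating stage described next.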
In the memory updating stage, the feature information with the highest similarity in the current Memory Bank module is selected for fusion; the similarity calculation method and the feature information fusion method are shown in expressions (3) and (4) respectively.
Here, o represents the object local feature information, k represents feature information stored in the current Memory Bank, δ ensures that the denominator is not 0 and is generally taken as about 1e-5, k_new represents the fused new feature information, and k_maxsimilarity, given by expression (5), represents the feature information in the current Memory Bank with the highest similarity to the newly acquired feature information.
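Since expressions (3)-(5) appear only as figures in the original, the following sketch substitutes cosine similarity with the δ term for expression (3) and simple element-wise averaging for expression (4); both concrete formulas are assumptions, not the patent's:

```python
import math

DELTA = 1e-5  # keeps the denominator non-zero ("about 1e-5" per the text)

def cosine_similarity(o, k, delta=DELTA):
    """Assumed similarity between a new local feature o and a stored key k."""
    dot = sum(a * b for a, b in zip(o, k))
    norms = math.sqrt(sum(a * a for a in o)) * math.sqrt(sum(b * b for b in k))
    return dot / (norms + delta)

def update_bank(bank_keys, o):
    """Memory updating stage sketch: fuse o with the most similar stored key
    (k_maxsimilarity); element-wise averaging stands in for expression (4)."""
    idx = max(range(len(bank_keys)),
              key=lambda i: cosine_similarity(o, bank_keys[i]))
    k_max = bank_keys[idx]                                    # k_maxsimilarity
    bank_keys[idx] = [(a + b) / 2 for a, b in zip(k_max, o)]  # k_new
    return idx
```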
For the memory network loss, each time new feature information is learned, the corresponding Memory Bank also generates a probability prediction for the current feature information; the calculation process is shown in fig. 4 and the calculation method in expression (6).
Under ideal conditions this predicted probability should be 1, indicating that the current Memory Bank has fully learned the feature information. The memory network loss is therefore designed as shown in expression (7) to measure the difference between the current Memory Bank's feature learning and the ideal case; the loss is calculated and propagated back to the first two stages for training.
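A minimal sketch of this loss, assuming a similarity-weighted bank prediction for expression (6) and a squared gap to the ideal value 1 for expression (7) (both forms are our assumptions, since the original expressions appear only as figures):

```python
def bank_prediction(sims, probs):
    """Assumed stand-in for expression (6): the bank's probability for a new
    feature as the similarity-weighted average of stored probabilities."""
    total = sum(sims)
    return sum(s * p for s, p in zip(sims, probs)) / total if total else 0.0

def memory_loss(p_hat):
    """Assumed stand-in for expression (7): penalize the gap between the
    bank's prediction and the ideal value of 1."""
    return (1.0 - p_hat) ** 2
```

A well-learned bank predicts close to 1 and contributes near-zero loss; a poorly matching bank yields a large loss that is propagated back to the earlier stages.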
Finally, the model is trained by continuously updating the parameters.
In this embodiment, the model parameters are trained on each training sample independently, i.e. with a sample batch size of 1; the parameters are updated with the SGD (stochastic gradient descent) algorithm at an initial learning rate of 1e-3, and after every 10 epochs the learning rate is updated to 1/10 of its previous value. The optimal model data are saved to obtain the trained model.
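The stated schedule (initial rate 1e-3, divided by 10 every 10 epochs) can be written directly:

```python
def lr_at_epoch(epoch, base_lr=1e-3, step=10):
    """Learning rate at a given epoch under the embodiment's step schedule:
    base_lr * (1/10) ** (epoch // step)."""
    return base_lr * (0.1 ** (epoch // step))
```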
A course-domain schematic diagram is input, and after the three stages of image feature extraction and fusion, classification, and model training, the detection result is output, realizing detection of the objects the diagram contains.
To explore the effectiveness of the application, pictures from the test data are selected for testing, examining the ability of the two solutions provided by the application to handle the course-domain schematic diagram object detection task. The results for three groups of pictures, obtained after adding only the information fusion module, only the memory network module, and both the information fusion and memory network modules, are shown in FIGS. 5, 6 and 7 respectively.
It can be observed that the information fusion module improves the model's ability to learn the features of the sample data, and the memory network module improves its ability to learn low-frequency sample data. As shown in FIGS. 5 and 7, adding only the memory network module allows the model to learn low-frequency samples, but it still struggles to distinguish some objects with similar features; further adding the information fusion module on that basis allows similar objects, such as crab and frog, to be distinguished. The two modules proposed in this embodiment are therefore effective for the course-domain schematic diagram object detection task.
The experimental performance of this embodiment under two backbones, VGG and ResNet, is shown in FIG. 8.
1) By combining global feature information with the Faster R-CNN model, an object detection method with multi-granularity visual information fusion (Information Fusion) is designed; model performance improves from 2%-3% mAP to about 15% mAP, greatly enhancing detection performance.
2) In all cases, adding the Memory Bank module improves model performance, with the optimal performance basically reaching about 19% mAP.
3) After combining the information fusion method and the memory enhancement method, experimental verification on the schematic diagram dataset yields the results shown in the figure; the best-performing model reaches about 22% mAP.
The ablation experiments show that the information fusion and memory enhancement methods proposed by the application perform well, and both enhance the model's learning ability in the low-frequency-object detection task on schematic diagrams.
The experimental performance of the application for different memory-slot parameters in the memory network is shown in FIG. 9. The number of memory slots in the memory network is a relatively important hyper-parameter representing the current memory network's capacity for learned results. With 3, 4, 5 and 6 memory slots, the experimental results are 19.5%, 20%, 22% and 21% respectively. A larger number of memory slots means a larger memory capacity; a smaller number means a smaller capacity.
Another embodiment of the present application further provides a course-domain schematic diagram object detection system, comprising:
the feature fusion module is used for extracting image global feature information and local feature information of an object to be detected and fusing the image global feature information and the local feature information to obtain fused feature information;
the key-value pair acquisition module is used for obtaining the prediction probability corresponding to the fused feature information, and forming key-value pairs from the local feature information of the object to be detected and the prediction probability;
the model training module is used for writing the key value pairs into the memory network, updating the key value pairs and training the object detection network model through the transfer loss function;
and the object output module is used for inputting the course domain diagram to the trained object detection network model and outputting the objects contained in the course domain diagram.
Another embodiment of the present application also proposes an electronic device, comprising: a memory storing at least one instruction; and a processor that executes the instructions stored in the memory to implement the course-domain schematic diagram object detection method.
Another embodiment of the present application further proposes a computer-readable storage medium storing a computer program which, when executed by a processor, implements the course-domain schematic diagram object detection method.
The instructions stored in the memory may be divided into one or more modules/units, which are stored in a computer-readable storage medium and executed by the processor to perform the curriculum domain schematic object detection methods of the present application. The one or more modules/units may be a series of computer readable instruction segments capable of performing a specified function, which describes the execution of the computer program in a server.
The electronic equipment can be a smart phone, a notebook computer, a palm computer, a cloud server and other computing equipment. The electronic device may include, but is not limited to, a processor, a memory. Those skilled in the art will appreciate that the electronic device may also include more or fewer components, or may combine certain components, or different components, e.g., the electronic device may also include input and output devices, network access devices, buses, etc.
The processor may be a central processing unit (Central Processing Unit, CPU), but may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), off-the-shelf programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory may be an internal storage unit of the server, such as a hard disk or a memory of the server. The memory may also be an external storage device of the server, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash Card (Flash Card) or the like, which are provided on the server. Further, the memory may also include both an internal storage unit and an external storage device of the server. The memory is used to store the computer readable instructions and other programs and data required by the server. The memory may also be used to temporarily store data that has been output or is to be output.
It should be noted that, because the information interaction and execution processes between the above modules and units are based on the same concept as the method embodiments, their specific functions and technical effects may be found in the method embodiment section and are not repeated here.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above division of the functional units and modules is illustrated. In practical application, the above functions may be allocated to different functional units and modules as needed; that is, the internal structure of the apparatus may be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules are only for distinguishing them from each other and are not used to limit the protection scope of the present application. For the specific working process of the units and modules in the above system, reference may be made to the corresponding process in the foregoing method embodiments, which is not repeated here.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such an understanding, the present application may implement all or part of the flow of the methods of the above embodiments by instructing related hardware through a computer program; the computer program may be stored in a computer-readable storage medium, and when executed by a processor, the computer program implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include at least: any entity or device capable of carrying the computer program code to the apparatus/terminal device, a recording medium, a computer memory, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, for example a USB flash drive, a removable hard disk, a magnetic disk, or an optical disk.
Each of the foregoing embodiments emphasizes different aspects; for parts not described or detailed in one embodiment, reference may be made to the related descriptions of the other embodiments.
The above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents; such modifications and substitutions do not cause the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included within the protection scope of the present application.

Claims (10)

1. A course domain schematic diagram object detection method, characterized by comprising the following steps:
extracting global feature information of the image and local feature information of an object to be detected, and fusing to obtain fused feature information;
obtaining a prediction probability corresponding to the fusion characteristic information, and forming a key value pair by the local characteristic information of the object to be detected and the prediction probability;
writing the key value pairs into a memory network, updating the key value pairs, training an object detection network model through a transfer loss function, inputting a course domain diagram to the trained object detection network model, and outputting objects contained in the course domain diagram.
2. The course domain schematic diagram object detection method according to claim 1, wherein the object detection network model uses the Fast R-CNN network model.
3. The course domain schematic diagram object detection method according to claim 2, wherein the step of extracting global feature information of the image and local feature information of the object to be detected and fusing them to obtain fused feature information comprises:
obtaining global feature information s of the image by using a pre-trained Fast R-CNN network model;
detecting the objects present in the course domain image by using the pre-trained Fast R-CNN network model to obtain local feature information {o_1, o_2, …, o_n} of the n objects;
fusing the global feature information s of the image with the local feature information {o_1, o_2, …, o_n} of the n objects to obtain fused feature information {v_1, v_2, …, v_n}.
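The extraction-and-fusion step of claim 3 can be sketched as follows. The claims do not fix the fusion operator, so plain concatenation is assumed here, and the feature dimensions are illustrative stand-ins for Fast R-CNN features.

```python
import numpy as np

def fuse_features(s, local_features):
    """Fuse the global image feature s with each object's local
    feature o_i to get the fused features {v_1, ..., v_n}.
    Concatenation is an assumed fusion operator."""
    return [np.concatenate([s, o]) for o in local_features]

# illustrative stand-ins for extracted features
s = np.ones(4)                              # global feature s
locals_ = [np.zeros(4), np.full(4, 2.0)]    # local features {o_1, o_2}
fused = fuse_features(s, locals_)           # fused features {v_1, v_2}
```

Each fused vector v_i then carries both image-level context and object-level detail, which is what the prediction step in claim 4 consumes.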
4. The course domain schematic diagram object detection method according to claim 3, wherein in the step of obtaining the prediction probability corresponding to the fused feature information, the obtained fused feature information {v_1, v_2, …, v_n} is input into the pre-trained Fast R-CNN network model to obtain the corresponding prediction probabilities {p_1, p_2, …, p_n}.
5. The course domain schematic diagram object detection method according to claim 4, wherein in the step of writing the key value pairs into the memory network, a Memory Bank module based on the feature information and the prediction probability is provided in the memory network, and the key value pairs are expressed as (K_L, V_L), wherein K and V are used to store the feature information and the prediction probability respectively, and L represents the number of object categories;
in the step of forming key value pairs from the local feature information of the object to be detected and the prediction probability, the corresponding feature information and prediction probability in the object local feature information {o_1, o_2, …, o_n} and the prediction probabilities {p_1, p_2, …, p_n} form the key value pairs {(k_1, l_1), (k_2, l_2), …, (k_n, l_n)}, and the generated key value pairs are passed to the Memory Bank module as the input to the memory network.
6. The course domain schematic diagram object detection method according to claim 5, wherein in the step of writing the key value pairs into the memory network, each time new feature information and a new prediction probability are acquired, it is determined whether the Memory Bank module of the current class is full; if not, a memory writing stage is executed, and if so, a memory updating stage is executed; the memory writing stage directly appends the newly acquired feature information and prediction probability key value pair (k_i, l_i) to the end of the Memory Bank module.
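The write-or-update branching of claim 6 can be sketched as a per-class key/value store. The capacity value and the mean-fusion update are illustrative assumptions, and `DELTA` plays the role of the δ in claim 7.

```python
import numpy as np

DELTA = 1e-8  # guards against a zero denominator

class MemoryBank:
    """Per-class Memory Bank sketch: keys hold feature vectors,
    values hold prediction probabilities. Writes append until the
    bank is full; afterwards a new feature is fused into the most
    similar stored key."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.keys, self.values = [], []

    def write(self, k, l):
        k = np.asarray(k, dtype=float)
        if len(self.keys) < self.capacity:   # memory writing stage
            self.keys.append(k)
            self.values.append(l)
        else:                                # memory updating stage
            sims = [k @ m / (np.linalg.norm(k) * np.linalg.norm(m) + DELTA)
                    for m in self.keys]
            j = int(np.argmax(sims))
            self.keys[j] = (self.keys[j] + k) / 2.0  # assumed mean fusion

bank = MemoryBank(capacity=2)
bank.write([1.0, 0.0], 0.9)
bank.write([0.0, 1.0], 0.8)    # bank is now full
bank.write([1.0, 0.1], 0.95)   # triggers the updating stage
```

Keeping a fixed capacity per class bounds the memory footprint while the update stage keeps the stored prototypes representative of newly seen features.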
7. The course domain schematic diagram object detection method according to claim 6, wherein in the step of updating the key value pairs and training the object detection network model through the transfer loss function, the feature information with the highest similarity in the current Memory Bank module is selected for fusion, and the similarity calculation and feature information fusion may be expressed as:

similarity(o, k) = (o · k) / (‖o‖ ‖k‖ + δ)

k_new = (k_maxsimilarity + o) / 2

where o represents the local feature information of the object, k represents feature information stored in the current Memory Bank module, δ ensures that the denominator is not 0, k_new represents the new feature information after fusion, and k_maxsimilarity represents the feature information in the current Memory Bank module with the highest similarity to the newly acquired feature information;
each time new feature information is learned, the corresponding Memory Bank module also generates a probability prediction for the current feature information; in the ideal case this memory prediction agrees with the prediction of the object detection network, and the loss function of the memory network is defined over the discrepancy between the two.
8. A course domain schematic diagram object detection system, characterized by comprising:
the feature fusion module is used for extracting image global feature information and local feature information of an object to be detected and fusing the image global feature information and the local feature information to obtain fused feature information;
the key value pair acquisition module is used for acquiring the prediction probability corresponding to the fusion characteristic information, and forming a key value pair by the local characteristic information of the object to be detected and the prediction probability;
the model training module is used for writing the key value pairs into the memory network, updating the key value pairs and training the object detection network model through the transfer loss function;
and the object output module is used for inputting the course domain diagram to the trained object detection network model and outputting the objects contained in the course domain diagram.
9. An electronic device, comprising:
a memory storing at least one instruction; and
A processor executing instructions stored in the memory to implement the curriculum domain schematic object detection method of any of claims 1-7.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the curriculum domain schematic object detection method of any one of claims 1 to 7.
CN202310593041.5A 2023-05-24 2023-05-24 Course domain schematic diagram object detection method, system, equipment and storage medium Pending CN116704244A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310593041.5A CN116704244A (en) 2023-05-24 2023-05-24 Course domain schematic diagram object detection method, system, equipment and storage medium


Publications (1)

Publication Number Publication Date
CN116704244A true CN116704244A (en) 2023-09-05

Family

ID=87840194

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310593041.5A Pending CN116704244A (en) 2023-05-24 2023-05-24 Course domain schematic diagram object detection method, system, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116704244A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination