CN111985505A - Interest visual relationship detection method and device based on interest propagation network - Google Patents

Interest visual relationship detection method and device based on interest propagation network

Info

Publication number
CN111985505A
Authority
CN
China
Prior art keywords
interest
visual
relation
predicate
pair
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010848981.0A
Other languages
Chinese (zh)
Other versions
CN111985505B (en)
Inventor
Ren Tongwei (任桐炜)
Wu Gangshan (武港山)
Wang Haonan (王浩楠)
Yu Fan (于凡)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University filed Critical Nanjing University
Priority to CN202010848981.0A priority Critical patent/CN111985505B/en
Publication of CN111985505A publication Critical patent/CN111985505A/en
Application granted granted Critical
Publication of CN111985505B publication Critical patent/CN111985505B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/40 - Extraction of image or video features
    • G06V 10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 - Handling natural language data
    • G06F 40/30 - Semantic analysis
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00 - Administration; Management
    • G06Q 10/04 - Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/70 - Determining position or orientation of objects or cameras
    • G06T 7/73 - Determining position or orientation of objects or cameras using feature-based methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Strategic Management (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Game Theory and Decision Science (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Development Economics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Image Analysis (AREA)

Abstract

Objects are extracted from an input image and combined pairwise into object pairs, and the corresponding object features and joint features are computed. Visual, semantic, and position features are generated for the objects and object pairs, and their interest features are obtained through linear transformation, from which the interestingness of each object pair is predicted. Likewise, the interest feature of a relation predicate is obtained by linearly transforming the visual, semantic, and position features of the object pair's relation predicate, and the interestingness of the relation predicates between objects is predicted. Finally, the object-pair interestingness and the relation-predicate interestingness are combined into a visual-relation interestingness; the visual relations with high interestingness are the finally detected visual relations of interest. By taking semantic importance as the criterion during visual relationship detection, the invention predicts relation interestingness more reasonably, finds the visual relations of interest that accurately convey the main content of an image, and has good generality and practicability.

Description

Interest visual relationship detection method and device based on interest propagation network
Technical Field
The invention belongs to the technical field of computer vision and relates to visual relationship detection in images, in particular to an interest visual relationship detection method based on an interest propagation network.
Background Art
As a bridge between vision and natural language, visual relationship detection aims to describe the objects in an image and the interactions between them in the form of relation triples <subject, relation predicate, object>. The subject and object are generally represented by an object's bounding box and category, while relation predicates are typically verbs (e.g., "raise", "ride", "watch"), orientation words (e.g., "beside", "in front of", "above"), or verb phrases (e.g., "stand beside", "sit on", "walk through"). Visual relationship detection helps a machine understand and analyze the content of an image or video and can be widely applied in scenarios such as image retrieval and video analysis.
Conventional visual relationship detection methods aim to detect all visual relationships in an image. In practice, owing to the combinatorial explosion of subjects, relation predicates, and objects, conventional methods typically detect a very large number of visual relationships, as shown in Fig. 2. Although this describes the image content more fully, the excessive detail may mislead the machine's understanding of the image's main content, causing a loss of precision in scenarios such as image retrieval and hindering accurate analysis of the image or video by the machine.
Intuitively, not all detected visual relationships are semantically "interesting"; that is, not every visual relationship expresses the main content of the image. Often only a small fraction of the relationships matter for conveying the main content, and these are the visual relationships of interest. The goal of interest visual relationship detection is to detect the visual relationships that are truly important for conveying the main content of the image, i.e., the "interesting" visual relationships.
At present, no existing work attempts to detect visual relationships of interest; some related works merely measure the visual saliency of relationships through an attention module, determine saliency weights, and pick out the salient visual relationships. However, such methods account only for the visual saliency of a relationship, not its semantic importance, so the relationships they find are not necessarily "interesting".
Disclosure of Invention
The problem the invention aims to solve is: overly rich visual relationships in an image easily bias a machine's understanding, so the visual relationships of interest that accurately convey the main content of the image need to be detected in order to help a machine understand and analyze images or videos more accurately.
The technical scheme of the invention is as follows: an interest visual relationship detection method based on an interest propagation network, which builds an interest propagation network that takes an image as input and outputs the visual relationships of interest in the image. The interest propagation network comprises a panoptic object detection module, an object interest prediction module, and a relation predicate interest prediction module. First, the panoptic object detection module extracts objects from the input image and combines them pairwise into object pairs, computing the object features of the objects and the joint features of the object pairs. The object interest prediction module then generates the visual, semantic, and position features of the objects and object pairs and obtains their interest features, from which the interestingness of the object pairs is predicted. Meanwhile, the relation predicate interest prediction module obtains the interest features of the relation predicates from the visual, semantic, and position features of the object pairs' relation predicates and predicts the interestingness of the relation predicates between objects using semi-supervised learning. Finally, the object-pair interestingness and the relation-predicate interestingness are combined into a visual-relation interestingness; the visual relations with high interestingness are the finally detected visual relations of interest.
Further, the invention comprises the following steps:
1) extracting the bounding boxes and categories of all objects from the input image, computing the features within the n object boxes as object features, combining the n objects pairwise into n(n-1) object pairs, and computing the features within the union box of the subject and object of each pair as the joint feature;
2) for each object, obtaining the word-embedding feature of its class name from a pre-trained GloVe model, taking the object feature as its visual feature, the word-embedding feature of the class name as its semantic feature, and the position of the object relative to the whole image as its position feature, and combining the three features into the interest feature of the object; for each object pair, computing the three features of the subject and the object in the same way, then computing the three features of the pair itself, and combining them into the interest feature of the object pair; inputting the interest features of the objects and object pairs into a graph convolutional neural network to predict the interestingness of the object pairs;
3) for each object pair, computing the visual, semantic, and position features of its relation predicates to obtain the interest features of the relation predicates, and for each relation predicate, using semi-supervised learning to predict the probability that the relation predicate is interesting given that the object pair is interesting, i.e., the relation-predicate interestingness;
4) adding the object-category prediction loss of step 1), the object and object-pair interestingness prediction losses of step 2), and the relation-predicate interestingness prediction loss of step 3) into a total loss; combining the object-pair interestingness and the relation-predicate interestingness obtained by minimizing the total loss into the visual-relation interestingness; and ranking all visual relations by interestingness, the visual relations with high interestingness being the finally detected visual relations of interest.
The invention also provides an interest visual relationship detection device based on an interest propagation network. The device is configured with a computer program that implements the interest propagation network and, when executed, realizes the above interest visual relationship detection method.
The beneficial effects of the invention are: it addresses the machine-understanding bias caused by overly rich visual relationships in images. By considering the semantic, position, and visual features of objects and object pairs, and by taking semantic importance as the criterion during visual relationship detection, it predicts relation interestingness more reasonably and finds the visual relations of interest that accurately convey the main content of the image. The method has good generality and practicability.
Drawings
FIG. 1 shows the architecture of the interest propagation network of the present invention and the flow of interest visual relationship detection.
FIG. 2 shows an example of the overly rich visual relationships detected by a conventional method.
FIG. 3 illustrates the results of the interest visual relationship detection method of the present invention.
Detailed Description
The interest visual relationship detection method based on an interest propagation network provides a solution to the machine-understanding bias caused by overly rich visual relationships in an image. From the input image, it derives the interest features of objects, object pairs, and relation predicates by linearly transforming their semantic, position, and visual features; predicts the interestingness of visual relations reasonably by taking semantic importance as the criterion; and produces visual-relation-of-interest results that accurately convey the main content of the image.
The practice of the present invention is described in detail below.
As shown in FIG. 1, the invention builds an interest propagation network that takes an image as input and outputs the visual relationships of interest in the image. The interest propagation network comprises a panoptic object detection module, an object interest prediction module, and a relation predicate interest prediction module. The panoptic object detection module performs panoptic segmentation on the image; the content of the image is divided into the "things" and "stuff" categories according to whether it has a fixed shape. Objects with a fixed shape, such as people and cars, belong to the things category (countable nouns are usually things); objects without a fixed shape, such as sky and grass, belong to the stuff category (uncountable nouns are stuff). Panoptic segmentation yields the objects in the image; an instance encoder then extracts the object features of the objects and the joint features of the object pairs, which are fed into the object interest prediction module and the relation predicate interest prediction module respectively. A semantic encoder, a visual encoder, and a position encoder produce the semantic, visual, and position features of the objects, object pairs, and the object pairs' relation predicates; these three features are linearly transformed into the interest features of the objects, object pairs, and relation predicates. The interestingness of the objects, object pairs, and relation predicates is predicted from the interest features by supervised and semi-supervised learning; the two interestingness scores are combined into the visual-relation interestingness, and the visual relations of interest are ranked and output accordingly.
On the basis of this interest propagation network, the panoptic object detection module first extracts objects from the input image and combines them pairwise into object pairs, computing the object features of the objects and the joint features of the object pairs and generating the visual, semantic, and position features of the objects and object pairs; the object interest prediction module then obtains the interest features of the objects and object pairs and predicts their interestingness. Meanwhile, the relation predicate interest prediction module obtains the interest features of the relation predicates from the visual, semantic, and position features of the object pairs' relation predicates and predicts the interestingness of the relation predicates between objects using semi-supervised learning. Finally, the object-pair interestingness and the relation-predicate interestingness are combined into a visual-relation interestingness; the visual relations with high interestingness are the finally detected visual relations of interest.
The following describes the implementation of the present invention in detail. The invention comprises the following steps:
1) For the input image, the panoptic object detection module of the interest propagation network computes the object features and joint features:
1.1) extracting the bounding boxes and categories of all objects in the image;
1.2) computing the features within the n object boxes of step 1.1) as the object features;
1.3) combining the n objects of step 1.1) pairwise into n(n-1) object pairs (see the sketch after this list), and computing the features within the union box of the subject and object of each pair as the joint feature.
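For illustration only, the pairwise combination of step 1.3) can be sketched in Python as follows; the helper name is hypothetical, and the objects can be any detection records:

```python
from itertools import permutations

def make_object_pairs(objects):
    """n detected objects yield n(n-1) ordered (subject, object) pairs."""
    return list(permutations(objects, 2))
```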
2) For the objects extracted in step 1) and the object pairs formed from them, the object interest prediction module of the interest propagation network computes the interestingness:
2.1) For each object extracted in step 1), take the object feature as the visual feature, obtain the word-embedding feature of the class name from a pre-trained GloVe model as the semantic feature, take the position of the object relative to the whole image as the position feature, and combine the three features into the interest feature of the object. The position feature of an object is calculated as:
Loc_i = (x_i^l / w) ⊕ (y_i^t / h) ⊕ (x_i^r / w) ⊕ (y_i^b / h)
where Loc_i is the position feature of object i, ⊕ denotes the concatenation operation, x_i^l, y_i^t, x_i^r, y_i^b are the coordinates of the left, top, right, and bottom boundaries of the object, and w and h are the width and height of the input image.
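A minimal Python sketch of this normalized position feature; the function name and array layout are assumptions for illustration, not the patent's specification:

```python
import numpy as np

def position_feature(box, img_w, img_h):
    """Loc_i = (x_l/w) ⊕ (y_t/h) ⊕ (x_r/w) ⊕ (y_b/h) for a box in pixel coordinates."""
    x_l, y_t, x_r, y_b = box  # left, top, right, bottom boundary coordinates
    return np.array([x_l / img_w, y_t / img_h, x_r / img_w, y_b / img_h])
```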
2.2) For each object pair formed in step 1), compute the three features of the subject and the object in the same way, then compute the three features of the pair itself, and combine them into the interest feature of the object pair. The position feature of an object pair is calculated as:
Loc_p = Loc_{s_p} ⊕ Loc_{o_p} ⊕ Loc_{s_p∪o_p}
where Loc_p is the position feature of object pair p, Loc_i is the position feature of object i, s_p and o_p denote the subject and object of the pair, and ∪ denotes the object-level union (the union box of the subject and object).
The visual feature of an object pair is calculated as:
Figure BDA0002644087460000045
wherein FpIs the view of an object on pThe characteristics of the sense of sight,
Figure BDA0002644087460000046
respectively representing the subject and object characteristics of the object pair,
Figure BDA0002644087460000047
representing the combined characteristics of the subject and object of the object pair.
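A minimal sketch of these pair-level concatenations, assuming the per-object features and the union-box (joint) feature have already been extracted; the function name is hypothetical:

```python
import numpy as np

def pair_features(F_s, F_o, F_union, Loc_s, Loc_o, Loc_union):
    # F_p = F_s ⊕ F_o ⊕ F_{s∪o};  Loc_p = Loc_s ⊕ Loc_o ⊕ Loc_{s∪o}
    F_p = np.concatenate([F_s, F_o, F_union])
    Loc_p = np.concatenate([Loc_s, Loc_o, Loc_union])
    return F_p, Loc_p
```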
2.3) Input the interest features of steps 2.1) and 2.2) into a graph convolutional neural network to predict the interestingness of the objects and object pairs; a sketch of such a scoring head follows.
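For illustration only, a toy graph-convolution scoring head is sketched below; the layer sizes, adjacency scheme, and class name are assumptions rather than the patent's specification:

```python
import torch
import torch.nn as nn

class InterestGCN(nn.Module):
    """Toy graph-convolution head that scores node interestingness in [0, 1]."""

    def __init__(self, dim):
        super().__init__()
        self.gc = nn.Linear(dim, dim)   # one graph-convolution layer
        self.score = nn.Linear(dim, 1)  # interestingness scorer

    def forward(self, x, adj):
        # x: (num_nodes, dim) interest features of objects and object pairs
        # adj: (num_nodes, num_nodes) normalized adjacency over the scene graph
        h = torch.relu(self.gc(adj @ x))  # propagate interest along the graph
        return torch.sigmoid(self.score(h)).squeeze(-1)
```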
3) For the object pairs formed in step 1), the relation predicate interest prediction module of the interest propagation network computes the relation-predicate interestingness:
3.1) For each object pair formed in step 1), compute the visual, semantic, and position features of its relation predicates, and combine them into the interest features of the relation predicates. The relation-predicate position feature of an object pair is calculated as:
Loc'_p = Loc'_{s_p} ⊕ Loc'_{o_p}, with Loc'_i = (x_i^l / w') ⊕ (y_i^t / h') ⊕ (x_i^r / w') ⊕ (y_i^b / h')
where Loc'_p is the relation-predicate position feature of object pair p, and w' and h' are the width and height of the union box of the subject and object of the pair. The visual feature is calculated in the same way as for an object pair.
3.2) For each relation predicate, semi-supervised learning is used to predict the probability that the relation predicate is interesting given that the object pair is interesting, i.e., the relation-predicate interestingness. The semi-supervised learning loss is calculated as follows:
L_rela = l_rela(ŷ^l, y^l) + β · l_rela(ŷ^u, y^u)
where L_rela is the loss of the relation predicate interest prediction module, l_rela is the loss function, ŷ^l and ŷ^u denote the predictions on labeled and unlabeled data respectively, y^l and y^u denote the ground truths of labeled and unlabeled data respectively, and β is the loss weight of the unlabeled data.
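A minimal sketch of this combined loss, assuming a binary cross-entropy instance of l_rela and pseudo-labels standing in for y^u; the function name and the β value are illustrative:

```python
import torch.nn.functional as F

def semi_supervised_rela_loss(pred_l, y_l, pred_u, y_u, beta=0.5):
    # L_rela = l_rela(ŷ^l, y^l) + β · l_rela(ŷ^u, y^u)
    loss_labeled = F.binary_cross_entropy(pred_l, y_l)
    loss_unlabeled = F.binary_cross_entropy(pred_u, y_u)
    return loss_labeled + beta * loss_unlabeled
```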
4) Minimize the total loss of the interest propagation network and predict the visual relations of interest:
4.1) Add the object-category prediction loss of step 1), the object and object-pair interestingness prediction losses of step 2), and the relation-predicate interestingness prediction loss of step 3) to obtain the total loss of the interest propagation network; combine the object-pair interestingness and relation-predicate interestingness obtained by minimizing the total loss into the visual-relation interestingness. The total loss of the interest propagation network is calculated as follows:
L_pos = -(1 - p_pos)^2 · log(p_pos)
L_neg = -p_neg · log(1 - p_neg)
L_total = L_class + L_pos^obj + L_neg^obj + L_pos^pair + L_neg^pair + L_pos^rela + L_neg^rela
where L_pos and L_neg denote the losses of positive and negative samples respectively, p_pos and p_neg denote the probability scores of positive and negative samples, L_total is the total loss of the interest propagation network, L_class is the object-category prediction loss, L_pos^obj and L_neg^obj are the positive and negative losses of the object interestingness prediction, L_pos^pair and L_neg^pair are the positive and negative losses of the object-pair interestingness prediction, and L_pos^rela and L_neg^rela are the positive and negative losses of the relation-predicate interestingness prediction.
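A sketch of the per-sample interestingness losses above; the ε guard and the function name are additions for numerical safety and illustration, not part of the patent:

```python
import torch

def interestingness_loss(p, positive, eps=1e-8):
    # L_pos = -(1 - p)^2 · log(p);  L_neg = -p · log(1 - p)
    # p: probability-score tensor for one sample
    if positive:
        return -(1 - p) ** 2 * torch.log(p + eps)
    return -p * torch.log(1 - p + eps)
```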
4.2) Rank all visual relations by interestingness; the visual relations with high interestingness are the finally detected visual relations of interest. The interestingness of a visual relation is calculated as follows:
I_spo = E_so · I_so · P_spo
where I_spo is the interestingness of the visual relation, I_so and P_spo denote the object-pair interestingness and the relation-predicate interestingness respectively, and E_so is a binary parameter: E_so = 0 when the subject and object of the pair are the same object, and E_so = 1 otherwise.
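A sketch of combining the two scores and ranking, with a hypothetical record layout for each candidate relation:

```python
def rank_visual_relations(candidates):
    """candidates: dicts with subject/object ids, I_so, and P_spo."""
    for r in candidates:
        E_so = 0.0 if r["subject_id"] == r["object_id"] else 1.0
        r["I_spo"] = E_so * r["I_so"] * r["P_spo"]  # I_spo = E_so · I_so · P_spo
    return sorted(candidates, key=lambda r: r["I_spo"], reverse=True)
```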
The method of the present invention can be implemented as a computer program; accordingly, an interest visual relationship detection device based on an interest propagation network is also provided, the device being configured with a computer program that, when executed, implements the interest visual relationship detection method of the present invention.
The method was implemented on the MS COCO image dataset and compared with the results of a conventional visual relationship detection method. Figs. 2 and 3 compare the results of conventional visual relationship detection with those of the present invention. Figs. 2(a) and 3(a) are the input images, with the objects involved in the detection results marked. Fig. 2(b) shows the result of conventional visual relationship detection, which contains as many as 24 visual relationships, most of them only weakly associated with the main content of the input image. Fig. 3(b) shows the result of the interest visual relationship detection of the present invention, which contains only 5 visual relationships, all strongly associated with the main content of the input image.

Claims (9)

1. An interest visual relationship detection method based on an interest propagation network, characterized in that an interest propagation network is built that takes an image as input and outputs the visual relationships of interest in the image, the interest propagation network comprising a panoptic object detection module, an object interest prediction module, and a relation predicate interest prediction module; first, the panoptic object detection module extracts objects from the input image and combines them pairwise into object pairs, computing the object features of the objects and the joint features of the object pairs; the object interest prediction module generates the visual, semantic, and position features of the objects and object pairs and obtains their interest features through linear transformation, from which the interestingness of the object pairs is predicted; meanwhile, the relation predicate interest prediction module obtains the interest features of the relation predicates by linearly transforming the visual, semantic, and position features of the object pairs' relation predicates and predicts the interestingness of the relation predicates between objects using semi-supervised learning; finally, the object-pair interestingness and the relation-predicate interestingness are combined into a visual-relation interestingness, the visual relations with high interestingness being the finally detected visual relations of interest.
2. The interest visual relationship detection method based on an interest propagation network according to claim 1, characterized by comprising the following steps:
1) extracting the bounding boxes and categories of all objects from the input image, computing the features within the n object boxes as object features, combining the n objects pairwise into n(n-1) object pairs, and computing the features within the union box of the subject and object of each pair as the joint feature;
2) for each object, obtaining the word-embedding feature of its class name from a pre-trained GloVe model, taking the object feature as its visual feature, the word-embedding feature of the class name as its semantic feature, and the position of the object relative to the whole image as its position feature, and combining the three features into the interest feature of the object; for each object pair, computing the three features of the subject and the object in the same way, then computing the three features of the pair itself, and combining them into the interest feature of the object pair; inputting the interest features of the objects and object pairs into a graph convolutional neural network to predict the interestingness of the object pairs;
3) for each object pair, computing the visual, semantic, and position features of its relation predicates to obtain the interest features of the relation predicates, and for each relation predicate, using semi-supervised learning to predict the probability that the relation predicate is interesting given that the object pair is interesting, i.e., the relation-predicate interestingness;
4) adding the object-category prediction loss of step 1), the object and object-pair interestingness prediction losses of step 2), and the relation-predicate interestingness prediction loss of step 3) into a total loss; combining the object-pair interestingness and the relation-predicate interestingness obtained by minimizing the total loss into the visual-relation interestingness; and ranking all visual relations by interestingness, the visual relations with high interestingness being the finally detected visual relations of interest.
3. The interest visual relationship detection method based on an interest propagation network according to claim 2, wherein in step 2) the position feature of an object is calculated as:
Loc_i = (x_i^l / w) ⊕ (y_i^t / h) ⊕ (x_i^r / w) ⊕ (y_i^b / h)
where Loc_i is the position feature of object i, ⊕ denotes the concatenation operation, x_i^l, y_i^t, x_i^r, y_i^b are the coordinates of the left, top, right, and bottom boundaries of object i, and w and h are the width and height of the input image.
4. The interest visual relationship detection method based on an interest propagation network according to claim 2, wherein in step 2) the position feature of an object pair is calculated as:
Loc_p = Loc_{s_p} ⊕ Loc_{o_p} ⊕ Loc_{s_p∪o_p}
where Loc_p is the position feature of object pair p, Loc_i is the position feature of object i, s_p and o_p denote the subject and object of the pair, and ∪ denotes the object-level union.
5. The interest visual relationship detection method based on an interest propagation network according to claim 2, wherein the visual feature of an object pair is calculated as:
F_p = F_{s_p} ⊕ F_{o_p} ⊕ F_{s_p∪o_p}
where F_p is the visual feature of object pair p, F_{s_p} and F_{o_p} are the subject and object features of the pair, and F_{s_p∪o_p} is the joint feature of the subject and object.
6. The interest visual relationship detection method based on an interest propagation network according to claim 2, wherein in step 3) the relation-predicate position feature of an object pair is calculated as:
Loc'_p = Loc'_{s_p} ⊕ Loc'_{o_p}, with Loc'_i = (x_i^l / w') ⊕ (y_i^t / h') ⊕ (x_i^r / w') ⊕ (y_i^b / h')
where Loc'_p is the relation-predicate position feature of object pair p, ⊕ denotes the concatenation operation, x_i^l, y_i^t, x_i^r, y_i^b are the coordinates of the left, top, right, and bottom boundaries of object i, s_p and o_p denote the subject and object of the pair, ∪ denotes the object-level union, and w' and h' are the width and height of the union box s_p ∪ o_p of the subject and object.
7. The interest visual relationship detection method based on an interest propagation network according to claim 2, wherein in the semi-supervised prediction of relation-predicate interestingness of step 3) the prediction loss is calculated as:
L_rela = l_rela(ŷ^l, y^l) + β · l_rela(ŷ^u, y^u)
where L_rela is the loss of the relation-predicate interestingness prediction, l_rela is the loss function, ŷ^l and ŷ^u denote the predictions on labeled and unlabeled data respectively, y^l and y^u denote the ground truths of labeled and unlabeled data respectively, and β is the loss weight of the unlabeled data.
8. The interest visual relationship detection method based on an interest propagation network according to claim 2, wherein the total loss of step 4) is calculated as:
L_pos = -(1 - p_pos)^2 · log(p_pos)
L_neg = -p_neg · log(1 - p_neg)
L_total = L_class + L_pos^obj + L_neg^obj + L_pos^pair + L_neg^pair + L_pos^rela + L_neg^rela
where L_pos and L_neg denote the losses of positive and negative samples respectively, p_pos and p_neg denote the probability scores of positive and negative samples, L_total is the total loss, L_class is the object-category prediction loss, L_pos^obj and L_neg^obj are the positive and negative losses of the object interestingness prediction, L_pos^pair and L_neg^pair are the positive and negative losses of the object-pair interestingness prediction, and L_pos^rela and L_neg^rela are the positive and negative losses of the relation-predicate interestingness prediction.
9. An interest visual relationship detection device based on an interest propagation network, characterized in that the device is configured with a computer program that implements the interest propagation network of claim 1 and, when executed, realizes the interest visual relationship detection method of claim 1.
CN202010848981.0A 2020-08-21 2020-08-21 Interest visual relation detection method and device based on interest propagation network Active CN111985505B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010848981.0A CN111985505B (en) 2020-08-21 2020-08-21 Interest visual relation detection method and device based on interest propagation network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010848981.0A CN111985505B (en) 2020-08-21 2020-08-21 Interest visual relation detection method and device based on interest propagation network

Publications (2)

Publication Number Publication Date
CN111985505A true CN111985505A (en) 2020-11-24
CN111985505B CN111985505B (en) 2024-02-13

Family

ID=73442732

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010848981.0A Active CN111985505B (en) 2020-08-21 2020-08-21 Interest visual relation detection method and device based on interest propagation network

Country Status (1)

Country Link
CN (1) CN111985505B (en)

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100278420A1 (en) * 2009-04-02 2010-11-04 Siemens Corporation Predicate Logic based Image Grammars for Complex Visual Pattern Recognition
CN105045907A (en) * 2015-08-10 2015-11-11 北京工业大学 Method for constructing visual attention-label-user interest tree for personalized social image recommendation
US20160314597A1 (en) * 2007-07-03 2016-10-27 Shoppertrak Rct Corporation System and process for detecting, tracking and counting human objects of interest
CN108229272A (en) * 2017-02-23 2018-06-29 北京市商汤科技开发有限公司 Vision relationship detection method and device and vision relationship detection training method and device
CN108229491A (en) * 2017-02-28 2018-06-29 北京市商汤科技开发有限公司 The method, apparatus and equipment of detection object relationship from picture
CN108229477A (en) * 2018-01-25 2018-06-29 深圳市商汤科技有限公司 For visual correlation recognition methods, device, equipment and the storage medium of image
WO2019035771A1 (en) * 2017-08-17 2019-02-21 National University Of Singapore Video visual relation detection methods and systems
CN110796472A (en) * 2019-09-02 2020-02-14 腾讯科技(深圳)有限公司 Information pushing method and device, computer readable storage medium and computer equipment
CN110889397A (en) * 2018-12-28 2020-03-17 南京大学 Visual relation segmentation method taking human as main body
CN111125406A (en) * 2019-12-23 2020-05-08 天津大学 Visual relation detection method based on self-adaptive cluster learning
CN111325279A (en) * 2020-02-26 2020-06-23 福州大学 Pedestrian and personal sensitive article tracking method fusing visual relationship
CN111325243A (en) * 2020-02-03 2020-06-23 天津大学 Visual relation detection method based on regional attention learning mechanism
CN111368829A (en) * 2020-02-28 2020-07-03 北京理工大学 Visual semantic relation detection method based on RGB-D image
CN116089732A (en) * 2023-04-11 2023-05-09 江西时刻互动科技股份有限公司 User preference identification method and system based on advertisement click data
CN116628052A (en) * 2022-02-18 2023-08-22 罗伯特·博世有限公司 Apparatus and computer-implemented method for adding quantity facts to a knowledge base

Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160314597A1 (en) * 2007-07-03 2016-10-27 Shoppertrak Rct Corporation System and process for detecting, tracking and counting human objects of interest
US20100278420A1 (en) * 2009-04-02 2010-11-04 Siemens Corporation Predicate Logic based Image Grammars for Complex Visual Pattern Recognition
CN105045907A (en) * 2015-08-10 2015-11-11 北京工业大学 Method for constructing visual attention-label-user interest tree for personalized social image recommendation
CN108229272A (en) * 2017-02-23 2018-06-29 北京市商汤科技开发有限公司 Vision relationship detection method and device and vision relationship detection training method and device
CN108229491A (en) * 2017-02-28 2018-06-29 北京市商汤科技开发有限公司 The method, apparatus and equipment of detection object relationship from picture
WO2019035771A1 (en) * 2017-08-17 2019-02-21 National University Of Singapore Video visual relation detection methods and systems
CN108229477A (en) * 2018-01-25 2018-06-29 深圳市商汤科技有限公司 For visual correlation recognition methods, device, equipment and the storage medium of image
CN110889397A (en) * 2018-12-28 2020-03-17 南京大学 Visual relation segmentation method taking human as main body
CN110796472A (en) * 2019-09-02 2020-02-14 腾讯科技(深圳)有限公司 Information pushing method and device, computer readable storage medium and computer equipment
CN111125406A (en) * 2019-12-23 2020-05-08 天津大学 Visual relation detection method based on self-adaptive cluster learning
CN111325243A (en) * 2020-02-03 2020-06-23 天津大学 Visual relation detection method based on regional attention learning mechanism
CN111325279A (en) * 2020-02-26 2020-06-23 福州大学 Pedestrian and personal sensitive article tracking method fusing visual relationship
CN111368829A (en) * 2020-02-28 2020-07-03 北京理工大学 Visual semantic relation detection method based on RGB-D image
CN116628052A (en) * 2022-02-18 2023-08-22 罗伯特·博世有限公司 Apparatus and computer-implemented method for adding quantity facts to a knowledge base
CN116089732A (en) * 2023-04-11 2023-05-09 江西时刻互动科技股份有限公司 User preference identification method and system based on advertisement click data

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
YU, FAN, et al.: "Visual Relation of Interest Detection", MM '20: Proceedings of the 28th ACM International Conference on Multimedia, pages 1386-1394 *
ZHOU, HAO, et al.: "Visual Relationship Detection with Relative Location Mining", Proceedings of the 27th ACM International Conference on Multimedia (MM '19), pages 30-38 *
WU, JIANCHAO, et al.: "A Survey on Group Activity Recognition in Videos" (视频群体行为识别综述), Journal of Software (软件学报), vol. 34, no. 2, pages 964-984 *
CHEN, FANGFANG: "Visual Relationship Detection Based on Object Pair Screening and Joint Predicate Recognition" (基于目标对筛选和联合谓语识别的视觉关系检测), China Master's Theses Full-text Database, Information Science and Technology (中国优秀硕士学位论文全文数据库 信息科技辑), no. 8, pages 138-657 *

Also Published As

Publication number Publication date
CN111985505B (en) 2024-02-13

Similar Documents

Publication Publication Date Title
CN110516067B (en) Public opinion monitoring method, system and storage medium based on topic detection
CN112182166B (en) Text matching method and device, electronic equipment and storage medium
US11514244B2 (en) Structured knowledge modeling and extraction from images
US9183467B2 (en) Sketch segmentation
WO2020248391A1 (en) Case brief classification method and apparatus, computer device, and storage medium
CN110598005A (en) Public safety event-oriented multi-source heterogeneous data knowledge graph construction method
CN111259940A (en) Target detection method based on space attention map
CN113822224A (en) Rumor detection method and device integrating multi-modal learning and multi-granularity structure learning
CN114663915A (en) Image human-object interaction positioning method and system based on Transformer model
CN114429566A (en) Image semantic understanding method, device, equipment and storage medium
CN116610778A (en) Bidirectional image-text matching method based on cross-modal global and local attention mechanism
US20200364259A1 (en) Image retrieval
CN113902764A (en) Semantic-based image-text cross-modal retrieval method
CN111159411B (en) Knowledge graph fused text position analysis method, system and storage medium
US20230290118A1 (en) Automatic classification method and system of teaching videos based on different presentation forms
CN111985505B (en) Interest visual relation detection method and device based on interest propagation network
CN112069898A (en) Method and device for recognizing human face group attribute based on transfer learning
Shf et al. Review on deep based object detection
Liu et al. RDBN: Visual relationship detection with inaccurate RGB-D images
CN111368829A (en) Visual semantic relation detection method based on RGB-D image
WO2023173552A1 (en) Establishment method for target detection model, application method for target detection model, and device, apparatus and medium
CN110750673A (en) Image processing method, device, equipment and storage medium
CN115292533A (en) Cross-modal pedestrian retrieval method driven by visual positioning
CN113159071B (en) Cross-modal image-text association anomaly detection method
He et al. Investigating YOLO Models Towards Outdoor Obstacle Detection For Visually Impaired People

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant