CN108845983B - Semantic evaluation method based on scene description - Google Patents

Publication number
CN108845983B
CN108845983B CN201810429509.6A CN201810429509A CN108845983B CN 108845983 B CN108845983 B CN 108845983B CN 201810429509 A CN201810429509 A CN 201810429509A CN 108845983 B CN108845983 B CN 108845983B
Authority
CN
China
Prior art keywords
english sentences
sentence
similarity
words
english
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810429509.6A
Other languages
Chinese (zh)
Other versions
CN108845983A (en)
Inventor
马苗
王伯龙
武杰
郭敏
吴琦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Siyuan Integrated Circuit And Intelligent Technology Research Institute Co ltd
Original Assignee
Shaanxi Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shaanxi Normal University
Priority to CN201810429509.6A
Publication of CN108845983A
Application granted
Publication of CN108845983B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/247 Thesauruses; Synonyms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)

Abstract

A semantic evaluation method based on scene description comprises analyzing the parts of speech of English sentences, counting the number of related words using a synonym library, and determining the similarity between 5 English sentences and a generated sentence. The method extracts the keywords of the 5 English sentences, associates a synonym library with each keyword, and determines the similarity of two sentences using as reference coefficients the number of keywords of the generated sentence that repeat keywords of the 5 English sentences and the number that repeat words in the corresponding synonym libraries. The method gives reasonable evaluation results, is practical and fast, and can be applied in the technical field of scene description evaluation.

Description

Semantic evaluation method based on scene description
Technical Field
The invention belongs to the technical field at the intersection of computer vision and natural language processing, and particularly relates to a method for determining the similarity between reference sentences and a generated sentence.
Background
Describing the visual scene information in images or videos in natural language has been a research hotspot in computer vision in recent years. It involves converting images or videos into text sentences, that is, image captioning and video captioning. As researchers at home and abroad have studied image and video captioning in greater depth, more and more scene description algorithms and metrics for evaluating scene descriptions have been proposed, such as BLEU, CIDEr-D, and ROUGE. However, these metrics are all computed from n-gram matches or the longest common subsequence; that is, when judging the similarity of two sentences, they consider only how well identically spelled words in the two sentences match. They therefore score a scene description only in a strict literal sense, make no use of the semantic information about the objects and relations in the scene, and are particularly unsuited to two cases: sentences that are worded differently but have the same meaning, and sentences that share n-grams but have different meanings.
Disclosure of Invention
The technical problem to be solved by the invention is to overcome the above defects of the prior art and provide a semantic evaluation method based on scene description that is reasonable, practical, and fast.
The technical scheme adopted to solve this technical problem comprises the following steps:
(1) Analyzing the parts of speech of the English sentences
1) Select the 5 English sentences of the original image whose scene is to be described from the MSCOCO image dataset; the 5 English sentences are denoted S_1, S_2, S_3, S_4, S_5.
2) Describe the scene of the selected original image with a text description generation model (different models may be used) to obtain the generated sentence Sg.
3) Count the keywords in the generated sentence Sg: divide all keywords of Sg, by nouns, verbs, and adjectives and adverbs, into a noun set n_1, a verb set v_1, and an adjective-and-adverb set a_1. The numbers of words in these sets are denoted C_n1, C_v1, C_a1.
4) Count the keywords in the 5 English sentences: divide the keywords of S_1, S_2, S_3, S_4, S_5, by nouns, verbs, and adjectives and adverbs, into the sets n_2^i, v_2^i, a_2^i. The numbers of words in these sets are denoted C_n2^i, C_v2^i, C_a2^i, i ∈ [1,5].
(2) Counting the number of related words using a synonym library
1) Through the Thesaurus.com website, query synonyms for the words in the keyword sets n_2^i, v_2^i, a_2^i of the 5 English sentences S_1, S_2, S_3, S_4, S_5 to obtain the corresponding synonym sets Set-n_i, Set-v_i, Set-a_i.
2) Determine how many words in the noun set n_1, verb set v_1, and adjective-and-adverb set a_1 of the generated sentence Sg are identical to words in the keyword sets n_2^i, v_2^i, a_2^i of the 5 English sentences or in the corresponding synonym sets Set-n_i, Set-v_i, Set-a_i. That is, determine the numbers of elements of the three sets (n_1 ∩ n_2^i) ∪ (n_1 ∩ Set-n_i), (v_1 ∩ v_2^i) ∪ (v_1 ∩ Set-v_i), (a_1 ∩ a_2^i) ∪ (a_1 ∩ Set-a_i), denoted C_n-syn^i, C_v-syn^i, C_a-syn^i, i ∈ [1,5].
(3) Determining the similarity between the 5 English sentences and the generated sentence Sg
1) The part-of-speech similarity coefficient k_i between the generated sentence Sg and the 5 English sentences S_1, S_2, S_3, S_4, S_5 is:
[Formula (1): equation image not reproduced in this text]
The part-of-speech similarity coefficient k_i takes values in [0,1].
2) The semantic similarity coefficient j_i between the generated sentence Sg and the 5 English sentences S_1, S_2, S_3, S_4, S_5 is:
[Formula (2): equation image not reproduced in this text]
The semantic similarity coefficient j_i takes values in [0,1].
3) The sentence similarity s_i between the generated sentence Sg and the 5 English sentences S_1, S_2, S_3, S_4, S_5 is:
[Formula (3): equation image not reproduced in this text]
The sentence similarity s_i takes values in [0,1].
4) The maximum sentence similarity between the generated sentence Sg and the 5 English sentences S_1, S_2, S_3, S_4, S_5 is determined by the following formula:
SimilarSyn = max{s_i}    (4)
In step (2), counting the number of related words using a synonym library, the synonym query for the words in the keyword sets n_2^i, v_2^i, a_2^i of the 5 English sentences S_1, S_2, S_3, S_4, S_5 is performed as follows: the 5 English sentences are input into a Linux system, which divides all keywords in S_1, S_2, S_3, S_4, S_5 into 3 sets by nouns, verbs, and adjectives and adverbs, queries the synonym sets for the 3 sets through the Thesaurus.com website, and returns the synonyms of the keywords in these 5 English sentences.
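As a rough illustration of the grouping step described above, the following Python sketch divides already-tagged tokens into the three keyword sets. It is only a sketch under stated assumptions: the patent does not name a part-of-speech tagger or tag set, so the Penn Treebank tag prefixes, the function name `split_keywords`, and the hand-tagged input are all hypothetical. The tokens are chosen to reproduce the keyword sets {giraffe, top, field}, {standing}, {green} reported for the generated sentence in Example 3.

```python
from typing import Dict, Iterable, Set, Tuple

# Assumed convention: Penn Treebank tag prefixes mapped to the three keyword groups.
NOUN, VERB, ADJ_ADV = "n", "v", "a"
PREFIX_TO_GROUP = {"NN": NOUN, "VB": VERB, "JJ": ADJ_ADV, "RB": ADJ_ADV}


def split_keywords(tagged: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Group (word, tag) pairs into noun, verb, and adjective/adverb sets."""
    groups: Dict[str, Set[str]] = {NOUN: set(), VERB: set(), ADJ_ADV: set()}
    for word, tag in tagged:
        group = PREFIX_TO_GROUP.get(tag[:2])
        if group is not None:  # determiners, prepositions, etc. are not keywords
            groups[group].add(word.lower())
    return groups


# Hand-tagged tokens (hypothetical tagger output) for
# "A giraffe standing on top of a green field."
tagged_sg = [
    ("A", "DT"), ("giraffe", "NN"), ("standing", "VBG"), ("on", "IN"),
    ("top", "NN"), ("of", "IN"), ("a", "DT"), ("green", "JJ"), ("field", "NN"),
]

sets_sg = split_keywords(tagged_sg)
# Word counts per group, playing the role of C_n1, C_v1, C_a1.
counts_sg = {group: len(words) for group, words in sets_sg.items()}
print(sets_sg, counts_sg)
```

A real pipeline would obtain the tags from a tagger and would also need a rule for auxiliaries such as "is", which this sketch would count as verbs.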
In sub-step 2) of step (1), analyzing the parts of speech of the English sentences, the text description generation model is a deep network model under an encoding-decoding framework.
The invention adopts the Thesaurus.com online synonym library. The keywords in the reference sentences are expanded, by three parts of speech, into the synonym sets of all these words, and the expanded sets are matched against the words of the generated sentence under evaluation. This effectively addresses semantic-level matching between generated and reference sentences when the sentences are worded differently but have the same meaning, or share n-grams but have different meanings. The method is reasonable, practical, and fast, and can be applied in the technical field of scene description evaluation.
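The matching described above amounts to intersecting the generated sentence's keywords with the reference keywords expanded by their synonyms. A minimal sketch for one part-of-speech group follows; the function name `synonym_overlap`, the in-memory `toy_thesaurus` dictionary (standing in for Thesaurus.com lookups), and the toy word lists are assumptions made for illustration, not the patent's implementation.

```python
from typing import Dict, Set


def synonym_overlap(generated: Set[str], reference: Set[str],
                    thesaurus: Dict[str, Set[str]]) -> int:
    """Count generated keywords equal to a reference keyword or one of its synonyms.

    Computes |(g ∩ r) ∪ (g ∩ Syn(r))|, which equals |g ∩ (r ∪ Syn(r))|.
    """
    expanded = set(reference)
    for word in reference:
        # Words missing from the thesaurus contribute no synonyms.
        expanded |= thesaurus.get(word, set())
    return len(generated & expanded)


# Toy data (hypothetical): a tiny stand-in for an online synonym library.
toy_thesaurus = {
    "beef": {"meat", "brawn"},
    "plate": {"platter", "dish"},
}
generated_nouns = {"plate", "meat", "vegetables"}
reference_nouns = {"plate", "beef", "potatoes"}

# "plate" matches directly; "meat" matches through the synonyms of "beef".
print(synonym_overlap(generated_nouns, reference_nouns, toy_thesaurus))
```

Taking the union before intersecting means a generated keyword that matches both a reference keyword and a synonym is counted once, as the set expression requires.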
Drawings
FIG. 1 is a schematic flow chart of example 1 of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, but the present invention is not limited to the examples.
Example 1
In this embodiment, the picture numbered 000000425762 in the training set of the MSCOCO image dataset is selected, and the semantic evaluation method based on scene description is applied to the 5 English sentences of the image. The steps are as follows:
(1) Analyzing the parts of speech of the English sentences
1) Select the 5 English sentences of the original image whose scene is to be described from the MSCOCO image dataset; the 5 English sentences are denoted S_1, S_2, S_3, S_4, S_5 and are:
S_1: A plate filled with sliced beef a bun and potatoes.
S_2: Pull pork sandwich and potatoes sit on a white plate.
S_3: A very meaty sandwich with uniquely shaped fries.
S_4: A plate of potatoes with a pulled pork sandwich next to it.
S_5: This is an image of a meal with meat, bread and potatoes.
2) Describe the scene of the selected original image with a text description generation model. The text description generation model of this embodiment is the "VGG LSTM" model under an encoding-decoding framework, which has been disclosed in Donahue J, Hendricks L A, Guadarrama S, et al. Long-term recurrent convolutional networks for visual recognition and description [C]. Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), 2015: 677-. The generated sentence Sg obtained is: A plate of food with meat and vegetables.
3) Count the keywords in the generated sentence Sg: divide all keywords of Sg, by nouns, verbs, and adjectives and adverbs, into a noun set n_1, a verb set v_1, and an adjective-and-adverb set a_1. The numbers of words in these sets are denoted C_n1, C_v1, C_a1. In this embodiment, the noun set n_1 is {plate, food, meat, vegetables}, the verb set v_1 is the empty set, and the adjective-and-adverb set a_1 is the empty set, so C_n1 is 4, C_v1 is 0, and C_a1 is 0.
4) Count the keywords in the 5 English sentences: divide the keywords of S_1, S_2, S_3, S_4, S_5, by nouns, verbs, and adjectives and adverbs, into the sets n_2^i, v_2^i, a_2^i. The numbers of words in these sets are denoted C_n2^i, C_v2^i, C_a2^i, i ∈ [1,5]. In this embodiment (shown for S_1), n_2^i is {plate, beef, bun, potatoes}, v_2^i is the empty set, and a_2^i is {filled}, so C_n2^i is 4, C_v2^i is 0, and C_a2^i is 1.
(2) Counting the number of related words using a synonym library
1) Through the Thesaurus.com website, query synonyms for the words in the keyword sets n_2^i, v_2^i, a_2^i of the 5 English sentences S_1, S_2, S_3, S_4, S_5. In this embodiment the synonym query is performed as follows: the 5 English sentences are input into a Linux system, which divides all keywords in S_1, S_2, S_3, S_4, S_5 into 3 sets by nouns, verbs, and adjectives and adverbs. The 3 sets are: n_2^i is {plate, beef, bun, potatoes}, v_2^i is the empty set, and a_2^i is {filled}. The synonym sets for the 3 sets are queried through the Thesaurus.com website, and the synonyms of the keywords in these 5 English sentences are returned.
The corresponding synonym sets Set-n_i, Set-v_i, Set-a_i are obtained. In this embodiment, Set-n_i is {plate synonyms} ∪ {beef synonyms} ∪ {bun synonyms} ∪ {potatoes synonyms}, namely {bow, platter, service, casserole, course, help, portion, service, trencher} ∪ {meat, arm, brawn, fly, force, heftress, light, muscle, physique, power, robustness, sine, steam, strength, vigor} ∪ {the W, read, doughnut, muffin, business, cruller, Danish, eclair, sweet roll} ∪ {yam, rphy, plant, task, turbo, table, root}. Set-v_i is the empty set. Set-a_i is {filled synonyms} ∪ {sliced synonyms}, namely {brimming, full, repeat, permeated} ∪ {caree, clear, divide, hack, segment, share, shred, coast, slit, stripe, disconnect, disperver, gap, exposure, history, segment, subdivision, subinder, chiv}.
2) Determine how many words in the noun set n_1, verb set v_1, and adjective-and-adverb set a_1 of the generated sentence Sg are identical to words in the keyword sets n_2^i, v_2^i, a_2^i of the 5 English sentences S_1, S_2, S_3, S_4, S_5 or in the corresponding synonym sets Set-n_i, Set-v_i, Set-a_i. That is, determine the numbers of elements of the three sets (n_1 ∩ n_2^i) ∪ (n_1 ∩ Set-n_i), (v_1 ∩ v_2^i) ∪ (v_1 ∩ Set-v_i), (a_1 ∩ a_2^i) ∪ (a_1 ∩ Set-a_i), denoted C_n-syn^i, C_v-syn^i, C_a-syn^i, i ∈ [1,5]. In this embodiment, C_n-syn^i is 2, C_v-syn^i is 0, and C_a-syn^i is 0.
(3) Determining the similarity between the 5 English sentences and the generated sentence Sg
1) The part-of-speech similarity coefficient k_i between the generated sentence Sg and the 5 English sentences S_1, S_2, S_3, S_4, S_5 is:
[Formula (1): equation image not reproduced in this text]
The part-of-speech similarity coefficient k_i takes values in [0,1]. In this embodiment, formula (1) gives the part-of-speech similarity coefficients k_i of the 5 English sentences, for i = 1, 2, 3, 4, 5, as 0.5833, 0.3333, 0.5, 0.6667, 0.9993.
2) The semantic similarity coefficient j_i between the generated sentence Sg and the 5 English sentences S_1, S_2, S_3, S_4, S_5 is:
[Formula (2): equation image not reproduced in this text]
The semantic similarity coefficient j_i takes values in [0,1]. In this embodiment, formula (2) gives the semantic similarity coefficients j_i of the 5 English sentences, for i = 1, 2, 3, 4, 5, as 0.5, 0.25, 0, 0.25, 0.25.
3) The sentence similarity s_i between the generated sentence Sg and the 5 English sentences S_1, S_2, S_3, S_4, S_5 is:
[Formula (3): equation image not reproduced in this text]
The sentence similarity s_i takes values in [0,1]. In this embodiment, formula (3) gives the sentence similarities s_i of the 5 English sentences, for i = 1, 2, 3, 4, 5, as 0.504, 0.254, 0.025, 0.271, 0.284.
4) The maximum sentence similarity between the generated sentence Sg and the 5 English sentences S_1, S_2, S_3, S_4, S_5 is determined by the following formula:
SimilarSyn = max{s_i}    (4)
In this embodiment, formula (4) gives the maximum sentence similarity between the generated sentence Sg and the 5 English sentences S_1, S_2, S_3, S_4, S_5 as 0.504.
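Formula (4) simply selects the largest of the five sentence similarities. The sketch below applies it to the s_i values reported in this example; the helper name `best_match` is hypothetical, and returning the 1-based indices of the best-matching reference sentences is an addition for illustration only.

```python
from typing import List, Sequence, Tuple


def best_match(similarities: Sequence[float]) -> Tuple[float, List[int]]:
    """Return max{s_i} and the 1-based indices i that attain it."""
    best = max(similarities)
    winners = [i for i, s in enumerate(similarities, start=1) if s == best]
    return best, winners


# Sentence similarities s_1..s_5 as reported for this example.
s_example_1 = [0.504, 0.254, 0.025, 0.271, 0.284]

similar_syn, best_refs = best_match(s_example_1)
print(similar_syn, best_refs)  # 0.504 [1]
```

Returning a list of indices keeps the result well defined when two reference sentences tie for the maximum.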
Example 2
In this embodiment, the picture numbered 000000454956 in the training set of the MSCOCO image dataset is selected, and the semantic evaluation method based on scene description is applied to the 5 English sentences of the image. The steps are as follows:
(1) Analyzing the parts of speech of the English sentences
1) Select the 5 English sentences of the original image whose scene is to be described from the MSCOCO image dataset; the 5 English sentences are denoted S_1, S_2, S_3, S_4, S_5 and are:
S_1: Two bears can be seen grazing in the grass at the side of the road.
S_2: Two black bears are in the grass next to the road.
S_3: A couple of bears next to a road.
S_4: Two black bears eating grass on the side of the road.
S_5: A pair of black bears stand in the grass on the side of the road.
2) Describe the scene of the selected original image with a text description generation model. The text description generation model of this embodiment is the "VGG LSTM" model under an encoding-decoding framework, the same as in Example 1. The generated sentence Sg obtained is: A bear is walking through the grass near a tree.
3) Count the keywords in the generated sentence Sg: divide all keywords of Sg, by nouns, verbs, and adjectives and adverbs, into a noun set n_1, a verb set v_1, and an adjective-and-adverb set a_1. The numbers of words in these sets are denoted C_n1, C_v1, C_a1. In this embodiment, the noun set n_1 is {bear, grass, tree}, the verb set v_1 is {walking}, and the adjective-and-adverb set a_1 is the empty set, so C_n1 is 3, C_v1 is 1, and C_a1 is 0.
4) Count the keywords in the 5 English sentences: divide the keywords of S_1, S_2, S_3, S_4, S_5, by nouns, verbs, and adjectives and adverbs, into the sets n_2^i, v_2^i, a_2^i. The numbers of words in these sets are denoted C_n2^i, C_v2^i, C_a2^i, i ∈ [1,5]. In this embodiment (shown for S_1), n_2^i is {bear, grass, road}, v_2^i is {grazing, see}, and a_2^i is {side}, so C_n2^i is 3, C_v2^i is 2, and C_a2^i is 1.
(2) Counting the number of related words using a synonym library
1) Through the Thesaurus.com website, query synonyms for the words in the keyword sets n_2^i, v_2^i, a_2^i of the 5 English sentences S_1, S_2, S_3, S_4, S_5, using the same query method as in Example 1, to obtain the corresponding synonym sets Set-n_i, Set-v_i, Set-a_i. In this embodiment, Set-n_i is {bear synonyms} ∪ {grass synonyms} ∪ {tree synonyms}, namely {barbarian, bear, boob, brute, buffoon, cad, churl, dork, goon, lout, oaf, peasant, philistine, rube, vulgarian} ∪ {mean, hay, tutu, turf, sod, verdure, barley, grama} ∪ {sapling, shrub, wood, forest, timber, wood, pulp, stock, seedling, softwood, hardwood, topiary}. Set-v_i is {grazing synonyms} ∪ {see synonyms}, namely {bagging, biting, clamping, cropping, observing, feeding, formatting, gnawing, mapping, multicasting, unicoding, passivating, uploading} ∪ {detect, extract, identify, hook, look, notice, object, record, replay, spot, view, watch, wireless, beam, book, record, distribute, distinguishment, copy, eye, flash, gap, gawout, gazeto, glaze, trim, text, survey, record, trace, survey}. Set-a_i is {side synonyms}, namely {incidental, lateral, oblique, postern, roundabout, secondary, skirting, subordinate, subspace, ancillary, indirect, lesser, margin, not the main, off-center, sidelong, sideward, sideways, sidewise, superficial}.
2) Determine how many words in the noun set n_1, verb set v_1, and adjective-and-adverb set a_1 of the generated sentence Sg are identical to words in the keyword sets n_2^i, v_2^i, a_2^i of the 5 English sentences S_1, S_2, S_3, S_4, S_5 or in the corresponding synonym sets Set-n_i, Set-v_i, Set-a_i. That is, determine the numbers of elements of the three sets (n_1 ∩ n_2^i) ∪ (n_1 ∩ Set-n_i), (v_1 ∩ v_2^i) ∪ (v_1 ∩ Set-v_i), (a_1 ∩ a_2^i) ∪ (a_1 ∩ Set-a_i), denoted C_n-syn^i, C_v-syn^i, C_a-syn^i, i ∈ [1,5]. In this embodiment, C_n-syn^i is 2, C_v-syn^i is 0, and C_a-syn^i is 0.
(3) Determining the similarity between the 5 English sentences and the generated sentence Sg
1) The part-of-speech similarity coefficient k_i between the generated sentence Sg and the 5 English sentences S_1, S_2, S_3, S_4, S_5 is:
[Formula (1): equation image not reproduced in this text]
The part-of-speech similarity coefficient k_i takes values in [0,1]. In this embodiment, formula (1) gives the part-of-speech similarity coefficients k_i of the 5 English sentences, for i = 1, 2, 3, 4, 5, as 0.5, 0.6667, 0.6667, 0.6667, 0.5833.
2) The semantic similarity coefficient j_i between the generated sentence Sg and the 5 English sentences S_1, S_2, S_3, S_4, S_5 is:
[Formula (2): equation image not reproduced in this text]
The semantic similarity coefficient j_i takes values in [0,1]. In this embodiment, formula (2) gives the semantic similarity coefficients j_i of the 5 English sentences, for i = 1, 2, 3, 4, 5, as 0.5, 0.5, 0.25, 0.5, 0.5.
3) The sentence similarity s_i between the generated sentence Sg and the 5 English sentences S_1, S_2, S_3, S_4, S_5 is:
[Formula (3): equation image not reproduced in this text]
The sentence similarity s_i takes values in [0,1]. In this embodiment, formula (3) gives the sentence similarities s_i of the 5 English sentences, for i = 1, 2, 3, 4, 5, as 0.5, 0.508, 0.254, 0.508, 0.504.
4) The maximum sentence similarity between the generated sentence Sg and the 5 English sentences S_1, S_2, S_3, S_4, S_5 is determined by the following formula:
SimilarSyn = max{s_i}    (4)
In this embodiment, formula (4) gives the maximum sentence similarity between the generated sentence Sg and the 5 English sentences S_1, S_2, S_3, S_4, S_5 as 0.508.
Example 3
In this embodiment, a picture from the training set of the MSCOCO image dataset is selected, and the semantic evaluation method based on scene description is applied to the 5 English sentences of the image. The steps are as follows:
(1) Analyzing the parts of speech of the English sentences
1) Select the 5 English sentences of the original image whose scene is to be described from the MSCOCO image dataset; the 5 English sentences are denoted S_1, S_2, S_3, S_4, S_5 and are:
S_1: A young girl standing on top of a tennis court.
S_2: A young girl standing on top of a tennis court holding a racquet.
S_3: A kid holding a racket ready to kick the ball.
S_4: A kid is standing on a tennis court with a racket.
S_5: A young girl playing tennis at a tennis court.
2) Describe the scene of the selected original image with a text description generation model. The text description generation model of this embodiment is the "VGG LSTM" model under an encoding-decoding framework, the same as in Example 1. The generated sentence Sg obtained is: A giraffe standing on top of a green field.
3) Count the keywords in the generated sentence Sg: divide all keywords of Sg, by nouns, verbs, and adjectives and adverbs, into a noun set n_1, a verb set v_1, and an adjective-and-adverb set a_1. The numbers of words in these sets are denoted C_n1, C_v1, C_a1. In this embodiment, the noun set n_1 is {giraffe, top, field}, the verb set v_1 is {standing}, and the adjective-and-adverb set a_1 is {green}, so C_n1 is 3, C_v1 is 1, and C_a1 is 1.
4) Count the keywords in the 5 English sentences: divide the keywords of S_1, S_2, S_3, S_4, S_5, by nouns, verbs, and adjectives and adverbs, into the sets n_2^i, v_2^i, a_2^i. The numbers of words in these sets are denoted C_n2^i, C_v2^i, C_a2^i, i ∈ [1,5]. In this embodiment (shown for S_1), n_2^i is {girl, top, tennis court}, v_2^i is {standing}, and a_2^i is {young}, so C_n2^i is 3, C_v2^i is 1, and C_a2^i is 1.
(2) Counting the number of related words using a synonym library
1) Through the Thesaurus.com website, query synonyms for the words in the keyword sets n_2^i, v_2^i, a_2^i of the 5 English sentences S_1, S_2, S_3, S_4, S_5, using the same query method as in Example 1, to obtain the corresponding synonym sets Set-n_i, Set-v_i, Set-a_i. In this embodiment, Set-n_i is {giraffe synonyms} ∪ {top synonyms} ∪ {field synonyms}, namely {buffalo, camel, cattle, cow, deer, elephant, hippopotamus, hog, horse, llama, pig, rhinoceros, swine, tapir} ∪ {acme, apex, apogee, cap, capital, ceiling, comining, climax, corrk, cover, gate, crop, crown, cumming, cup, face, failure, fine, head, height, hippoint, light, limit, maximum, meridian, pinnacle, point, roof, spire, store, surfer, surfeit, zeeci, surfeit, surficia, map, graph, map, plot, map, broadcast, survey, surficia, broadcast}. Set-v_i is {standing synonyms}, namely {existing, restraining, fixed, regular, predicted, permanent}. Set-a_i is {young synonyms}, namely {bundling, inexperienced, new, youthful, adolescent, blooming, blossoming, loud, developing, fledgling, green, growing, infarnent, preferor, junior, junvene, little, modeler, newborn, sink, raw, recent, tender, tenderfoot, boyish, boylike, burgeoning, calaow, gilise, early, fresh, girlish, girllike, halvelf-slope, innorant, new, noble, pubescent, unelated, undispensed, unected, unexposed, found, empty, green, empty, or empty}.
2) Determine how many words in the noun set n_1, verb set v_1, and adjective-and-adverb set a_1 of the generated sentence Sg are identical to words in the keyword sets n_2^i, v_2^i, a_2^i of the 5 English sentences S_1, S_2, S_3, S_4, S_5 or in the corresponding synonym sets Set-n_i, Set-v_i, Set-a_i. That is, determine the numbers of elements of the three sets (n_1 ∩ n_2^i) ∪ (n_1 ∩ Set-n_i), (v_1 ∩ v_2^i) ∪ (v_1 ∩ Set-v_i), (a_1 ∩ a_2^i) ∪ (a_1 ∩ Set-a_i), denoted C_n-syn^i, C_v-syn^i, C_a-syn^i, i ∈ [1,5]. In this embodiment, C_n-syn^i is 1, C_v-syn^i is 1, and C_a-syn^i is 0.
(3) Determining the similarity between the 5 English sentences and the generated sentence Sg
1) The part-of-speech similarity coefficient k_i between the generated sentence Sg and the 5 English sentences S_1, S_2, S_3, S_4, S_5 is:
[Formula (1): equation image not reproduced in this text]
The part-of-speech similarity coefficient k_i takes values in [0,1]. In this embodiment, formula (1) gives the part-of-speech similarity coefficients k_i of the 5 English sentences, for i = 1, 2, 3, 4, 5, as 1, 0.75, 0.8333, 0.6667, 0.9333.
2) The semantic similarity coefficient j_i between the generated sentence Sg and the 5 English sentences S_1, S_2, S_3, S_4, S_5 is:
[Formula (2): equation image not reproduced in this text]
The semantic similarity coefficient j_i takes values in [0,1]. In this embodiment, formula (2) gives the semantic similarity coefficients j_i of the 5 English sentences, for i = 1, 2, 3, 4, 5, as 0.4, 0.4, 0, 0.3, 0.
3) The sentence similarity s_i between the generated sentence Sg and the 5 English sentences S_1, S_2, S_3, S_4, S_5 is:
[Formula (3): equation image not reproduced in this text]
The sentence similarity s_i takes values in [0,1]. In this embodiment, formula (3) gives the sentence similarities s_i of the 5 English sentences, for i = 1, 2, 3, 4, 5, as 0.43, 0.418, 0.042, 0.223, 0.25.
4) The maximum sentence similarity between the generated sentence Sg and the 5 English sentences S_1, S_2, S_3, S_4, S_5 is determined by the following formula:
SimilarSyn = max{s_i}    (4)
In this embodiment, formula (4) gives the maximum sentence similarity between the generated sentence Sg and the 5 English sentences S_1, S_2, S_3, S_4, S_5 as 0.43.

Claims (3)

1. A semantic evaluation method based on scene description, characterized by comprising the following steps:
(1) Analyzing the parts of speech of the English sentences
1) selecting the 5 English sentences of the original image whose scene is to be described from the MSCOCO image dataset, the 5 English sentences being denoted S_1, S_2, S_3, S_4, S_5;
2) describing the scene of the selected original image with a text description generation model (different models may be used) to obtain the generated sentence Sg;
3) counting the keywords in the generated sentence Sg, and dividing all keywords of Sg, by nouns, verbs, and adjectives and adverbs, into a noun set n_1, a verb set v_1, and an adjective-and-adverb set a_1, the numbers of words in these sets being denoted C_n1, C_v1, C_a1;
4) counting the keywords in the 5 English sentences, and dividing the keywords of S_1, S_2, S_3, S_4, S_5, by nouns, verbs, and adjectives and adverbs, into the sets n_2^i, v_2^i, a_2^i, the numbers of words in these sets being denoted C_n2^i, C_v2^i, C_a2^i, i ∈ [1,5];
(2) Counting the number of related words using a synonym library
1) through the Thesaurus.com website, querying synonyms for the words in the keyword sets n_2^i, v_2^i, a_2^i of the 5 English sentences S_1, S_2, S_3, S_4, S_5 to obtain the corresponding synonym sets Set-n_i, Set-v_i, Set-a_i;
2) determining how many words in the noun set n_1, verb set v_1, and adjective-and-adverb set a_1 of the generated sentence Sg are identical to words in the keyword sets n_2^i, v_2^i, a_2^i of the 5 English sentences or in the corresponding synonym sets Set-n_i, Set-v_i, Set-a_i, that is, determining the numbers of elements of the three sets (n_1 ∩ n_2^i) ∪ (n_1 ∩ Set-n_i), (v_1 ∩ v_2^i) ∪ (v_1 ∩ Set-v_i), (a_1 ∩ a_2^i) ∪ (a_1 ∩ Set-a_i), denoted C_n-syn^i, C_v-syn^i, C_a-syn^i, i ∈ [1,5];
(3) Determining the similarity between the 5 English sentences and the generated sentence Sg
1) the part-of-speech similarity coefficient k_i between the generated sentence Sg and the 5 English sentences S_1, S_2, S_3, S_4, S_5 is:
[Formula (1): equation image not reproduced in this text]
the part-of-speech similarity coefficient k_i taking values in [0,1];
2) the semantic similarity coefficient j_i between the generated sentence Sg and the 5 English sentences S_1, S_2, S_3, S_4, S_5 is:
[Formula (2): equation image not reproduced in this text]
the semantic similarity coefficient j_i taking values in [0,1];
3) the sentence similarity s_i between the generated sentence Sg and the 5 English sentences S_1, S_2, S_3, S_4, S_5 is:
[Formula (3): equation image not reproduced in this text]
the sentence similarity s_i taking values in [0,1];
4) the maximum sentence similarity between the generated sentence Sg and the 5 English sentences S_1, S_2, S_3, S_4, S_5 is determined by the following formula:
SimilarSyn = max{s_i}    (4).
2. The semantic evaluation method based on scene description according to claim 1, characterized in that, in step (2), counting the number of related words using a synonym library, the synonym query for the words in the keyword sets n_2^i, v_2^i, a_2^i of the 5 English sentences S_1, S_2, S_3, S_4, S_5 is performed as follows: the 5 English sentences are input into a Linux system, which divides all keywords in S_1, S_2, S_3, S_4, S_5 into 3 sets by nouns, verbs, and adjectives and adverbs, queries the synonym sets for the 3 sets through the Thesaurus.com website, and returns the synonyms of the keywords in these 5 English sentences.
3. The semantic evaluation method based on scene description according to claim 1, characterized in that, in sub-step 2) of step (1), analyzing the parts of speech of the English sentences, the text description generation model is a deep network model under an encoding-decoding framework.
CN201810429509.6A 2018-05-08 2018-05-08 Semantic evaluation method based on scene description Active CN108845983B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810429509.6A CN108845983B (en) 2018-05-08 2018-05-08 Semantic evaluation method based on scene description

Publications (2)

Publication Number Publication Date
CN108845983A (en) 2018-11-20
CN108845983B (en) 2021-11-05

Family

ID=64212696

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810429509.6A Active CN108845983B (en) 2018-05-08 2018-05-08 Semantic evaluation method based on scene description

Country Status (1)

Country Link
CN (1) CN108845983B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110688916A (en) * 2019-09-12 2020-01-14 武汉理工大学 Video description method and device based on entity relationship extraction

Citations (3)

Publication number Priority date Publication date Assignee Title
CN104182386A (en) * 2013-05-27 2014-12-03 华东师范大学 Word pair relation similarity calculation method
CN105677634A (en) * 2015-07-18 2016-06-15 孙维国 Method for extracting sentences with similar meanings and standard grammar from academic documents
CN107480144A (en) * 2017-08-03 2017-12-15 中国人民大学 Possess the image natural language description generation method and device across language learning ability

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US9082040B2 (en) * 2011-05-13 2015-07-14 Microsoft Technology Licensing, Llc Identifying visual contextual synonyms


Non-Patent Citations (6)

Title
Instance-aware image and sentence matching with selective multimodal LSTM; Huang Y et al.; 《Computer Vision and Pattern Recognition》; 20161231; 7254-7262 *
An automatic metric for MT evaluation with improved correlation with human judgments; Banerjee S et al.; 《The 43rd Annual Meeting on Association for Computational Linguistics》; 20051231; 65-72 *
Exploring Nearest Neighbor Approaches for Image Captioning; Jacob Devlin et al.; 《https://arxiv.org/abs/1505.04467》; 20150717; 1-6 *
Re-evaluating Automatic Metrics for Image Captioning; Mert Kilickaya et al.; 《https://arxiv.org/abs/1612.07600》; 20161222; 1-11 *
Semantic propositional image caption evaluation; Anderson P et al.; 《Computer Vision》; 20161231; 382-398 *
Image description generation model fusing image scene and object prior knowledge; Tang Pengjie et al.; 《Journal of Image and Graphics (中国图象图形学报)》; 20170916 (No. 09); 1251-1260 *

Also Published As

Publication number Publication date
CN108845983A (en) 2018-11-20

Similar Documents

Publication Publication Date Title
Krishna et al. Visual genome: Connecting language and vision using crowdsourced dense image annotations
US10430689B2 (en) Training a classifier algorithm used for automatically generating tags to be applied to images
Ghoshal et al. Hidden Markov models for automatic annotation and content-based retrieval of images and video
Divvala et al. Learning everything about anything: Webly-supervised visual concept learning
Le et al. Tuhoi: Trento universal human object interaction dataset
Agirre et al. Unsupervised WSD based on automatically retrieved examples: The importance of bias
Rui et al. Bipartite graph reinforcement model for web image annotation
Larkey et al. Language-specific models in multilingual topic tracking
CN104408115B (en) The heterogeneous resource based on semantic interlink recommends method and apparatus on a kind of TV platform
KR20090017830A (en) Apparatus for providing aspect-based documents clustering that raises reliability and method therefor
CN108845983B (en) Semantic evaluation method based on scene description
JP3847273B2 (en) Word classification device, word classification method, and word classification program
TW201039149A (en) Robust algorithms for video text information extraction and question-answer retrieval
Taneva et al. Gem-based entity-knowledge maintenance
Reddy et al. Obtaining description for simple images using surface realization techniques and natural language processing
CN110413985B (en) Related text segment searching method and device
Tejedor et al. Ontology-based retrieval of human speech
Browne et al. Dublin City University video track experiments for TREC 2003
CN108763229B (en) Machine translation method and device based on characteristic sentence stem extraction
Demirtas et al. Automatic categorization and summarization of documentaries
Al Harbi et al. Natural language descriptions for human activities in video streams
Liu et al. Cross-Language Information Matching Technology Based on Term Extraction
Zhang et al. A denoising framework for image caption
Ghude et al. Text Generation for Hindi
Gupta CricketLinking: linking event mentions from cricket match reports to ball entities in commentaries

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220728

Address after: 213164 5th Floor, Jiangnan Modern Industry Research Institute, Wujin Science and Education City, Changzhou City, Jiangsu Province

Patentee after: Jiangsu Siyuan Integrated Circuit and Intelligent Technology Research Institute Co., Ltd.

Address before: 710062 No. 199 South Chang'an Road, Xi'an, Shaanxi

Patentee before: Shaanxi Normal University