CN108845983B - Semantic evaluation method based on scene description - Google Patents
- Publication number: CN108845983B
- Application number: CN201810429509.6A
- Authority: CN (China)
- Prior art keywords: english sentences, sentence, similarity, words, english
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/247—Thesauruses; Synonyms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Probability & Statistics with Applications (AREA)
- Machine Translation (AREA)
Abstract
A semantic evaluation method based on scene description comprises the steps of analyzing the parts of speech of English sentences, counting the number of related words using a synonym library, and determining the similarity between 5 English sentences and a generated sentence. The method extracts the keywords of the 5 English sentences, associates a synonym set with each keyword, and determines the similarity of two sentences by taking as reference coefficients the number of keywords of the generated sentence that repeat the keywords of the 5 English sentences or the words of the corresponding synonym sets. The method gives reasonable evaluation results, is practical and fast, and can be applied in the technical field of scene description evaluation.
Description
Technical Field
The invention belongs to the technical field at the intersection of computer vision and natural language processing, and particularly relates to a method for determining the similarity between reference sentences and generated sentences.
Background
Describing the visual scene information in images or videos in natural language has been a research hotspot in computer vision in recent years. It involves converting images or videos into text sentences, that is, image captioning and video captioning. As researchers at home and abroad have deepened their work on image and video captioning, more and more scene description algorithms and evaluation metrics for scene description quality have been proposed, such as BLEU, CIDEr-D and ROUGE. However, these metrics are all based on n-gram matching or on the longest common subsequence; that is, when judging the similarity of two sentences, they consider only how well identically spelled words in the two sentences match. They therefore measure scene description quality only in a strict literal sense and do not use the semantic information of the objects and relations in the scene. Their results are particularly unsuitable in two cases: sentences that are worded differently but have the same meaning, and sentences that share n-grams but have different meanings.
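To make this limitation concrete, the following minimal sketch counts identically spelled words shared by two sentences. It is illustrative only: `unigram_overlap` is a hypothetical helper and is not an implementation of BLEU, CIDEr-D or ROUGE, and the sentence pair is invented for the example.

```python
def unigram_overlap(candidate: str, reference: str) -> int:
    """Count distinct words spelled identically in both sentences."""
    cand_words = set(candidate.lower().split())
    ref_words = set(reference.lower().split())
    return len(cand_words & ref_words)


# Hypothetical sentence pair: "beef" and "meat" are near-synonyms,
# but exact string matching treats them as unrelated words.
reference = "a plate with beef and potatoes"
same_wording = "a plate with beef and potatoes"
synonym_wording = "a plate with meat and potatoes"

print(unigram_overlap(same_wording, reference))     # 6
print(unigram_overlap(synonym_wording, reference))  # 5
```

The second sentence loses a match even though it describes the same scene, which is the kind of case the synonym expansion is meant to handle.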
Disclosure of Invention
The technical problem to be solved by the invention is to overcome the defects of the prior art and provide a semantic evaluation method based on scene description that is reasonable, practical and fast.
The technical scheme adopted for solving the technical problems comprises the following steps:
(1) Analyzing the parts of speech of the English sentences
1) Select from the MSCOCO image dataset the 5 English sentences of an original image whose scene is to be described; the 5 English sentences are denoted $S_1, S_2, S_3, S_4, S_5$.
2) According to different text description generation models, perform scene description on the selected original image to obtain a generated sentence $S_g$.
3) Count the keywords in the generated sentence $S_g$ and divide them by part of speech (nouns, verbs, adjectives and adverbs) into a noun set $n_1$, a verb set $v_1$, and an adjective-and-adverb set $a_1$; the numbers of words in these sets are denoted $C_{n_1}$, $C_{v_1}$, $C_{a_1}$.
4) Count the keywords in the 5 English sentences and divide the keywords of $S_1, S_2, S_3, S_4, S_5$ by part of speech (nouns, verbs, adjectives and adverbs) into the sets $n_2^i$, $v_2^i$, $a_2^i$; the numbers of words in these sets are denoted $C_{n_2}^i$, $C_{v_2}^i$, $C_{a_2}^i$, $i \in [1,5]$.
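The partition in steps 3) and 4) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the text does not say which part-of-speech tagger is used, so the input here is a hand-tagged list, the Penn Treebank tag prefixes are an assumption, and `split_keywords` and `KeywordSets` are hypothetical names.

```python
from collections import namedtuple

KeywordSets = namedtuple("KeywordSets", ["nouns", "verbs", "adj_adv"])


def split_keywords(tagged_words):
    """Split (word, tag) pairs into noun, verb, and adjective/adverb sets.

    Tags are assumed to follow the Penn Treebank convention:
    NN* = noun, VB* = verb, JJ* = adjective, RB* = adverb.
    Words with any other tag (articles, prepositions, ...) are ignored.
    """
    nouns, verbs, adj_adv = set(), set(), set()
    for word, tag in tagged_words:
        if tag.startswith("NN"):
            nouns.add(word.lower())
        elif tag.startswith("VB"):
            verbs.add(word.lower())
        elif tag.startswith(("JJ", "RB")):
            adj_adv.add(word.lower())
    return KeywordSets(nouns, verbs, adj_adv)


# Hand-tagged hypothetical sentence: "a dog runs quickly".
tagged = [("a", "DT"), ("dog", "NN"), ("runs", "VBZ"), ("quickly", "RB")]
sets = split_keywords(tagged)

# The three counts play the role of C_n, C_v, C_a for this sentence.
counts = (len(sets.nouns), len(sets.verbs), len(sets.adj_adv))
print(sets, counts)
```

Using sets rather than lists means a keyword repeated in a sentence is counted once, which matches the set notation used in the later steps.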
(2) Counting the number of related words using the synonym library
1) Using the Thesaurus.com website, query synonyms for the words in the keyword sets $n_2^i$, $v_2^i$, $a_2^i$ of the 5 English sentences $S_1, S_2, S_3, S_4, S_5$ to obtain the corresponding synonym sets $\text{Set-}n_i$, $\text{Set-}v_i$, $\text{Set-}a_i$.
2) Determine how many words in the noun set $n_1$, the verb set $v_1$, and the adjective-and-adverb set $a_1$ of the generated sentence $S_g$ are identical to words in the keyword sets $n_2^i$, $v_2^i$, $a_2^i$ of the 5 English sentences or in the corresponding synonym sets $\text{Set-}n_i$, $\text{Set-}v_i$, $\text{Set-}a_i$. That is, determine the numbers of elements of the three sets $(n_1 \cap n_2^i) \cup (n_1 \cap \text{Set-}n_i)$, $(v_1 \cap v_2^i) \cup (v_1 \cap \text{Set-}v_i)$, $(a_1 \cap a_2^i) \cup (a_1 \cap \text{Set-}a_i)$, denoted $C_{n\text{-}syn}^i$, $C_{v\text{-}syn}^i$, $C_{a\text{-}syn}^i$, $i \in [1,5]$.
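The set expression in step 2) maps directly onto set operations. The sketch below is a minimal illustration with invented word sets; `matched_count` is a hypothetical helper name.

```python
def matched_count(generated, reference, reference_synonyms):
    """Size of (generated ∩ reference) ∪ (generated ∩ reference_synonyms)."""
    return len((generated & reference) | (generated & reference_synonyms))


# Hypothetical noun sets for illustration.
n1 = {"dog", "park", "ball"}         # nouns of the generated sentence
n2 = {"puppy", "park", "toy"}        # nouns of one reference sentence
set_n = {"dog", "pup", "plaything"}  # synonyms of the reference nouns

print(matched_count(n1, n2, set_n))  # 2: "park" directly, "dog" via a synonym
```

Because the two intersections are combined with a union, a generated word that matches both a reference keyword and a synonym is still counted only once.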
(3) Determining the similarity between the 5 English sentences and the generated sentence $S_g$
1) The part-of-speech similarity coefficient $k_i$ between the generated sentence $S_g$ and the 5 English sentences $S_1, S_2, S_3, S_4, S_5$ is given by formula (1):
[Formula (1) is not reproduced in this text.]
The part-of-speech similarity coefficient $k_i$ takes values in $[0, 1]$.
2) The semantic similarity coefficient $j_i$ between the generated sentence $S_g$ and the 5 English sentences $S_1, S_2, S_3, S_4, S_5$ is given by formula (2):
[Formula (2) is not reproduced in this text.]
The semantic similarity coefficient $j_i$ takes values in $[0, 1]$.
3) The sentence similarity $s_i$ between the generated sentence $S_g$ and the 5 English sentences $S_1, S_2, S_3, S_4, S_5$ is given by formula (3):
[Formula (3) is not reproduced in this text.]
The similarity $s_i$ takes values in $[0, 1]$.
4) The maximum sentence similarity between the generated sentence $S_g$ and the 5 English sentences $S_1, S_2, S_3, S_4, S_5$ is determined by the following formula:
$$\mathrm{SimilarSyn} = \max\{s_i\} \tag{4}$$
In step (2), counting the number of related words using the synonym library, the method of querying synonyms for the words in the keyword sets $n_2^i$, $v_2^i$, $a_2^i$ of the 5 English sentences $S_1, S_2, S_3, S_4, S_5$ is as follows: the 5 English sentences are input into a Linux system; the system divides all keywords in the 5 sentences by part of speech (nouns, verbs, adjectives and adverbs) into 3 sets; the synonym sets of the 3 sets are queried through the Thesaurus.com website, and the synonyms of the keywords in the 5 English sentences are returned.
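Building a synonym set from a keyword set amounts to taking the union of the synonyms returned for each keyword. The sketch below illustrates this under stated assumptions: the text does not describe how the Thesaurus.com query is issued, so a small offline dictionary (`LOCAL_THESAURUS`, invented for this example) stands in for the online library, and `build_synonym_set` and `local_lookup` are hypothetical names.

```python
def build_synonym_set(keywords, lookup):
    """Union of the synonyms returned by `lookup` for every keyword."""
    synonyms = set()
    for word in keywords:
        synonyms.update(lookup(word))
    return synonyms


# Hypothetical offline table standing in for the online synonym library.
LOCAL_THESAURUS = {
    "puppy": ["dog", "pup"],
    "toy": ["plaything"],
}


def local_lookup(word):
    """Return the synonyms of `word`, or an empty list if it is unknown."""
    return LOCAL_THESAURUS.get(word, [])


set_n = build_synonym_set({"puppy", "park", "toy"}, local_lookup)
print(sorted(set_n))  # ['dog', 'plaything', 'pup']
```

Passing the lookup as a function keeps the union logic independent of where the synonyms come from, so the stand-in table could be swapped for any other source.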
In step 2) of analyzing the parts of speech of the English sentences, the text description generation model is a deep network model under an encoder-decoder framework.
The invention uses the Thesaurus.com online synonym library to expand the keywords of the reference sentences, by three parts of speech, into synonym sets, and matches them against the words of the generated sentence to be evaluated. This effectively addresses semantic-level matching between generated and reference sentences when the sentences are worded differently but have the same meaning, or share n-grams but have different meanings. The method is reasonable, practical and fast, and can be applied in the technical field of scene description evaluation.
Drawings
FIG. 1 is a schematic flow chart of example 1 of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, but the present invention is not limited to the examples.
Example 1
In this embodiment, the picture numbered 000000425762 in the training set of the MSCOCO image dataset is selected, and the semantic evaluation method based on scene description is applied to the 5 English sentences of the image. The steps are as follows:
(1) Analyzing the parts of speech of the English sentences
1) Select from the MSCOCO image dataset the 5 English sentences of an original image whose scene is to be described; the 5 English sentences are denoted $S_1, S_2, S_3, S_4, S_5$ and are:
S1:A plate filled with sliced beef a bun and potatoes.
S2:Pull pork sandwich and potatoes sit on a white plate.
S3:A very meaty sandwich with uniquely shaped fries.
S4:A plate of potatoes with a pulled pork sandwich next to it.
S5:This is an image of a meal with meat,bread and potatoes.
2) According to different text description generation models, perform scene description on the selected original image. The text description generation model of this embodiment is a "VGG LSTM" model under an encoder-decoder framework, described in Donahue J, Hendricks L A, Guadarrama S, et al. Long-term recurrent convolutional networks for visual recognition and description [C]. Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), 2015: 677-. The generated sentence $S_g$ obtained is: a plate of food with meat and vegetables.
3) Count the keywords in the generated sentence $S_g$ and divide them by part of speech (nouns, verbs, adjectives and adverbs) into a noun set $n_1$, a verb set $v_1$, and an adjective-and-adverb set $a_1$; the numbers of words in these sets are denoted $C_{n_1}$, $C_{v_1}$, $C_{a_1}$. In this embodiment, the noun set $n_1$ is {plate, food, meat, vegetables}, the verb set $v_1$ is the empty set, and the adjective-and-adverb set $a_1$ is the empty set; the word counts are $C_{n_1} = 4$, $C_{v_1} = 0$, $C_{a_1} = 0$.
4) Count the keywords in the 5 English sentences and divide the keywords of $S_1, S_2, S_3, S_4, S_5$ by part of speech (nouns, verbs, adjectives and adverbs) into the sets $n_2^i$, $v_2^i$, $a_2^i$; the numbers of words in these sets are denoted $C_{n_2}^i$, $C_{v_2}^i$, $C_{a_2}^i$, $i \in [1,5]$. In this embodiment, $n_2^i$ is {plate, beef, bun, potatoes}, $v_2^i$ is the empty set, and $a_2^i$ is {filled}; the word counts are $C_{n_2}^i = 4$, $C_{v_2}^i = 0$, $C_{a_2}^i = 1$.
(2) Counting the number of related words using the synonym library
1) Using the Thesaurus.com website, query synonyms for the words in the keyword sets $n_2^i$, $v_2^i$, $a_2^i$ of the 5 English sentences $S_1, S_2, S_3, S_4, S_5$. In this embodiment the query method is as follows: the 5 English sentences are input into a Linux system; the system divides all keywords in the 5 sentences by part of speech (nouns, verbs, adjectives and adverbs) into 3 sets, where $n_2^i$ is {plate, beef, bun, potatoes}, $v_2^i$ is the empty set, and $a_2^i$ is the adjective-and-adverb set from step 4) of (1); the synonym sets of the 3 sets are then queried through the Thesaurus.com website, and the synonyms of the keywords in the 5 English sentences are returned.
This yields the corresponding synonym sets $\text{Set-}n_i$, $\text{Set-}v_i$, $\text{Set-}a_i$. In this embodiment, $\text{Set-}n_i$ is {plate synonyms} ∪ {beef synonyms} ∪ {bun synonyms} ∪ {potatoes synonyms}, namely {bow, platter, service, casserole, course, help, portion, service, trescher} ∪ {meat, arm, brawn, fly, force, heftress, light, muscle, physique, power, robustness, sine, steam, strength, vigor} ∪ {the W, read, doughnut, muffin, business, cruller, Danish, eclair, sweet roll} ∪ {yam, rphy, plant, task, turbo, table, root}; $\text{Set-}v_i$ is the empty set; and $\text{Set-}a_i$ is {filled synonyms} ∪ {sliced synonyms}, namely {brimming, full, repeat, permeated} ∪ {caree, clear, divide, hack, segment, share, shred, coast, slit, stripe, disconnect, disperver, gap, exposure, history, segment, subdivision, subinder, chiv}.
2) Determine how many words in the noun set $n_1$, the verb set $v_1$, and the adjective-and-adverb set $a_1$ of the generated sentence $S_g$ are identical to words in the keyword sets $n_2^i$, $v_2^i$, $a_2^i$ of the 5 English sentences or in the corresponding synonym sets $\text{Set-}n_i$, $\text{Set-}v_i$, $\text{Set-}a_i$. That is, determine the numbers of elements of the three sets $(n_1 \cap n_2^i) \cup (n_1 \cap \text{Set-}n_i)$, $(v_1 \cap v_2^i) \cup (v_1 \cap \text{Set-}v_i)$, $(a_1 \cap a_2^i) \cup (a_1 \cap \text{Set-}a_i)$, denoted $C_{n\text{-}syn}^i$, $C_{v\text{-}syn}^i$, $C_{a\text{-}syn}^i$, $i \in [1,5]$. In this embodiment, $C_{n\text{-}syn}^i = 2$, $C_{v\text{-}syn}^i = 0$, $C_{a\text{-}syn}^i = 0$.
(3) Determining the similarity between the 5 English sentences and the generated sentence $S_g$
1) The part-of-speech similarity coefficient $k_i$ between the generated sentence $S_g$ and the 5 English sentences $S_1, S_2, S_3, S_4, S_5$ is given by formula (1), and $k_i$ takes values in $[0, 1]$. In this embodiment, formula (1) gives the part-of-speech similarity coefficients $k_i$ of the 5 English sentences for $i = 1, 2, 3, 4, 5$ as 0.5833, 0.3333, 0.5, 0.6667, 0.9993.
2) The semantic similarity coefficient $j_i$ between the generated sentence $S_g$ and the 5 English sentences $S_1, S_2, S_3, S_4, S_5$ is given by formula (2), and $j_i$ takes values in $[0, 1]$. In this embodiment, formula (2) gives the semantic similarity coefficients $j_i$ of the 5 English sentences for $i = 1, 2, 3, 4, 5$ as 0.5, 0.25, 0, 0.25, 0.25.
3) The sentence similarity $s_i$ between the generated sentence $S_g$ and the 5 English sentences $S_1, S_2, S_3, S_4, S_5$ is given by formula (3), and $s_i$ takes values in $[0, 1]$. In this embodiment, formula (3) gives the sentence similarities $s_i$ of the 5 English sentences for $i = 1, 2, 3, 4, 5$ as 0.504, 0.254, 0.025, 0.271, 0.284.
4) The maximum sentence similarity between the generated sentence $S_g$ and the 5 English sentences $S_1, S_2, S_3, S_4, S_5$ is determined by the following formula:
$$\mathrm{SimilarSyn} = \max\{s_i\} \tag{4}$$
In this embodiment, formula (4) gives a maximum sentence similarity between the generated sentence $S_g$ and the 5 English sentences $S_1, S_2, S_3, S_4, S_5$ of 0.504.
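Formula (4) is a plain maximum over the five sentence similarities. As a small check, the sketch below applies it to the $s_i$ values reported for this example and also records which reference sentence attains the maximum; the variable names are illustrative.

```python
# Sentence similarities s_1..s_5 reported for Example 1.
s = [0.504, 0.254, 0.025, 0.271, 0.284]

similar_syn = max(s)                       # formula (4)
best_reference = s.index(similar_syn) + 1  # 1-based index of the best match

print(similar_syn, best_reference)  # 0.504 1
```

The maximum is attained by $S_1$, so the final score reflects the reference sentence closest to the generated one rather than an average over all five.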
Example 2
In this embodiment, the picture numbered 000000454956 in the training set of the MSCOCO image dataset is selected, and the semantic evaluation method based on scene description is applied to the 5 English sentences of the image. The steps are as follows:
(1) Analyzing the parts of speech of the English sentences
1) Select from the MSCOCO image dataset the 5 English sentences of an original image whose scene is to be described; the 5 English sentences are denoted $S_1, S_2, S_3, S_4, S_5$ and are:
S1:Two bears can be seen grazing in the grass at the side of the road.
S2:Two black bears are in the grass next to the road.
S3:A couple of bears next to a road.
S4:Two black bears eating grass on the side of the road.
S5:A pair of black bears stand in the grass on the side of the road.
2) According to different text description generation models, perform scene description on the selected original image. The text description generation model of this embodiment is a "VGG LSTM" model under an encoder-decoder framework, the same as in Example 1. The generated sentence $S_g$ obtained is: a bear is walking through the grass near a tree.
3) Count the keywords in the generated sentence $S_g$ and divide them by part of speech (nouns, verbs, adjectives and adverbs) into a noun set $n_1$, a verb set $v_1$, and an adjective-and-adverb set $a_1$; the numbers of words in these sets are denoted $C_{n_1}$, $C_{v_1}$, $C_{a_1}$. In this embodiment, the noun set $n_1$ is {bear, grass, tree}, the verb set $v_1$ is {walking}, and the adjective-and-adverb set $a_1$ is the empty set; the word counts are $C_{n_1} = 3$, $C_{v_1} = 1$, $C_{a_1} = 0$.
4) Count the keywords in the 5 English sentences and divide the keywords of $S_1, S_2, S_3, S_4, S_5$ by part of speech (nouns, verbs, adjectives and adverbs) into the sets $n_2^i$, $v_2^i$, $a_2^i$; the numbers of words in these sets are denoted $C_{n_2}^i$, $C_{v_2}^i$, $C_{a_2}^i$, $i \in [1,5]$. In this embodiment, $n_2^i$ is {bear, grass, road}, $v_2^i$ is {grazing, see}, and $a_2^i$ is {side}; the word counts are $C_{n_2}^i = 3$, $C_{v_2}^i = 2$, $C_{a_2}^i = 1$.
(2) Counting the number of related words using the synonym library
1) Using the Thesaurus.com website, query synonyms for the words in the keyword sets $n_2^i$, $v_2^i$, $a_2^i$ of the 5 English sentences $S_1, S_2, S_3, S_4, S_5$, by the same method as in Example 1, to obtain the corresponding synonym sets $\text{Set-}n_i$, $\text{Set-}v_i$, $\text{Set-}a_i$. In this embodiment, $\text{Set-}n_i$ is {bear synonyms} ∪ {grass synonyms} ∪ {tree synonyms}, namely {barbarian, bear, boob, brute, buffoon, cad, churl, dork, goon, lout, oaf, peasant, philistine, rube, vulgarian} ∪ {mean, hay, tutu, turf, sod, verdure, barley, grama} ∪ {sampling, shrub, wood, forest, timber, wood, pulp, stock, seedling, softwood, hardwood, topiary}; $\text{Set-}v_i$ is {grazing synonyms} ∪ {see synonyms}, namely {bagging, biting, clamping, cropping, observing, feeding, formatting, gnawing, mapping, multicasting, unicoding, passivating, uploading} ∪ {detect, extract, identify, hook, look, notice, object, record, replay, spot, view, watch, wireless, beam, book, record, distribute, distinguishment, copy, eye, flash, gap, gawout, gazeto, glaze, trim, text, survey, record, trace, survey}; and $\text{Set-}a_i$ is {side synonyms}, namely {incidental, lateral, oblique, postern, roundabout, secondary, skirting, subordinate, subspace, ancillary, indirect, lesser, margin, not the main, off-center, sidelong, sideward, sideways, sidewise, superficial}.
2) Determine how many words in the noun set $n_1$, the verb set $v_1$, and the adjective-and-adverb set $a_1$ of the generated sentence $S_g$ are identical to words in the keyword sets $n_2^i$, $v_2^i$, $a_2^i$ of the 5 English sentences or in the corresponding synonym sets $\text{Set-}n_i$, $\text{Set-}v_i$, $\text{Set-}a_i$. That is, determine the numbers of elements of the three sets $(n_1 \cap n_2^i) \cup (n_1 \cap \text{Set-}n_i)$, $(v_1 \cap v_2^i) \cup (v_1 \cap \text{Set-}v_i)$, $(a_1 \cap a_2^i) \cup (a_1 \cap \text{Set-}a_i)$, denoted $C_{n\text{-}syn}^i$, $C_{v\text{-}syn}^i$, $C_{a\text{-}syn}^i$, $i \in [1,5]$. In this embodiment, $C_{n\text{-}syn}^i = 2$, $C_{v\text{-}syn}^i = 0$, $C_{a\text{-}syn}^i = 0$.
(3) Determining the similarity between the 5 English sentences and the generated sentence $S_g$
1) The part-of-speech similarity coefficient $k_i$ between the generated sentence $S_g$ and the 5 English sentences $S_1, S_2, S_3, S_4, S_5$ is given by formula (1), and $k_i$ takes values in $[0, 1]$. In this embodiment, formula (1) gives the part-of-speech similarity coefficients $k_i$ of the 5 English sentences for $i = 1, 2, 3, 4, 5$ as 0.5, 0.6667, 0.6667, 0.6667, 0.5833.
2) The semantic similarity coefficient $j_i$ between the generated sentence $S_g$ and the 5 English sentences $S_1, S_2, S_3, S_4, S_5$ is given by formula (2), and $j_i$ takes values in $[0, 1]$. In this embodiment, formula (2) gives the semantic similarity coefficients $j_i$ of the 5 English sentences for $i = 1, 2, 3, 4, 5$ as 0.5, 0.5, 0.25, 0.5, 0.5.
3) The sentence similarity $s_i$ between the generated sentence $S_g$ and the 5 English sentences $S_1, S_2, S_3, S_4, S_5$ is given by formula (3), and $s_i$ takes values in $[0, 1]$. In this embodiment, formula (3) gives the sentence similarities $s_i$ of the 5 English sentences for $i = 1, 2, 3, 4, 5$ as 0.5, 0.508, 0.254, 0.508, 0.504.
4) The maximum sentence similarity between the generated sentence $S_g$ and the 5 English sentences $S_1, S_2, S_3, S_4, S_5$ is determined by the following formula:
$$\mathrm{SimilarSyn} = \max\{s_i\} \tag{4}$$
In this embodiment, formula (4) gives a maximum sentence similarity between the generated sentence $S_g$ and the 5 English sentences $S_1, S_2, S_3, S_4, S_5$ of 0.508.
Example 3
In this embodiment, a picture from the training set of the MSCOCO image dataset is selected, and the semantic evaluation method based on scene description is applied to the 5 English sentences of the image. The steps are as follows:
(1) Analyzing the parts of speech of the English sentences
1) Select from the MSCOCO image dataset the 5 English sentences of an original image whose scene is to be described; the 5 English sentences are denoted $S_1, S_2, S_3, S_4, S_5$ and are:
S1:A young girl standing on top of a tennis court.
S2:A young girl standing on top of a tennis court holding a racquet.
S3:A kid holding a racket ready to kick the ball.
S4:A kid is standing on a tennis court with a racket.
S5:A young girl playing tennis at a tennis court.
2) According to different text description generation models, perform scene description on the selected original image. The text description generation model of this embodiment is a "VGG LSTM" model under an encoder-decoder framework, the same as in Example 1. The generated sentence $S_g$ obtained is: a giraffe standing on top of a green field.
3) Count the keywords in the generated sentence $S_g$ and divide them by part of speech (nouns, verbs, adjectives and adverbs) into a noun set $n_1$, a verb set $v_1$, and an adjective-and-adverb set $a_1$; the numbers of words in these sets are denoted $C_{n_1}$, $C_{v_1}$, $C_{a_1}$. In this embodiment, the noun set $n_1$ is {giraffe, top, field}, the verb set $v_1$ is {standing}, and the adjective-and-adverb set $a_1$ is {green}; the word counts are $C_{n_1} = 3$, $C_{v_1} = 1$, $C_{a_1} = 1$.
4) Count the keywords in the 5 English sentences and divide the keywords of $S_1, S_2, S_3, S_4, S_5$ by part of speech (nouns, verbs, adjectives and adverbs) into the sets $n_2^i$, $v_2^i$, $a_2^i$; the numbers of words in these sets are denoted $C_{n_2}^i$, $C_{v_2}^i$, $C_{a_2}^i$, $i \in [1,5]$. In this embodiment, $n_2^i$ is {girl, top, tennis court}, $v_2^i$ is {standing}, and $a_2^i$ is {young}; the word counts are $C_{n_2}^i = 3$, $C_{v_2}^i = 1$, $C_{a_2}^i = 1$.
(2) Counting the number of related words using the synonym library
1) Using the Thesaurus.com website, query synonyms for the words in the keyword sets $n_2^i$, $v_2^i$, $a_2^i$ of the 5 English sentences $S_1, S_2, S_3, S_4, S_5$, by the same method as in Example 1, to obtain the corresponding synonym sets $\text{Set-}n_i$, $\text{Set-}v_i$, $\text{Set-}a_i$. In this embodiment, $\text{Set-}n_i$ is {giraffe synonyms} ∪ {top synonyms} ∪ {field synonyms}, namely {buffalo, camel, cattle, cow, der, elephant, hippopotamus, hog, horse, lama, pig, rhinoceros, swing, tapir} ∪ {acme, apex, apege, cap, capital, ceiling, comining, climax, corrk, cover, gate, crop, crown, cumming, cup, face, failure, fine, head, height, hippoint, light, limit, maximum, meridian, pinnacle, point, roof, spire, store, surfer, surfeit, zeeci, surfeit, surficia, map, graph, map, plot, map, broadcast, survey, surficia, broadcast}; $\text{Set-}v_i$ is {standing synonyms}, namely {existing, restraining, fixed, regular, predicted, permanent}; and $\text{Set-}a_i$ is {young synonyms}, namely {bundling, inexperienced, new, youthful, adolescent, blooming, blossoming, loud, developing, fledging, green, growing, infarnent, preferor, junior, juvenile, little, modeler, newborn, sink, raw, recent, tender, tenderfoot, boyish, boylike, burgeoning, calaow, gilise, early, fresh, girlish, gilike, halvelf-slope, ignorant, new, noble, pubescent, unelated, undispensed, unected, unexposed, found, empty, green, empty, empty}.
2) Determine how many words in the noun set $n_1$, the verb set $v_1$, and the adjective-and-adverb set $a_1$ of the generated sentence $S_g$ are identical to words in the keyword sets $n_2^i$, $v_2^i$, $a_2^i$ of the 5 English sentences or in the corresponding synonym sets $\text{Set-}n_i$, $\text{Set-}v_i$, $\text{Set-}a_i$. That is, determine the numbers of elements of the three sets $(n_1 \cap n_2^i) \cup (n_1 \cap \text{Set-}n_i)$, $(v_1 \cap v_2^i) \cup (v_1 \cap \text{Set-}v_i)$, $(a_1 \cap a_2^i) \cup (a_1 \cap \text{Set-}a_i)$, denoted $C_{n\text{-}syn}^i$, $C_{v\text{-}syn}^i$, $C_{a\text{-}syn}^i$, $i \in [1,5]$. In this embodiment, $C_{n\text{-}syn}^i = 1$, $C_{v\text{-}syn}^i = 1$, $C_{a\text{-}syn}^i = 0$.
(3) Determining the similarity between the 5 English sentences and the generated sentence $S_g$
1) The part-of-speech similarity coefficient $k_i$ between the generated sentence $S_g$ and the 5 English sentences $S_1, S_2, S_3, S_4, S_5$ is given by formula (1), and $k_i$ takes values in $[0, 1]$. In this embodiment, formula (1) gives the part-of-speech similarity coefficients $k_i$ of the 5 English sentences for $i = 1, 2, 3, 4, 5$ as 1, 0.75, 0.8333, 0.6667, 0.9333.
2) The semantic similarity coefficient $j_i$ between the generated sentence $S_g$ and the 5 English sentences $S_1, S_2, S_3, S_4, S_5$ is given by formula (2), and $j_i$ takes values in $[0, 1]$. In this embodiment, formula (2) gives the semantic similarity coefficients $j_i$ of the 5 English sentences for $i = 1, 2, 3, 4, 5$ as 0.4, 0.4, 0, 0.3, 0.
3) The sentence similarity $s_i$ between the generated sentence $S_g$ and the 5 English sentences $S_1, S_2, S_3, S_4, S_5$ is given by formula (3), and $s_i$ takes values in $[0, 1]$. In this embodiment, formula (3) gives the sentence similarities $s_i$ of the 5 English sentences for $i = 1, 2, 3, 4, 5$ as 0.43, 0.418, 0.042, 0.223, 0.25.
4) The maximum sentence similarity between the generated sentence $S_g$ and the 5 English sentences $S_1, S_2, S_3, S_4, S_5$ is determined by the following formula:
$$\mathrm{SimilarSyn} = \max\{s_i\} \tag{4}$$
In this embodiment, formula (4) gives a maximum sentence similarity between the generated sentence $S_g$ and the 5 English sentences $S_1, S_2, S_3, S_4, S_5$ of 0.43.
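Formula (2) itself does not appear in this text. As a sanity check only, the counts reported in the three examples happen to be consistent with a simple ratio: the number of matched keywords of the generated sentence divided by the total number of its keywords. The sketch below encodes that ratio as an assumption, not as the patent's formula, and compares it with the first reported $j_i$ of each example; it further assumes that the reported match counts refer to $S_1$. The name `assumed_semantic_ratio` is hypothetical.

```python
def assumed_semantic_ratio(matched, generated_total):
    """Matched generated keywords / all generated keywords (an assumption)."""
    if generated_total == 0:
        return 0.0
    return matched / generated_total


# For each example: (sum of the reported match counts,
#                    sum of the generated-sentence keyword counts,
#                    first reported j_i).
examples = {
    "Example 1": (2 + 0 + 0, 4 + 0 + 0, 0.5),
    "Example 2": (2 + 0 + 0, 3 + 1 + 0, 0.5),
    "Example 3": (1 + 1 + 0, 3 + 1 + 1, 0.4),
}

for name, (matched, total, reported) in examples.items():
    ratio = assumed_semantic_ratio(matched, total)
    # True when the assumed ratio reproduces the reported value.
    print(name, ratio, abs(ratio - reported) < 1e-9)
```

Agreement on three data points does not establish the formula; it only shows that the reported numbers are not inconsistent with this reading.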
Claims (3)
1. A semantic evaluation method based on scene description, characterized by comprising the following steps:
(1) Analyzing the parts of speech of the English sentences
1) Select from the MSCOCO image dataset the 5 English sentences of an original image whose scene is to be described; the 5 English sentences are denoted $S_1, S_2, S_3, S_4, S_5$;
2) According to different text description generation models, perform scene description on the selected original image to obtain a generated sentence $S_g$;
3) Count the keywords in the generated sentence $S_g$ and divide them by part of speech (nouns, verbs, adjectives and adverbs) into a noun set $n_1$, a verb set $v_1$, and an adjective-and-adverb set $a_1$; the numbers of words in these sets are denoted $C_{n_1}$, $C_{v_1}$, $C_{a_1}$;
4) Count the keywords in the 5 English sentences and divide the keywords of $S_1, S_2, S_3, S_4, S_5$ by part of speech (nouns, verbs, adjectives and adverbs) into the sets $n_2^i$, $v_2^i$, $a_2^i$; the numbers of words in these sets are denoted $C_{n_2}^i$, $C_{v_2}^i$, $C_{a_2}^i$, $i \in [1,5]$;
(2) Counting the number of related words using the synonym library
1) Using the Thesaurus.com website, query synonyms for the words in the keyword sets $n_2^i$, $v_2^i$, $a_2^i$ of the 5 English sentences $S_1, S_2, S_3, S_4, S_5$ to obtain the corresponding synonym sets $\text{Set-}n_i$, $\text{Set-}v_i$, $\text{Set-}a_i$;
2) Determine how many words in the noun set $n_1$, the verb set $v_1$, and the adjective-and-adverb set $a_1$ of the generated sentence $S_g$ are identical to words in the keyword sets $n_2^i$, $v_2^i$, $a_2^i$ of the 5 English sentences or in the corresponding synonym sets $\text{Set-}n_i$, $\text{Set-}v_i$, $\text{Set-}a_i$; that is, determine the numbers of elements of the three sets $(n_1 \cap n_2^i) \cup (n_1 \cap \text{Set-}n_i)$, $(v_1 \cap v_2^i) \cup (v_1 \cap \text{Set-}v_i)$, $(a_1 \cap a_2^i) \cup (a_1 \cap \text{Set-}a_i)$, denoted $C_{n\text{-}syn}^i$, $C_{v\text{-}syn}^i$, $C_{a\text{-}syn}^i$, $i \in [1,5]$;
(3) Determining the similarity between the 5 English sentences and the generated sentence $S_g$
1) The part-of-speech similarity coefficient $k_i$ between the generated sentence $S_g$ and the 5 English sentences $S_1, S_2, S_3, S_4, S_5$ is given by formula (1):
[Formula (1) is not reproduced in this text.]
The part-of-speech similarity coefficient $k_i$ takes values in $[0, 1]$;
2) The semantic similarity coefficient $j_i$ between the generated sentence $S_g$ and the 5 English sentences $S_1, S_2, S_3, S_4, S_5$ is given by formula (2):
[Formula (2) is not reproduced in this text.]
The semantic similarity coefficient $j_i$ takes values in $[0, 1]$;
3) The sentence similarity $s_i$ between the generated sentence $S_g$ and the 5 English sentences $S_1, S_2, S_3, S_4, S_5$ is given by formula (3):
[Formula (3) is not reproduced in this text.]
The similarity $s_i$ takes values in $[0, 1]$;
4) The maximum sentence similarity between the generated sentence $S_g$ and the 5 English sentences $S_1, S_2, S_3, S_4, S_5$ is determined by the following formula:
$$\mathrm{SimilarSyn} = \max\{s_i\} \tag{4}$$
2. The semantic evaluation method based on scene description according to claim 1, characterized in that, in step (2) of counting the number of related words using the synonym library, the method of querying synonyms for the words in the keyword sets $n_2^i$, $v_2^i$, $a_2^i$ of the 5 English sentences $S_1, S_2, S_3, S_4, S_5$ is as follows: the 5 English sentences are input into a Linux system; the system divides all keywords in the 5 sentences by part of speech (nouns, verbs, adjectives and adverbs) into 3 sets; the synonym sets of the 3 sets are queried through the Thesaurus.com website, and the synonyms of the keywords in the 5 English sentences are returned.
3. The semantic evaluation method based on scene description according to claim 1, characterized in that, in step 2) of analyzing the parts of speech of the English sentences, the text description generation model is a deep network model under an encoder-decoder framework.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810429509.6A CN108845983B (en) | 2018-05-08 | 2018-05-08 | Semantic evaluation method based on scene description |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108845983A CN108845983A (en) | 2018-11-20 |
CN108845983B true CN108845983B (en) | 2021-11-05 |
Family
ID=64212696
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810429509.6A Active CN108845983B (en) | 2018-05-08 | 2018-05-08 | Semantic evaluation method based on scene description |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108845983B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110688916A (en) * | 2019-09-12 | 2020-01-14 | 武汉理工大学 | Video description method and device based on entity relationship extraction |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104182386A (en) * | 2013-05-27 | 2014-12-03 | 华东师范大学 | Word pair relation similarity calculation method |
CN105677634A (en) * | 2015-07-18 | 2016-06-15 | 孙维国 | Method for extracting sentences with similar meanings and standard grammar from academic documents |
CN107480144A (en) * | 2017-08-03 | 2017-12-15 | 中国人民大学 | Possess the image natural language description generation method and device across language learning ability |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9082040B2 (en) * | 2011-05-13 | 2015-07-14 | Microsoft Technology Licensing, Llc | Identifying visual contextual synonyms |
Non-Patent Citations (6)
Title |
---|
Instance-aware image and sentence matching with selective multimodal LSTM; Huang Y et al.; Computer Vision and Pattern Recognition; 2016-12-31; 7254-7262 * |
An automatic metric for MT evaluation with improved correlation with human judgments; Banerjee S et al.; The 43rd Annual Meeting on Association for Computational Linguistics; 2005-12-31; 65-72 * |
Exploring Nearest Neighbor Approaches for Image Captioning; Jacob Devlin et al.; https://arxiv.org/abs/1505.04467; 2015-07-17; 1-6 * |
Re-evaluating Automatic Metrics for Image Captioning; Mert Kilickaya et al.; https://arxiv.org/abs/1612.07600; 2016-12-22; 1-11 * |
Semantic propositional image caption evaluation; Anderson P et al.; Computer Vision; 2016-12-31; 382-398 * |
Image description generation model fusing image scene and object prior knowledge; Tang Pengjie et al.; Journal of Image and Graphics (中国图象图形学报); 2017-09-16 (No. 09); 1251-1260 * |
Also Published As
Publication number | Publication date |
---|---|
CN108845983A (en) | 2018-11-20 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
Krishna et al. | Visual genome: Connecting language and vision using crowdsourced dense image annotations | |
US10430689B2 (en) | Training a classifier algorithm used for automatically generating tags to be applied to images | |
Ghoshal et al. | Hidden Markov models for automatic annotation and content-based retrieval of images and video | |
Divvala et al. | Learning everything about anything: Webly-supervised visual concept learning | |
Le et al. | Tuhoi: Trento universal human object interaction dataset | |
Agirre et al. | Unsupervised WSD based on automatically retrieved examples: The importance of bias | |
Rui et al. | Bipartite graph reinforcement model for web image annotation | |
Larkey et al. | Language-specific models in multilingual topic tracking | |
CN104408115B (en) | Semantic-link-based heterogeneous resource recommendation method and apparatus on a TV platform | |
KR20090017830A (en) | Apparatus for providing aspect-based documents clustering that raises reliability and method therefor | |
CN108845983B (en) | Semantic evaluation method based on scene description | |
JP3847273B2 (en) | Word classification device, word classification method, and word classification program | |
TW201039149A (en) | Robust algorithms for video text information extraction and question-answer retrieval | |
Taneva et al. | Gem-based entity-knowledge maintenance | |
Reddy et al. | Obtaining description for simple images using surface realization techniques and natural language processing | |
CN110413985B (en) | Related text segment searching method and device | |
Tejedor et al. | Ontology-based retrieval of human speech | |
Browne et al. | Dublin City University video track experiments for TREC 2003 | |
CN108763229B (en) | Machine translation method and device based on characteristic sentence stem extraction | |
Demirtas et al. | Automatic categorization and summarization of documentaries | |
Al Harbi et al. | Natural language descriptions for human activities in video streams | |
Liu et al. | Cross-Language Information Matching Technology Based on Term Extraction | |
Zhang et al. | A denoising framework for image caption | |
Ghude et al. | Text Generation for Hindi | |
Gupta | CricketLinking: linking event mentions from cricket match reports to ball entities in commentaries | |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
TR01 | Transfer of patent right | Effective date of registration: 2022-07-28. Address after: 213164 5th Floor, Jiangnan Modern Industry Research Institute, Wujin Science and Education City, Changzhou City, Jiangsu Province. Patentee after: Jiangsu Siyuan Integrated Circuit and Intelligent Technology Research Institute Co., Ltd. Address before: 710062 No. 199 South Chang'an Road, Xi'an, Shaanxi. Patentee before: Shaanxi Normal University |