CN108845983B - Semantic evaluation method based on scene description

Info

Publication number: CN108845983B
Application number: CN201810429509.6A
Authority: CN
Inventors: 马苗; 王伯龙; 武杰; 郭敏; 吴琦
Original assignee: Shaanxi Normal University
Current assignee: Jiangsu Siyuan Integrated Circuit And Intelligent Technology Research Institute Co ltd
Priority date: 2018-05-08
Filing date: 2018-05-08
Publication date: 2021-11-05
Anticipated expiration: 2038-05-08
Also published as: CN108845983A

Abstract

A semantic evaluation method based on scene description comprises the steps of analyzing the parts of speech of English sentences, counting the number of related words by using a synonym library, and determining the similarity between 5 English reference sentences and a generated sentence. The method extracts the keywords of the 5 English sentences, associates a synonym library with each keyword, and determines the similarity of two sentences by taking as a reference coefficient the number of keywords of the generated sentence that repeat either the keywords of the 5 English sentences or the words of their corresponding synonym libraries. The method gives reasonable evaluation results, is practical and fast, and can be applied in the technical field of scene description evaluation.

Description

Semantic evaluation method based on scene description

Technical Field

The invention belongs to the technical field at the intersection of computer vision and natural language processing, and particularly relates to a method for determining the similarity between reference sentences and a generated sentence.

Background

Describing the visual scene information in images or videos in natural language has been one of the research hotspots in computer vision in recent years. It involves converting images or videos into text sentences, namely image captioning and video captioning. As researchers at home and abroad have gone deeper into image and video captioning, more and more scene description algorithms have been proposed, together with indexes for evaluating the quality of scene descriptions, such as BLEU, CIDEr-D and ROUGE. However, these indexes are all computed from n-gram matches or the longest common subsequence; that is, when the similarity of two sentences is judged, only the degree to which identically spelled words match in the two sentences is considered. They therefore measure description quality only in a strict literal sense and make no use of the semantic information of the objects and relations in the scene. The evaluation results are particularly unsuitable in two cases: sentences that are expressed differently but have the same semantics, and sentences that share n-grams but have different semantics.

Disclosure of Invention

The technical problem to be solved by the invention is to overcome the defects of the prior art and provide a semantic evaluation method based on scene description that is reasonable, practical and fast.

The technical scheme adopted to solve the above technical problem comprises the following steps:

(1) Analyzing the parts of speech of the English sentences

1) Selecting from the MSCOCO image data set the 5 English sentences of the original image whose scene is to be described; the 5 English sentences are denoted S₁, S₂, S₃, S₄, S₅.

2) Describing the scene of the selected original image according to different text description generation models to obtain a generated sentence Sg.

3) Counting the keywords in the generated sentence Sg and dividing them, by nouns, verbs, and adjectives and adverbs, into a noun set n₁, a verb set v₁, and an adjective-and-adverb set a₁; the numbers of words in the sets are denoted Cn₁, Cv₁, Ca₁ respectively.

4) Counting the keywords in the 5 English sentences and dividing the keywords of the 5 English sentences S₁, S₂, S₃, S₄, S₅, by nouns, verbs, and adjectives and adverbs, into sets n₂ⁱ, v₂ⁱ, a₂ⁱ; the numbers of words in the sets are denoted Cn₂ⁱ, Cv₂ⁱ, Ca₂ⁱ respectively, i ∈ [1, 5].
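The partition in steps 3) and 4) can be sketched as follows. This is a minimal illustration only: it assumes the words have already been tagged with Penn Treebank style tags (the patent does not name a tagger or a tag scheme), and the function name `partition_keywords` and the toy sentence are hypothetical.

```python
from typing import Iterable

def partition_keywords(tagged: Iterable[tuple[str, str]]) -> dict[str, set[str]]:
    """Split (word, tag) pairs into noun, verb, and adjective/adverb sets.

    Assumes Penn Treebank style tags (NN*, VB*, JJ*, RB*); this tag
    scheme is an assumption, not something stated in the source.
    """
    sets: dict[str, set[str]] = {"n": set(), "v": set(), "a": set()}
    for word, tag in tagged:
        if tag.startswith("NN"):
            sets["n"].add(word.lower())
        elif tag.startswith("VB"):
            sets["v"].add(word.lower())
        elif tag.startswith(("JJ", "RB")):
            sets["a"].add(word.lower())
        # Any other tag (determiner, preposition, ...) is not a keyword.
    return sets

# Hypothetical, hand-tagged sentence: "A small dog quickly chases a ball."
tagged = [("A", "DT"), ("small", "JJ"), ("dog", "NN"), ("quickly", "RB"),
          ("chases", "VBZ"), ("a", "DT"), ("ball", "NN")]
sets = partition_keywords(tagged)
counts = {name: len(words) for name, words in sets.items()}  # Cn, Cv, Ca
```

Using sets rather than lists means a keyword repeated within a sentence is counted once, which matches the set notation used in the following steps.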

(2) Counting the number of related words by using the synonym library

1) Through the Thesaurus.com website, querying synonyms for the words in the keyword sets n₂ⁱ, v₂ⁱ, a₂ⁱ of the 5 English sentences S₁, S₂, S₃, S₄, S₅ respectively, to obtain the corresponding synonym sets Set-nⁱ, Set-vⁱ, Set-aⁱ.

2) Determining, for each part of speech, the number of words in the noun set n₁, verb set v₁, and adjective-and-adverb set a₁ of the generated sentence Sg that are the same as words in the keyword sets n₂ⁱ, v₂ⁱ, a₂ⁱ of the 5 English sentences S₁, S₂, S₃, S₄, S₅ or in the corresponding synonym sets Set-nⁱ, Set-vⁱ, Set-aⁱ; that is, determining the numbers of elements of the three sets (n₁∩n₂ⁱ)∪(n₁∩Set-nⁱ), (v₁∩v₂ⁱ)∪(v₁∩Set-vⁱ), (a₁∩a₂ⁱ)∪(a₁∩Set-aⁱ), denoted C_n-synⁱ, C_v-synⁱ, C_a-synⁱ respectively, i ∈ [1, 5].
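The count in step 2) is a plain set computation. The sketch below shows it for one part of speech; the function name `match_count` and the word sets are illustrative assumptions, not data from the patent.

```python
def match_count(generated: set[str], reference: set[str], synonyms: set[str]) -> int:
    """Return |(generated ∩ reference) ∪ (generated ∩ synonyms)|."""
    return len((generated & reference) | (generated & synonyms))

# Illustrative noun sets (hypothetical, abridged).
n1 = {"plate", "food", "meat", "vegetables"}         # nouns of the generated sentence
n2 = {"plate", "beef", "bun", "potatoes"}            # nouns of one reference sentence
set_n = {"platter", "bowl", "meat", "bread", "yam"}  # synonyms of the words in n2

# "plate" matches directly, "meat" matches through the synonym set.
c_n_syn = match_count(n1, n2, set_n)
```

Because the two intersections are joined with a union, a word that appears both in the reference sentence and in its synonym set is still counted only once.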

(3) Determining the similarity between the 5 English sentences and the generated sentence Sg

1) The part-of-speech similarity coefficient k_i between the generated sentence Sg and the 5 English sentences S₁, S₂, S₃, S₄, S₅ is given by formula (1), which is not reproduced in this text; k_i takes values in [0, 1].

2) The semantic similarity coefficient j_i between the generated sentence Sg and the 5 English sentences S₁, S₂, S₃, S₄, S₅ is given by formula (2), which is not reproduced in this text; j_i takes values in [0, 1].

3) The sentence similarity s_i between the generated sentence Sg and the 5 English sentences S₁, S₂, S₃, S₄, S₅ is given by formula (3), which is not reproduced in this text; s_i takes values in [0, 1].

4) The maximum sentence similarity between the generated sentence Sg and the 5 English sentences S₁, S₂, S₃, S₄, S₅ is determined according to the following formula:

SimilarSyn = max{s_i} (4)
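Formula (4) simply keeps the best of the five per-reference scores. A minimal sketch, using the five s_i values reported in Example 1 of this description as input; the function name `similar_syn` is hypothetical:

```python
def similar_syn(sentence_similarities: list[float]) -> float:
    """Formula (4): the maximum of the per-reference sentence similarities."""
    if not sentence_similarities:
        raise ValueError("at least one sentence similarity is required")
    return max(sentence_similarities)

# s_1..s_5 as reported in Example 1.
s = [0.504, 0.254, 0.025, 0.271, 0.284]
best = similar_syn(s)
best_index = s.index(best) + 1  # 1-based index of the closest reference sentence
```

Taking the maximum rather than the mean means a generated sentence is scored against whichever reference sentence it resembles most.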

In step (2), counting the number of related words by using the synonym library, the synonyms of the words in the keyword sets n₂ⁱ, v₂ⁱ, a₂ⁱ of the 5 English sentences S₁, S₂, S₃, S₄, S₅ are queried as follows: the 5 English sentences S₁, S₂, S₃, S₄, S₅ are input into a Linux system; the system divides all the keywords in the 5 English sentences into 3 sets by nouns, verbs, and adjectives and adverbs; the synonym sets of the 3 sets are queried through the Thesaurus.com website; and the synonyms of the keywords in the 5 English sentences are returned.
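The expansion of a keyword set into its synonym set can be sketched as a union over per-word lookups. The patent queries the Thesaurus.com website; the sketch below deliberately does not call any web service and instead uses a small hand-written table, `THESAURUS`, as a stand-in. The table entries and the function name `synonym_set` are hypothetical.

```python
# Stand-in for the online thesaurus: a tiny hand-written lookup table
# (hypothetical entries, not results returned by Thesaurus.com).
THESAURUS: dict[str, set[str]] = {
    "plate": {"platter", "bowl", "dish"},
    "bun": {"bread", "roll", "muffin"},
}

def synonym_set(keywords: set[str], lookup: dict[str, set[str]]) -> set[str]:
    """Union of the synonym lists of every keyword; unknown words add nothing."""
    result: set[str] = set()
    for word in keywords:
        result |= lookup.get(word, set())
    return result

# One synonym set per part of speech is built in the same way.
set_n = synonym_set({"plate", "bun", "potatoes"}, THESAURUS)
```

Keeping the lookup behind a plain dictionary interface makes it easy to swap in any other synonym source without changing the matching step.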

In step 2) of step (1), analyzing the parts of speech of the English sentences, the text description generation model is a deep network model under an encoder-decoder framework.

The invention adopts the Thesaurus.com online synonym library: the keywords in the reference sentences are expanded, by the three parts of speech, into synonym sets, which are then matched against the words in the generated sentence to be evaluated. This effectively solves the problem of matching the generated sentence and the reference sentences at the semantic level when sentences are expressed differently but have the same semantics, or share n-grams but have different semantics. The method is reasonable, practical and fast, and can be applied in the technical field of scene description evaluation.

Drawings

FIG. 1 is a schematic flow chart of Example 1 of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the drawings and examples, but the present invention is not limited to the examples.

Example 1

In this embodiment, the picture numbered 000000425762 in the training set of the MSCOCO image data set is selected, and the semantic evaluation method based on scene description is applied to the 5 English sentences of the image. The steps are as follows:

(1) Analyzing the parts of speech of the English sentences

1) Selecting from the MSCOCO image data set the 5 English sentences of the original image whose scene is to be described; the 5 English sentences are denoted S₁, S₂, S₃, S₄, S₅ and are:

S₁：A plate filled with sliced beef a bun and potatoes.

S₂：Pull pork sandwich and potatoes sit on a white plate.

S₃：A very meaty sandwich with uniquely shaped fries.

S₄：A plate of potatoes with a pulled pork sandwich next to it.

S₅：This is an image of a meal with meat,bread and potatoes.

2) Describing the scene of the selected original image according to different text description generation models. The text description generation model of this embodiment is the "VGG LSTM" model under an encoder-decoder framework, which has been disclosed in Donahue J, Hendricks L A, Guadarrama S, et al. Long-term recurrent convolutional networks for visual recognition and description [C]. Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), 2015: 677-. The generated sentence Sg obtained is: A plate of food with meat and vegetables.

3) Counting the keywords in the generated sentence Sg and dividing them, by nouns, verbs, and adjectives and adverbs, into a noun set n₁, a verb set v₁, and an adjective-and-adverb set a₁; the numbers of words in the sets are denoted Cn₁, Cv₁, Ca₁ respectively. In this embodiment, the noun set n₁ is { plate, food, meat, vegetables }, the verb set v₁ is the empty set, and the adjective-and-adverb set a₁ is the empty set, so Cn₁ is 4, Cv₁ is 0, and Ca₁ is 0.

4) Counting the keywords in the 5 English sentences and dividing the keywords of the 5 English sentences S₁, S₂, S₃, S₄, S₅, by nouns, verbs, and adjectives and adverbs, into sets n₂ⁱ, v₂ⁱ, a₂ⁱ; the numbers of words in the sets are denoted Cn₂ⁱ, Cv₂ⁱ, Ca₂ⁱ respectively, i ∈ [1, 5]. In this embodiment, for S₁ (i = 1), n₂ⁱ is { plate, beef, bun, potatoes }, v₂ⁱ is the empty set, and a₂ⁱ is { filled }, so Cn₂ⁱ is 4, Cv₂ⁱ is 0, and Ca₂ⁱ is 1.

(2) Counting the number of related words by using the synonym library

1) Through the Thesaurus.com website, querying synonyms for the words in the keyword sets n₂ⁱ, v₂ⁱ, a₂ⁱ of the 5 English sentences S₁, S₂, S₃, S₄, S₅ respectively. In this embodiment the query method is as follows: the 5 English sentences S₁, S₂, S₃, S₄, S₅ are input into a Linux system; the system divides all the keywords in the 5 English sentences into 3 sets by nouns, verbs, and adjectives and adverbs, where n₂ⁱ is { plate, beef, bun, potatoes }, v₂ⁱ is the empty set, and a₂ⁱ is { filled }; the synonym sets of the 3 sets are queried through the Thesaurus.com website; and the synonyms of the keywords in the 5 English sentences are returned.

The corresponding synonym sets Set-nⁱ, Set-vⁱ, Set-aⁱ are obtained. In this embodiment, the synonym set Set-nⁱ is { plate synonyms } ∪ { beef synonyms } ∪ { bun synonyms } ∪ { potatoes synonyms }, namely { bowl, platter, serving, casserole, course, helping, portion, service, trencher } ∪ { meat, arm, brawn, flesh, force, heftiness, might, muscle, physique, power, robustness, sinew, steam, strength, thew, vigor } ∪ { bread, doughnut, muffin, business, cruller, Danish, eclair, sweet roll } ∪ { yam, murphy, plant, task, tuber }; the synonym set Set-vⁱ is the empty set; and the synonym set Set-aⁱ is { filled synonyms } ∪ { sliced synonyms }, namely { brimming, full, replete, permeated } ∪ { carve, cleave, divide, hack, segment, share, shred, coast, slit, strip, disconnect, dissever, gash, exposure, history, segment, subdivide, sunder, chiv }.

2) Determining, for each part of speech, the number of words in the noun set n₁, verb set v₁, and adjective-and-adverb set a₁ of the generated sentence Sg that are the same as words in the keyword sets n₂ⁱ, v₂ⁱ, a₂ⁱ of the 5 English sentences S₁, S₂, S₃, S₄, S₅ or in the corresponding synonym sets Set-nⁱ, Set-vⁱ, Set-aⁱ; that is, determining the numbers of elements of the three sets (n₁∩n₂ⁱ)∪(n₁∩Set-nⁱ), (v₁∩v₂ⁱ)∪(v₁∩Set-vⁱ), (a₁∩a₂ⁱ)∪(a₁∩Set-aⁱ), denoted C_n-synⁱ, C_v-synⁱ, C_a-synⁱ respectively, i ∈ [1, 5]. In this embodiment, C_n-synⁱ is 2, C_v-synⁱ is 0, and C_a-synⁱ is 0.

The part-of-speech similarity coefficient k_i takes values in [0, 1]. In this embodiment, formula (1) gives, for i = 1, 2, 3, 4, 5, the part-of-speech similarity coefficients k_i of the 5 English sentences as 0.5833, 0.3333, 0.5, 0.6667, 0.9993.

The semantic similarity coefficient j_i takes values in [0, 1]. In this embodiment, formula (2) gives, for i = 1, 2, 3, 4, 5, the semantic similarity coefficients j_i of the 5 English sentences as 0.5, 0.25, 0, 0.25, 0.25.

The sentence similarity s_i takes values in [0, 1]. In this embodiment, formula (3) gives, for i = 1, 2, 3, 4, 5, the sentence similarities s_i of the 5 English sentences as 0.504, 0.254, 0.025, 0.271, 0.284.

SimilarSyn = max{s_i} (4)

In this embodiment, the maximum sentence similarity between the generated sentence Sg and the 5 English sentences S₁, S₂, S₃, S₄, S₅ obtained according to formula (4) is 0.504.
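Formula (3) is not reproduced in this text. As a plausibility check only, the values reported in this example for i = 1 to 4 are consistent with a weighted sum s_i = 0.05·k_i + 0.95·j_i. The weights `W_K` and `W_J` below are an assumption inferred from those numbers, not taken from the patent, and the fifth reported pair (k = 0.9993, s = 0.284) does not fit them, so it is left out.

```python
# Values reported in Example 1 for i = 1..4.
k = [0.5833, 0.3333, 0.5, 0.6667]          # part-of-speech similarity coefficients
j = [0.5, 0.25, 0.0, 0.25]                 # semantic similarity coefficients
reported_s = [0.504, 0.254, 0.025, 0.271]  # sentence similarities

# ASSUMED weights, inferred from the numbers above (not stated in the source).
W_K, W_J = 0.05, 0.95

computed_s = [round(W_K * ki + W_J * ji, 3) for ki, ji in zip(k, j)]
matches = all(abs(c - r) < 1e-9 for c, r in zip(computed_s, reported_s))
```

Under this assumed weighting the semantic coefficient dominates the score, and the part-of-speech coefficient mostly separates sentences whose semantic coefficients are equal.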

Example 2

In this embodiment, the picture numbered 000000454956 in the training set of the MSCOCO image data set is selected, and the semantic evaluation method based on scene description is applied to the 5 English sentences of the image. The steps are as follows:

(1) Analyzing the parts of speech of the English sentences

1) Selecting from the MSCOCO image data set the 5 English sentences of the original image whose scene is to be described; the 5 English sentences are denoted S₁, S₂, S₃, S₄, S₅ and are:

S₁：Two bears can be seen grazing in the grass at the side of the road.

S₂：Two black bears are in the grass next to the road.

S₃：A couple of bears next to a road.

S₄：Two black bears eating grass on the side of the road.

S₅：A pair of black bears stand in the grass on the side of the road.

2) Describing the scene of the selected original image according to different text description generation models. The text description generation model of this embodiment is the "VGG LSTM" model under an encoder-decoder framework, the same as in Example 1. The generated sentence Sg obtained is: A bear is walking through the grass near a tree.

3) Counting the keywords in the generated sentence Sg and dividing them, by nouns, verbs, and adjectives and adverbs, into a noun set n₁, a verb set v₁, and an adjective-and-adverb set a₁; the numbers of words in the sets are denoted Cn₁, Cv₁, Ca₁ respectively. In this embodiment, the noun set n₁ is { bear, grass, tree }, the verb set v₁ is { walking }, and the adjective-and-adverb set a₁ is the empty set, so Cn₁ is 3, Cv₁ is 1, and Ca₁ is 0.

4) Counting the keywords in the 5 English sentences and dividing the keywords of the 5 English sentences S₁, S₂, S₃, S₄, S₅, by nouns, verbs, and adjectives and adverbs, into sets n₂ⁱ, v₂ⁱ, a₂ⁱ; the numbers of words in the sets are denoted Cn₂ⁱ, Cv₂ⁱ, Ca₂ⁱ respectively, i ∈ [1, 5]. In this embodiment, n₂ⁱ is { bear, grass, road }, v₂ⁱ is { grazing, see }, and a₂ⁱ is { side }, so Cn₂ⁱ is 3, Cv₂ⁱ is 2, and Ca₂ⁱ is 1.

(2) Counting the number of related words by using the synonym library

1) Through the Thesaurus.com website, querying synonyms for the words in the keyword sets n₂ⁱ, v₂ⁱ, a₂ⁱ of the 5 English sentences S₁, S₂, S₃, S₄, S₅ respectively, by the same method as in Example 1, to obtain the corresponding synonym sets Set-nⁱ, Set-vⁱ, Set-aⁱ. In this embodiment, the synonym set Set-nⁱ is { bear synonyms } ∪ { grass synonyms } ∪ { tree synonyms }, namely { barbarian, bear, boob, brute, buffoon, cad, churl, dork, goon, lout, oaf, peasant, philistine, rube, vulgarian } ∪ { mean, hay, tutu, turf, sod, verdure, barley, grama } ∪ { sapling, shrub, wood, forest, timber, wood, pulp, stock, seedling, softwood, hardwood, topiary }; the synonym set Set-vⁱ is { grazing synonyms } ∪ { see synonyms }, namely { bagging, biting, clamping, cropping, observing, feeding, formatting, gnawing, mapping, multicasting, unicoding, passivating, uploading } ∪ { detect, extract, identify, hook, look, notice, object, record, replay, spot, view, watch, wireless, beam, book, record, distribute, distinguishment, copy, eye, flash, gap, gawout, gazeto, glaze, trim, text, survey, record, trace, survey }; and the synonym set Set-aⁱ is { side synonyms }, namely { incidental, lateral, oblique, postern, roundabout, secondary, skirting, subordinate, subspace, ancillary, indirect, lesser, marginal, not the main, off-center, sidelong, sideward, sideways, sidewise, superficial }.

The part-of-speech similarity coefficient k_i takes values in [0, 1]. In this embodiment, formula (1) gives, for i = 1, 2, 3, 4, 5, the part-of-speech similarity coefficients k_i of the 5 English sentences as 0.5, 0.6667, 0.6667, 0.6667, 0.5833.

The semantic similarity coefficient j_i takes values in [0, 1]. In this embodiment, formula (2) gives, for i = 1, 2, 3, 4, 5, the semantic similarity coefficients j_i of the 5 English sentences as 0.5, 0.5, 0.25, 0.5, 0.5.

3) The sentence similarity s_i between the generated sentence Sg and the 5 English sentences S₁, S₂, S₃, S₄, S₅ is determined by formula (3); s_i takes values in [0, 1]. In this embodiment, formula (3) gives, for i = 1, 2, 3, 4, 5, the sentence similarities s_i of the 5 English sentences as 0.5, 0.508, 0.254, 0.508, 0.504.

SimilarSyn = max{s_i} (4)

In this embodiment, the maximum sentence similarity between the generated sentence Sg and the 5 English sentences S₁, S₂, S₃, S₄, S₅ obtained according to formula (4) is 0.508.

Example 3

In this embodiment, a picture from the training set of the MSCOCO image data set is selected, and the semantic evaluation method based on scene description is applied to the 5 English sentences of the image. The steps are as follows:

(1) Analyzing the parts of speech of the English sentences

S₁：A young girl standing on top of a tennis court.

S₂：A young girl standing on top of a tennis court holding a racquet.

S₃：A kid holding a racket ready to kick the ball.

S₄：A kid is standing on a tennis court with a racket.

S₅：A young girl playing tennis at a tennis court.

2) Describing the scene of the selected original image according to different text description generation models. The text description generation model of this embodiment is the "VGG LSTM" model under an encoder-decoder framework, the same as in Example 1. The generated sentence Sg obtained is: A giraffe standing on top of a green field.

3) Counting the keywords in the generated sentence Sg and dividing them, by nouns, verbs, and adjectives and adverbs, into a noun set n₁, a verb set v₁, and an adjective-and-adverb set a₁; the numbers of words in the sets are denoted Cn₁, Cv₁, Ca₁ respectively. In this embodiment, the noun set n₁ is { giraffe, top, field }, the verb set v₁ is { standing }, and the adjective-and-adverb set a₁ is { green }, so Cn₁ is 3, Cv₁ is 1, and Ca₁ is 1.

4) Counting the keywords in the 5 English sentences and dividing the keywords of the 5 English sentences S₁, S₂, S₃, S₄, S₅, by nouns, verbs, and adjectives and adverbs, into sets n₂ⁱ, v₂ⁱ, a₂ⁱ; the numbers of words in the sets are denoted Cn₂ⁱ, Cv₂ⁱ, Ca₂ⁱ respectively, i ∈ [1, 5]. In this embodiment, n₂ⁱ is { girl, top, tennis court }, v₂ⁱ is { standing }, and a₂ⁱ is { young }, so Cn₂ⁱ is 3, Cv₂ⁱ is 1, and Ca₂ⁱ is 1.

(2) Counting the number of related words by using the synonym library

1) Through the Thesaurus.com website, querying synonyms for the words in the keyword sets n₂ⁱ, v₂ⁱ, a₂ⁱ of the 5 English sentences S₁, S₂, S₃, S₄, S₅ respectively, by the same method as in Example 1, to obtain the corresponding synonym sets Set-nⁱ, Set-vⁱ, Set-aⁱ. In this embodiment, the synonym set Set-nⁱ is { giraffe synonyms } ∪ { top synonyms } ∪ { field synonyms }, namely { buffalo, camel, cattle, cow, deer, elephant, hippopotamus, hog, horse, llama, pig, rhinoceros, swine, tapir } ∪ { acme, apex, apogee, cap, capital, ceiling, comining, climax, cork, cover, gate, crop, crown, cumming, cup, face, failure, fine, head, height, high point, light, limit, maximum, meridian, pinnacle, point, roof, spire, store, surfer, surfeit, zeeci, surfeit, surficia, map, graph, map, plot, map, broadcast, survey, surficia, broadcast }; the synonym set Set-vⁱ is { standing synonyms }, namely { existing, restraining, fixed, regular, predicted, permanent }; and the synonym set Set-aⁱ is { young synonyms }, namely { budding, inexperienced, new, youthful, adolescent, blooming, blossoming, loud, developing, fledgling, green, growing, infant, preferor, junior, juvenile, little, modeler, newborn, sink, raw, recent, tender, tenderfoot, boyish, boylike, burgeoning, callow, gilise, early, fresh, girlish, girllike, halvelf-slope, ignorant, new, noble, pubescent, unelated, undispensed, unected, unexposed, found, empty, green, empty }.

2) Determining, for each part of speech, the number of words in the noun set n₁, verb set v₁, and adjective-and-adverb set a₁ of the generated sentence Sg that are the same as words in the keyword sets n₂ⁱ, v₂ⁱ, a₂ⁱ of the 5 English sentences S₁, S₂, S₃, S₄, S₅ or in the corresponding synonym sets Set-nⁱ, Set-vⁱ, Set-aⁱ; that is, determining the numbers of elements of the three sets (n₁∩n₂ⁱ)∪(n₁∩Set-nⁱ), (v₁∩v₂ⁱ)∪(v₁∩Set-vⁱ), (a₁∩a₂ⁱ)∪(a₁∩Set-aⁱ), denoted C_n-synⁱ, C_v-synⁱ, C_a-synⁱ respectively, i ∈ [1, 5]. In this embodiment, C_n-synⁱ is 1, C_v-synⁱ is 1, and C_a-synⁱ is 0.

The part-of-speech similarity coefficient k_i takes values in [0, 1]. In this embodiment, formula (1) gives, for i = 1, 2, 3, 4, 5, the part-of-speech similarity coefficients k_i of the 5 English sentences as 1, 0.75, 0.8333, 0.6667, 0.9333.

The semantic similarity coefficient j_i takes values in [0, 1]. In this embodiment, formula (2) gives, for i = 1, 2, 3, 4, 5, the semantic similarity coefficients j_i of the 5 English sentences as 0.4, 0.4, 0, 0.3, 0.

The sentence similarity s_i takes values in [0, 1]. In this embodiment, formula (3) gives, for i = 1, 2, 3, 4, 5, the sentence similarities s_i of the 5 English sentences as 0.43, 0.418, 0.042, 0.223, 0.25.

SimilarSyn = max{s_i} (4)

In this embodiment, the maximum sentence similarity between the generated sentence Sg and the 5 English sentences S₁, S₂, S₃, S₄, S₅ obtained according to formula (4) is 0.43.
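Formula (2) is not reproduced in this text. As a plausibility check only, the first semantic similarity coefficient reported in Examples 1 and 3 is consistent with the ratio of matched keywords to all keywords of the generated sentence. That ratio is an assumption inferred from the reported counts, not the patent's stated formula, and the function name `semantic_ratio` is hypothetical.

```python
def semantic_ratio(matched: tuple[int, int, int], generated: tuple[int, int, int]) -> float:
    """ASSUMED form of j: (C_n-syn + C_v-syn + C_a-syn) / (Cn1 + Cv1 + Ca1)."""
    total = sum(generated)
    return sum(matched) / total if total else 0.0

# Example 3: matched counts (1, 1, 0), generated-sentence counts (3, 1, 1).
j_example3 = semantic_ratio((1, 1, 0), (3, 1, 1))

# Example 1: matched counts (2, 0, 0), generated-sentence counts (4, 0, 0).
j_example1 = semantic_ratio((2, 0, 0), (4, 0, 0))
```

Both results agree with the first j value listed in the respective example (0.4 and 0.5); the remaining values cannot be checked this way because the per-sentence match counts are only given for one reference sentence.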

Claims

1. A semantic evaluation method based on scene description, characterized by comprising the following steps:

(1) Analyzing the parts of speech of the English sentences

1) Selecting from the MSCOCO image data set the 5 English sentences of the original image whose scene is to be described; the 5 English sentences are denoted S₁, S₂, S₃, S₄, S₅;

2) Describing the scene of the selected original image according to different text description generation models to obtain a generated sentence Sg;

3) Counting the keywords in the generated sentence Sg and dividing them, by nouns, verbs, and adjectives and adverbs, into a noun set n₁, a verb set v₁, and an adjective-and-adverb set a₁; the numbers of words in the sets are denoted Cn₁, Cv₁, Ca₁ respectively;

4) Counting the keywords in the 5 English sentences and dividing the keywords of the 5 English sentences S₁, S₂, S₃, S₄, S₅, by nouns, verbs, and adjectives and adverbs, into sets n₂ⁱ, v₂ⁱ, a₂ⁱ; the numbers of words in the sets are denoted Cn₂ⁱ, Cv₂ⁱ, Ca₂ⁱ respectively, i ∈ [1, 5];

(2) Counting the number of related words by using the synonym library

1) Through the Thesaurus.com website, querying synonyms for the words in the keyword sets n₂ⁱ, v₂ⁱ, a₂ⁱ of the 5 English sentences S₁, S₂, S₃, S₄, S₅ respectively, to obtain the corresponding synonym sets Set-nⁱ, Set-vⁱ, Set-aⁱ;

2) Determining, for each part of speech, the number of words in the noun set n₁, verb set v₁, and adjective-and-adverb set a₁ of the generated sentence Sg that are the same as words in the keyword sets n₂ⁱ, v₂ⁱ, a₂ⁱ of the 5 English sentences S₁, S₂, S₃, S₄, S₅ or in the corresponding synonym sets Set-nⁱ, Set-vⁱ, Set-aⁱ; that is, determining the numbers of elements of the three sets (n₁∩n₂ⁱ)∪(n₁∩Set-nⁱ), (v₁∩v₂ⁱ)∪(v₁∩Set-vⁱ), (a₁∩a₂ⁱ)∪(a₁∩Set-aⁱ), denoted C_n-synⁱ, C_v-synⁱ, C_a-synⁱ respectively, i ∈ [1, 5];

the part-of-speech similarity coefficient k_i takes values in [0, 1];

the semantic similarity coefficient j_i takes values in [0, 1];

the sentence similarity s_i takes values in [0, 1];

SimilarSyn = max{s_i} (4).

2. The semantic evaluation method based on scene description according to claim 1, characterized in that: in step (2), counting the number of related words by using the synonym library, the synonyms of the words in the keyword sets n₂ⁱ, v₂ⁱ, a₂ⁱ of the 5 English sentences S₁, S₂, S₃, S₄, S₅ are queried as follows: the 5 English sentences S₁, S₂, S₃, S₄, S₅ are input into a Linux system; the system divides all the keywords in the 5 English sentences into 3 sets by nouns, verbs, and adjectives and adverbs; the synonym sets of the 3 sets are queried through the Thesaurus.com website; and the synonyms of the keywords in the 5 English sentences are returned.

3. The semantic evaluation method based on scene description according to claim 1, characterized in that: in step 2) of step (1), analyzing the parts of speech of the English sentences, the text description generation model is a deep network model under an encoder-decoder framework.