CN114662456A - Image ancient-poem generation method based on a Faster R-CNN detection model - Google Patents


Info

Publication number
CN114662456A
CN114662456A (application CN202210273907.XA)
Authority
CN
China
Prior art keywords
image
ancient
poetry
picture
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202210273907.XA
Other languages
Chinese (zh)
Inventor
谈启雷
吴晓军
杨红红
张玉梅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shaanxi Normal University
Original Assignee
Shaanxi Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shaanxi Normal University filed Critical Shaanxi Normal University
Priority: CN202210273907.XA
Publication: CN114662456A
Legal status: Withdrawn

Classifications

    • G06F40/166 — Handling natural language data; Text processing; Editing, e.g. inserting or deleting
    • G06F16/951 — Information retrieval; Retrieval from the web; Indexing; Web crawling techniques
    • G06F18/214 — Pattern recognition; Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F40/30 — Handling natural language data; Semantic analysis
    • G06N3/045 — Neural networks; Combinations of networks
    • G06N3/047 — Neural networks; Probabilistic or stochastic networks
    • G06N3/08 — Neural networks; Learning methods


Abstract

An image ancient-poem generation method based on a Faster R-CNN detection model comprises: collecting ancient-poetry imagery-word pictures, preprocessing them, constructing an imagery-word image data set, inputting a user image, extracting image keyword features, extracting visual image features, constructing an ancient-poem text generation model, judging the emotional tendency of the poem, and displaying the generated poem. By collecting the pictures and training on the self-built imagery-word image data set, the method improves the accuracy and speed of image detection, and the generation speed, during image-to-poem generation. Combining image keyword features with image visual features in the poem-generation network improves the thematic consistency between the picture and the poem. Judging the emotional tendency of the generated poem enriches the generation function and raises the generation quality. The method offers fast generation and high consistency between the image and the poem's theme, and can be used in the technical field of image-based ancient-poem generation.

Description

Image ancient-poem generation method based on a Faster R-CNN detection model
Technical Field
The invention belongs to the technical field of computers, and particularly relates to computer image target detection, natural language generation and text emotion classification.
Background
Ancient-poetry generation is an important and challenging task in natural language generation, which aims to enable computers to create high-quality poems as poets do. Research on automatic poetry generation has shifted from traditional machine-translation approaches to deep-learning approaches with text input. The mainstream deep-learning text-input methods, however, have notable problems. First, a small number of input keywords, limited by the expressive power of the input text, may fail to convey the user's writing intent and emotional nuance in the generated poem. Second, existing image data sets suffer from low detection accuracy and slow detection speed when applied to image-to-poem generation, and a dedicated data set of ancient-poetry imagery words is lacking. Third, emotional-tendency judgment of the generated poems is missing. Compared with keywords, pictures carry richer semantic and visual information, are better suited as input for poem generation, and express the user's writing intent more fully. Building a dedicated imagery-word image data set for ancient poetry markedly improves generation quality.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides an image ancient-poem generation method based on a Faster R-CNN detection model, which has high retrieval accuracy, high generation speed and good generation quality.
The technical scheme adopted for solving the technical problems comprises the following steps:
(1) Collecting ancient-poetry imagery-word pictures
For each of 100 imagery words common in ancient poetry, 100 corresponding pictures are crawled from Internet image data, yielding 10,000 ancient-poetry imagery-word pictures.
(2) Preprocessing the ancient-poetry imagery-word pictures
The collected imagery-word pictures are resized to a uniform size, and detail gray-level processing is applied with a piecewise linear gray-level enhancement method to enhance image contrast and compress unneeded image details.
(3) Constructing the ancient-poetry imagery-word image data set
The preprocessed pictures are annotated with the Pascal Visual Object Classes (VOC) method: the imagery-word labels contained in each picture are marked in turn, and an extensible markup language (XML) file is output for each picture. The 10,000 collected pictures and their 10,000 corresponding XML files are split into training and test sets at an 8:2 ratio, and a Faster R-CNN is trained to obtain the ancient-poetry imagery-word image data set.
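The 8:2 data-set segmentation described above can be sketched as follows; the helper name and random seed are illustrative, not from the patent:

```python
import random

def split_dataset(picture_ids, train_ratio=0.8, seed=42):
    """Shuffle picture IDs and split them 8:2 into train/test lists.
    Each ID is assumed to pair a picture with its XML annotation."""
    ids = list(picture_ids)
    random.Random(seed).shuffle(ids)
    cut = int(len(ids) * train_ratio)
    return ids[:cut], ids[cut:]

# 10,000 pictures, each with a corresponding XML file of the same stem.
train, test = split_dataset(range(10000))
```

With 10,000 samples this yields 8,000 training and 2,000 test items, with no overlap between the two sets.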
(4) Inputting user images
A single picture for which a poem is to be written is selected as the user input; there is no requirement on the picture's size.
(5) Extracting image keyword features
High-dimensional semantic features of the user's input picture are extracted with a convolutional neural network, the probability distribution over image labels is predicted with a Softmax function, and the predicted label distribution Π is determined by:

Π = Softmax(f(I))

π_j(I) = e^{f_j(I)} / Σ_n e^{f_n(I)}

where I denotes the input picture, f the convolutional-neural-network computation, j the j-th component of Π, π_j(I) the probability that picture I contains the j-th label, and f_n(I) the n-th label score of picture I after the network computation; j ranges from 0 to 9.

The loss function J of the keyword-extraction network is the cross-entropy

J = −(1/Ψ) Σ_{ψ=1}^{Ψ} log π_{y^{(ψ)}}(I^{(ψ)})

where Ψ is the number of samples and y^{(ψ)} the ground-truth label of sample ψ.

A probability threshold is set, and the labels whose probability exceeds the threshold are taken as the sample's labels, i.e. the keywords of the image. The keyword set K is represented as:

K = {k_1, k_2, ……, k_N}

where N is the number of extracted keywords, ranging from 0 to 9.
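A minimal sketch of the Softmax-and-threshold keyword selection; the label names, scores and the 0.2 threshold are invented for illustration:

```python
import numpy as np

def extract_keywords(scores, labels, threshold=0.2):
    """Softmax over the label scores f(I); keep every label whose
    probability exceeds the threshold, i.e. the image's keyword set K."""
    e = np.exp(scores - np.max(scores))   # numerically stable softmax
    probs = e / e.sum()
    return [lab for lab, p in zip(labels, probs) if p > threshold]

labels = ["moon", "mountain", "river", "pine", "snow"]
scores = np.array([4.0, 3.5, 0.5, 0.2, 0.1])   # hypothetical f(I) scores
K = extract_keywords(scores, labels)
```

Here "moon" and "mountain" dominate the distribution, so only they survive the threshold and become the keyword set K.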
(6) Extracting visual features of an image
A set V of visual feature vectors is extracted from the picture; each vector contains local visual encoding information of a different position of the picture, and the different vectors weight each word when the poem is generated.

The visual feature vectors of the user's input picture are obtained through convolutional-neural-network processing, where the convolution layers are determined by:

x^n = g(x^{n−1} ∗ w^n + b^n)

where n indexes the n-th convolution layer, x^n is its output, ∗ is the convolution operation, w^n and b^n are the layer's kernel and bias, and g is the ReLU activation function.

The visual feature vector set V is extracted as:

V = {v_1, v_2, …, v_B}
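As a toy illustration of one convolution layer x^n = g(x^{n−1} ∗ w^n + b^n) with ReLU activation (not the patent's actual network; the kernel and bias values are arbitrary):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def conv2d(x, w, b):
    """One convolution layer with ReLU activation g, valid padding,
    stride 1, single channel (for illustration only)."""
    H, W = x.shape
    kh, kw = w.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i+kh, j:j+kw] * w) + b
    return relu(out)

# A 5x5 all-ones input with a 3x3 all-ones kernel: each window sums to 9.
feat = conv2d(np.ones((5, 5)), np.ones((3, 3)), b=-8.0)
```

In a real network, stacks of such layers produce the B local feature vectors v_1…v_B over picture regions.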
(7) Creating the ancient-poem text generation model
The N keywords K = {k_1, k_2, …, k_N} obtained from the keyword-extraction network are combined, via convolutional neural networks, with the visual feature vector set V = {v_1, v_2, …, v_B}, where v_j denotes the information of the j-th picture region contained in each visual feature vector. The poem is generated line by line: when generating the i-th line l_i, all previously generated lines l_{1:i−1} = {x_1, x_2, …, x_C} serve as model input, where l_{1:i−1} denotes the sequence obtained by concatenating lines 1 to i−1, x_j is the vector representation of the j-th word, and C is the length of the poem sequence. The first line of the poem is generated from the keyword set K and the visual feature vector set V; the remaining lines are generated by the same method.
The encoder of the poem-generation network is a bidirectional gated recurrent unit (Bi-GRU) that encodes the generated poem sequence l_{1:i−1} into a hidden vector H. The forward GRU encodes l_{1:i−1} in the forward direction to obtain the forward semantic hidden vectors h⃗_j, the backward GRU encodes l_{1:i−1} in reverse to obtain the backward semantic hidden vectors h⃖_j, and their concatenation serves as the encoding of the sequence l_{1:i−1}. The forward hidden vector h⃗_j, backward hidden vector h⃖_j and encoding vector h_j are determined by:

h⃗_j = GRU(x_j, h⃗_{j−1})

h⃖_j = GRU(x_j, h⃖_{j+1})

h_j = [h⃗_j ; h⃖_j]

where GRU(·) is the gated-recurrent-unit operation and [· ; ·] denotes the concatenation of the forward and backward semantic hidden vectors.
The decoder of the poem-generation network is a unidirectional gated recurrent unit. The next poem line l_i = {y_1, y_2, …, y_G} is obtained by decoding the visual feature information V and the preceding encoding information H: the decoder GRU cyclically updates an internal transition state s_t used to decode y_t; after each update of s_t, a Softmax function computes the probability distribution over words, and the highest-probability output is selected as y_t, i.e. the next word, so the next line is generated word by word.
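A toy NumPy sketch of the encoder's bidirectional GRU (Cho et al. formulation); the weights are random and the dimensions tiny, whereas the real network's parameters are learned:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, P):
    """One GRU cell update: update gate z, reset gate r, candidate h~."""
    z = sigmoid(P["Wz"] @ x + P["Uz"] @ h)
    r = sigmoid(P["Wr"] @ x + P["Ur"] @ h)
    h_tilde = np.tanh(P["Wh"] @ x + P["Uh"] @ (r * h))
    return (1 - z) * h + z * h_tilde

def bigru_encode(xs, P_fwd, P_bwd, hidden):
    """Run a forward and a backward GRU over the word vectors and
    concatenate their states: h_j = [h_fwd_j ; h_bwd_j]."""
    n = len(xs)
    h = np.zeros(hidden); fwd = []
    for x in xs:
        h = gru_step(x, h, P_fwd); fwd.append(h)
    h = np.zeros(hidden); bwd = [None] * n
    for j in range(n - 1, -1, -1):
        h = gru_step(xs[j], h, P_bwd); bwd[j] = h
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

rng = np.random.default_rng(0)
d, hidden = 4, 3
def params():
    # W* act on the input (hidden x d), U* on the state (hidden x hidden).
    return {k: rng.standard_normal((hidden, d if k[0] == "W" else hidden))
            for k in ("Wz", "Uz", "Wr", "Ur", "Wh", "Uh")}

H = bigru_encode([rng.standard_normal(d) for _ in range(5)], params(), params(), hidden)
```

Each of the 5 word positions receives a 6-dimensional encoding: the 3-dimensional forward state concatenated with the 3-dimensional backward state.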
(8) Judging the emotional tendency of the ancient poem
The generated poem lines are fed to the emotional-tendency judgment network. Emotion-word weights are assigned according to emotional strength by querying an emotion dictionary, and the weighted sum W that determines the poem's emotional tendency is computed as:

W = Σ_{i=1}^{N_p} w_{pi} − Σ_{j=1}^{N_n} w_{nj}

where N_p is the number of positive-emotion words, N_n the number of negative-emotion words, w_{pi} the weight of the i-th positive-emotion word, and w_{nj} the weight of the j-th negative-emotion word.

The weighted result is the judgment basis: W > 0 means the generated poem has a positive emotional tendency, W < 0 a negative emotional tendency, and W = 0 no emotional tendency.
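A minimal sketch of the weighted-sum judgment; the dictionary entries and weights are invented for illustration:

```python
# Hypothetical emotion dictionary: word -> (polarity, weight by strength).
EMOTION_DICT = {
    "joy": ("pos", 2.0), "bright": ("pos", 1.0),
    "sorrow": ("neg", 2.0), "cold": ("neg", 1.0),
}

def emotional_tendency(words):
    """W = sum of positive-word weights minus sum of negative-word weights;
    W > 0 -> positive, W < 0 -> negative, W == 0 -> no tendency."""
    w = 0.0
    for word in words:
        polarity, weight = EMOTION_DICT.get(word, (None, 0.0))
        if polarity == "pos":
            w += weight
        elif polarity == "neg":
            w -= weight
    if w > 0:
        return "positive"
    if w < 0:
        return "negative"
    return "none"
```

Words absent from the dictionary contribute nothing, so a poem with no emotion words is judged to have no emotional tendency.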
(9) Displaying the generated ancient poem
The generated ancient poem and its emotional-tendency judgment result are displayed to the user.
In step (2) of the present invention, the detail gray-level processing with the piecewise linear gray-level enhancement method is as follows: the user-input image f(x, y) has gray levels 0–32 and the enhanced image h(x, y) has gray levels 38–64; h(x, y) is determined by:

h(x, y) = (c/a) · f(x, y),  0 ≤ f(x, y) < a
h(x, y) = ((d − c)/(b − a)) · (f(x, y) − a) + c,  a ≤ f(x, y) < b
h(x, y) = ((M_h − d)/(M_f − b)) · (f(x, y) − b) + d,  b ≤ f(x, y) ≤ M_f

where x and y index the pixel along the length and width of the picture, the intervals [a, b] and [c, d] are corresponding gray-level intervals of the original and enhanced image respectively, and M_f, M_h are their maximum gray levels.
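A NumPy sketch of the three-segment piecewise linear transform; the interval endpoints a, b, c, d and the maxima are illustrative values, not from the patent:

```python
import numpy as np

def piecewise_linear_enhance(f, a, b, c, d, mf, mh):
    """Three-segment piecewise linear gray-level transform: stretch the
    interval [a, b] of the input onto [c, d] of the output, compressing
    the gray levels outside it."""
    f = f.astype(np.float64)
    return np.where(
        f < a, (c / a) * f,
        np.where(f < b,
                 (d - c) / (b - a) * (f - a) + c,
                 (mh - d) / (mf - b) * (f - b) + d))

img = np.array([[0, 8, 16], [24, 32, 32]])
enh = piecewise_linear_enhance(img, a=8, b=24, c=4, d=56, mf=32, mh=64)
```

The middle segment [8, 24] is stretched onto [4, 56] (slope 3.25 > 1, enhancing contrast), while the outer segments are mapped with gentler slopes, compressing their detail.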
In step (3), labeling the preprocessed pictures with the Pascal Visual Object Classes (VOC) method comprises: image collection, image preprocessing, image annotation, data-set segmentation, Faster R-CNN model training, and model export and storage. The loss function L({p_i}, {t_i}) of the Faster R-CNN model is determined from the classification loss L_cls(p_i, p_i*) and the regression loss L_reg(t_i, t_i*) as follows:

L({p_i}, {t_i}) = (1/N_cls) Σ_i L_cls(p_i, p_i*) + λ (1/N_reg) Σ_i p_i* L_reg(t_i, t_i*)

L_cls(p_i, p_i*) = −log[p_i* p_i + (1 − p_i*)(1 − p_i)]

L_reg(t_i, t_i*) = R(t_i − t_i*)

R = smooth_L1(x)

smooth_L1(x) = 0.5 x² if |x| < 1, |x| − 0.5 otherwise

where i is the index of an anchor box in the mini-batch of stochastic gradient descent; p_i is the predicted probability that anchor i is a target; p_i* is the ground-truth label, valued 0 or 1; t_i is the vector of 4 parameterized coordinates of the predicted bounding box; t_i* is the coordinate vector of the ground-truth box corresponding to a positive anchor; N_cls is the mini-batch size; N_reg is the number of anchor positions, here 2400; λ = 10; the learning rate of the model is 0.0025, the number of training epochs 12, the batch size of each round 2, and the backbone network is ResNet-50.
Compared with the prior art, the invention has the following advantages:
The method uses an image detection algorithm based on Faster R-CNN together with a self-built image data set of ancient-poetry imagery words, improving detection accuracy and detection speed during image-to-poem generation. The extracted keyword information and visual feature information of the image jointly drive poem generation, strengthening the thematic consistency between the generated poem and the image. The combination of an encoder Bi-GRU and a decoder GRU improves the cohesion and continuity between poem lines. Judging the emotional tendency of the poem enriches the functionality and readability of poem generation and raises its quality and appeal. The invention lets users experience the pleasure of turning an input picture into an ancient poem, popularizes ancient-poetry culture among the general public, attracts the public to inherit and spread poetry culture, and helps strengthen the public's cultural confidence while carrying forward the fine traditional culture of China.
Drawings
FIG. 1 is a flow chart of example 1 of the present invention.
Detailed Description
The present invention will be described in further detail below with reference to the drawings and examples, but the present invention is not limited to the embodiments described below.
Example 1
In fig. 1, the image ancient-poem generation method of this embodiment, based on the Faster R-CNN detection model, comprises the following steps:
(1) Collecting ancient-poetry imagery-word pictures
For each of 100 imagery words common in ancient poetry, 100 corresponding pictures are crawled from Internet image data, yielding 10,000 ancient-poetry imagery-word pictures.
(2) Preprocessing the ancient-poetry imagery-word pictures
The collected imagery-word pictures are resized to a uniform size, and detail gray-level processing is applied with a piecewise linear gray-level enhancement method to enhance image contrast and compress unneeded image details.
The detail gray-level processing with the piecewise linear gray-level enhancement method is as follows: the user-input image f(x, y) has gray levels 0–32 (16 levels in this embodiment, the exact number depending on the input image), and the enhanced image h(x, y) has gray levels 38–64 (50 levels in this embodiment, the exact number depending on the result of the detail gray-level processing).

The enhanced image h(x, y) is determined by:

h(x, y) = (c/a) · f(x, y),  0 ≤ f(x, y) < a
h(x, y) = ((d − c)/(b − a)) · (f(x, y) − a) + c,  a ≤ f(x, y) < b
h(x, y) = ((M_h − d)/(M_f − b)) · (f(x, y) − b) + d,  b ≤ f(x, y) ≤ M_f

where x and y index the pixel along the length and width of the picture, the intervals [a, b] and [c, d] are corresponding gray-level intervals of the original and enhanced image respectively, and M_f, M_h are their maximum gray levels.
(3) Constructing the ancient-poetry imagery-word image data set
The preprocessed pictures are annotated with the Pascal Visual Object Classes (VOC) method: the imagery-word labels contained in each picture are marked in turn, and an XML file is output for each picture. The 10,000 collected pictures and their 10,000 corresponding XML files are split into training and test sets at an 8:2 ratio, and a Faster R-CNN is trained to obtain the ancient-poetry imagery-word image data set.
Labeling the preprocessed pictures with the Pascal Visual Object Classes (VOC) method comprises: image collection, image preprocessing, image annotation, data-set segmentation, Faster R-CNN model training, and model export and storage. The loss function L({p_i}, {t_i}) of the Faster R-CNN model is determined from the classification loss L_cls(p_i, p_i*) and the regression loss L_reg(t_i, t_i*) as follows:

L({p_i}, {t_i}) = (1/N_cls) Σ_i L_cls(p_i, p_i*) + λ (1/N_reg) Σ_i p_i* L_reg(t_i, t_i*)

L_cls(p_i, p_i*) = −log[p_i* p_i + (1 − p_i*)(1 − p_i)]

L_reg(t_i, t_i*) = R(t_i − t_i*)

R = smooth_L1(x)

smooth_L1(x) = 0.5 x² if |x| < 1, |x| − 0.5 otherwise

where i is the index of an anchor box in the mini-batch of stochastic gradient descent; p_i is the predicted probability that anchor i is a target; p_i* is the ground-truth label, valued 0 or 1; t_i is the vector of 4 parameterized coordinates of the predicted bounding box; t_i* is the coordinate vector of the ground-truth box corresponding to a positive anchor; N_cls is the mini-batch size; N_reg is the number of anchor positions, here 2400; λ = 10; the learning rate of the model is 0.0025, the number of training epochs 12, the batch size of each round 2, and the backbone network is ResNet-50.
(4) Inputting user images
A single picture for which a poem is to be written is selected as the user input; there is no requirement on the picture's size.
(5) Extracting image keyword features
High-dimensional semantic features of the user's input picture are extracted with a convolutional neural network, the probability distribution over image labels is predicted with a Softmax function, and the predicted label distribution Π is determined by:

Π = Softmax(f(I))

π_j(I) = e^{f_j(I)} / Σ_n e^{f_n(I)}

where I denotes the input picture, f the convolutional-neural-network computation, j the j-th component of Π, π_j(I) the probability that picture I contains the j-th label, and f_n(I) the n-th label score of picture I after the network computation; j ranges from 0 to 9, and in this embodiment j = 5.

The loss function J of the keyword-extraction network is the cross-entropy

J = −(1/Ψ) Σ_{ψ=1}^{Ψ} log π_{y^{(ψ)}}(I^{(ψ)})

where Ψ is the number of samples and y^{(ψ)} the ground-truth label of sample ψ.

A probability threshold is set, and the labels whose probability exceeds the threshold are taken as the sample's labels, i.e. the keywords of the image. The keyword set K is represented as:

K = {k_1, k_2, ……, k_N}

where N is the number of extracted keywords, ranging from 0 to 9; in this embodiment N = 5.
(6) Extracting visual features of an image
A set V of visual feature vectors is extracted from the picture; each vector contains local visual encoding information of a different position of the picture, and the different vectors weight each word when the poem is generated.

The visual feature vectors of the user's input picture are obtained through convolutional-neural-network processing, where the convolution layers are determined by:

x^n = g(x^{n−1} ∗ w^n + b^n)

where n indexes the n-th convolution layer, x^n is its output, ∗ is the convolution operation, w^n and b^n are the layer's kernel and bias, and g is the ReLU activation function.

The visual feature vector set V is extracted as:

V = {v_1, v_2, …, v_B}
the method adopts the extracted keyword information and visual characteristic information of the image to determine the generation of the ancient poems, thereby enhancing the theme consistency of the generated ancient poems and the image information.
(7) Creating the ancient-poem text generation model
The N keywords K = {k_1, k_2, …, k_N} obtained from the keyword-extraction network are combined, via convolutional neural networks, with the visual feature vector set V = {v_1, v_2, …, v_B}, where v_j denotes the information of the j-th picture region contained in each visual feature vector. The poem is generated line by line: when generating the i-th line l_i, all previously generated lines l_{1:i−1} = {x_1, x_2, …, x_C} serve as model input, where l_{1:i−1} denotes the sequence obtained by concatenating lines 1 to i−1, x_j is the vector representation of the j-th word, and C is the length of the poem sequence. The first line of the poem is generated from the keyword set K and the visual feature vector set V; the remaining lines are generated by the same method.
The encoder of the poem-generation network is a bidirectional gated recurrent unit (Bi-GRU) that encodes the generated poem sequence l_{1:i−1} into a hidden vector H. The forward GRU encodes l_{1:i−1} in the forward direction to obtain the forward semantic hidden vectors h⃗_j, the backward GRU encodes l_{1:i−1} in reverse to obtain the backward semantic hidden vectors h⃖_j, and their concatenation serves as the encoding of the sequence l_{1:i−1}. The forward hidden vector h⃗_j, backward hidden vector h⃖_j and encoding vector h_j are determined by:

h⃗_j = GRU(x_j, h⃗_{j−1})

h⃖_j = GRU(x_j, h⃖_{j+1})

h_j = [h⃗_j ; h⃖_j]

where GRU(·) is the gated-recurrent-unit operation and [· ; ·] denotes the concatenation of the forward and backward semantic hidden vectors.
The decoder of the poem-generation network is a unidirectional gated recurrent unit. The next poem line l_i = {y_1, y_2, …, y_G} is obtained by decoding the visual feature information V and the preceding encoding information H: the decoder GRU cyclically updates an internal transition state s_t used to decode y_t; after each update of s_t, a Softmax function computes the probability distribution over words, and the highest-probability output is selected as y_t, i.e. the next word, so the next line is generated word by word.
The invention combines an encoder Bi-GRU with a decoder GRU, improving the cohesion and continuity between poem lines.
(8) Judging the emotional tendency of the ancient poem
The generated poem lines are fed to the emotional-tendency judgment network. Emotion-word weights are assigned according to emotional strength by querying an emotion dictionary, and the weighted sum W that determines the poem's emotional tendency is computed as:

W = Σ_{i=1}^{N_p} w_{pi} − Σ_{j=1}^{N_n} w_{nj}

where N_p is the number of positive-emotion words, N_n the number of negative-emotion words, w_{pi} the weight of the i-th positive-emotion word, and w_{nj} the weight of the j-th negative-emotion word.

The weighted result is the judgment basis: W > 0 means the generated poem has a positive emotional tendency, W < 0 a negative emotional tendency, and W = 0 no emotional tendency.
(9) Displaying the generated ancient poem
The generated ancient poem and its emotional-tendency judgment result are displayed to the user.
This completes the image ancient-poem generation method based on the Faster R-CNN detection model.
Example 2
The image ancient-poem generation method based on the Faster R-CNN detection model comprises the following steps:
(1) Collecting ancient-poetry imagery-word pictures
This procedure is the same as in example 1.
(2) Preprocessing the ancient-poetry imagery-word pictures
The collected imagery-word pictures are resized to a uniform size, and detail gray-level processing is applied with a piecewise linear gray-level enhancement method to enhance image contrast and compress unneeded image details.

The detail gray-level processing with the piecewise linear gray-level enhancement method is as follows: the user-input image f(x, y) has gray levels 0–32 (0 levels in this embodiment, the exact number depending on the input image), and the enhanced image h(x, y) has gray levels 38–64 (38 levels in this embodiment, the exact number depending on the result of the detail gray-level processing).

The specific method for determining the enhanced image h(x, y) is the same as in example 1.
(3) Constructing the ancient-poetry imagery-word image data set
This procedure is the same as in example 1.
(4) Inputting user images
This procedure is the same as in example 1.
(5) Extracting image keyword features
High-dimensional semantic features of the user's input picture are extracted with a convolutional neural network, the probability distribution over image labels is predicted with a Softmax function, and the predicted label distribution Π is determined by:

Π = Softmax(f(I))

π_j(I) = e^{f_j(I)} / Σ_n e^{f_n(I)}

where I denotes the input picture, f the convolutional-neural-network computation, j the j-th component of Π, π_j(I) the probability that picture I contains the j-th label, and f_n(I) the n-th label score of picture I after the network computation; j ranges from 0 to 9, and in this embodiment j = 0.

The loss function J of the keyword-extraction network is the cross-entropy

J = −(1/Ψ) Σ_{ψ=1}^{Ψ} log π_{y^{(ψ)}}(I^{(ψ)})

where Ψ is the number of samples and y^{(ψ)} the ground-truth label of sample ψ.

A probability threshold is set, and the labels whose probability exceeds the threshold are taken as the sample's labels, i.e. the keywords of the image. The keyword set K is represented as:

K = {k_1, k_2, ……, k_N}

where N is the number of extracted keywords, ranging from 0 to 9; in this embodiment N = 0.
The other steps were the same as in example 1.
The image ancient poem generation method based on the Faster R-CNN detection model is thus completed.
Example 3
The method for generating ancient poems from images based on the Faster R-CNN detection model comprises the following steps:
(1) Collecting ancient poetry imagery-word pictures
This procedure is the same as in example 1.
(2) Preprocessing the ancient poetry imagery-word pictures
The collected imagery-word pictures are unified in size, and detail gray-level processing is applied with a piecewise linear gray-level enhancement method to enhance the image contrast and compress unnecessary image details.
The detail gray-level processing of the picture with the piecewise linear gray-level enhancement method is as follows: the gray level of the user-input image f(x, y) lies in the range 0-32 levels and in this embodiment is 32 levels, the exact number of gray levels being determined by the input image; the gray level of the enhanced image h(x, y) lies in the range 38-64 levels and in this embodiment is 64 levels, the exact number of gray levels being determined by the result of the detail gray-level processing of the image.
The specific method for determining the enhanced image h (x, y) is the same as in example 1.
(3) Construction of the ancient poetry imagery-word image data set
This procedure is the same as in example 1.
(4) Inputting user images
This procedure is the same as in example 1.
(5) Extracting image keyword features
Extracting high-dimensional semantic features from the user's input picture with a convolutional neural network, predicting the probability distribution of the image labels with a Softmax function, and determining the predicted label distribution Π by the following formulas:
Π = Softmax(f(I))
π_j(I) = exp(f_j(I)) / Σ_n exp(f_n(I))
where I represents the input picture, f represents the convolutional neural network computation, j indexes the jth component of Π, π_j(I) is the probability that picture I contains the jth label, and f_n(I) is the nth label score of picture I after the convolutional neural network computation; j ranges from 0 to 9, and in this embodiment j is 9.
The loss function J of the keyword extraction network is the cross-entropy over the training samples:
J = -(1/Ψ) Σ_{i=1}^{Ψ} Σ_{j=0}^{9} y_j^{(i)} log π_j(I^{(i)})
where Ψ represents the number of samples and y_j^{(i)} is the ground-truth indicator of the jth label of the ith sample.
A probability threshold is set, and the labels whose probability exceeds the threshold are selected as the labels of the sample, namely the keywords of the image; the keyword set K is represented as:
K = {k_1, k_2, …, k_N}
wherein N represents the number of extracted keywords, the value range of N is 0-9, and the value of N in this embodiment is 9.
The other steps were the same as in example 1.
The image ancient poem generation method based on the Faster R-CNN detection model is thus completed.

Claims (3)

1. An image ancient poem generation method based on a Faster R-CNN (Faster Region-based Convolutional Neural Network) detection model, characterized by comprising the following steps:
(1) collecting ancient poetry imagery-word pictures
Based on 100 common imagery words of ancient poems, crawling 100 pictures for each imagery word from Internet image data with a crawler, obtaining 10000 ancient poetry imagery-word pictures;
(2) preprocessing the ancient poetry imagery-word pictures
Carrying out size unification on the collected imagery-word pictures, and carrying out detail gray-level processing on the pictures with a piecewise linear gray-level enhancement method to enhance the image contrast and compress unnecessary image details;
(3) constructing the ancient poetry imagery-word image data set
Annotating the preprocessed pictures with the Pascal Visual Object Classes (VOC) method, labeling in turn the imagery-word labels contained in each picture, outputting an Extensible Markup Language (XML) file for each picture, and splitting the data set of the 10000 collected pictures and their 10000 corresponding XML files at a ratio of 8:2; training a Faster R-CNN to obtain the ancient poetry imagery data set;
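As a minimal sketch of the 8:2 data-set split described in this step (the file names and the `split_dataset` helper are illustrative assumptions, not part of the patent):

```python
import random

def split_dataset(samples, train_ratio=0.8, seed=42):
    """Shuffle and split (picture, annotation) pairs at a ratio of 8:2."""
    rng = random.Random(seed)          # fixed seed for a reproducible split
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

# 10000 collected pictures with their corresponding XML annotation files
pairs = [(f"img_{i:05d}.jpg", f"img_{i:05d}.xml") for i in range(10000)]
train, test = split_dataset(pairs)     # 8000 training / 2000 test samples
```

The Faster R-CNN training itself would then consume the training split; only the split logic is sketched here.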
(4) inputting the user image
Selecting as the user input a single picture for which a poem is to be generated; there is no restriction on the picture size;
(5) extracting image keyword features
Extracting high-dimensional semantic features from the user's input image with a convolutional neural network, predicting the probability distribution of the image labels with a Softmax function, and determining the predicted label distribution Π by the following formulas:
Π=Softmax(f(I))
π_j(I) = exp(f_j(I)) / Σ_n exp(f_n(I))
where I represents the input picture, f represents the convolutional neural network computation, j indexes the jth component of Π, π_j(I) is the probability that picture I contains the jth label, and f_n(I) is the nth label score of picture I after the convolutional neural network computation; j ranges from 0 to 9;
the loss function J of the keyword extraction network is the cross-entropy over the training samples:
J = -(1/Ψ) Σ_{i=1}^{Ψ} Σ_{j=0}^{9} y_j^{(i)} log π_j(I^{(i)})
where Ψ represents the number of samples and y_j^{(i)} is the ground-truth indicator of the jth label of the ith sample;
setting a probability threshold and selecting the labels whose probability exceeds the threshold as the labels of the sample, namely the keywords of the image, the keyword set K being represented as:
K = {k_1, k_2, …, k_N}
wherein N represents the number of the extracted keywords, and the value range of N is 0-9;
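A minimal numpy sketch of this keyword-extraction step, assuming illustrative imagery-word labels and raw network scores f_n(I) (the label names, scores, and threshold value are placeholders, not from the patent):

```python
import numpy as np

def predict_keywords(scores, labels, threshold=0.2):
    """Softmax over the 10 label scores f_n(I), then keep the labels whose
    probability pi_j(I) exceeds the threshold as the keyword set K."""
    e = np.exp(scores - np.max(scores))   # numerically stable softmax
    pi = e / e.sum()
    return [lab for lab, p in zip(labels, pi) if p > threshold], pi

labels = ["moon", "willow", "goose", "mountain", "river",
          "plum", "snow", "wind", "boat", "lantern"]
scores = np.array([4.0, 3.5, 1.0, 0.5, 0.2, 0.1, 0.1, 0.1, 0.1, 0.1])
K, pi = predict_keywords(scores, labels)   # K holds the above-threshold labels
```

With these toy scores, only the two highest-scoring labels pass the threshold, giving a keyword set of size N = 2.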
(6) extracting visual features of the image
A set V of visual feature vectors is extracted from the picture, each vector containing local visual coding information of a different position of the picture; the different vectors represent the weight of each character when the ancient poem is generated;
obtaining the visual feature vectors of the user input picture through convolutional neural network processing, the convolutional layers being determined by the following formula:
x^n = g(x^{n-1} ⊗ w^n + b^n)
where n denotes the nth convolutional layer, x^n is the output of the nth convolutional layer, ⊗ is the convolution operation, w^n and b^n are the weights and bias of the nth layer, and g is the ReLU activation function;
extracting the visual feature vector set V according to the following formula:
V = {v_1, v_2, …, v_B}
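A toy numpy sketch of one such convolutional layer (single channel, "valid" padding; the kernel values are placeholder assumptions, not trained weights):

```python
import numpy as np

def relu(x):
    """Activation function g."""
    return np.maximum(x, 0.0)

def conv_layer(x, w, b):
    """One layer x_n = g(x_{n-1} * w_n + b_n): a 2-D 'valid' convolution
    (cross-correlation, as in most CNN libraries) followed by ReLU."""
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i+kh, j:j+kw] * w) + b
    return relu(out)

x = np.arange(16, dtype=float).reshape(4, 4)   # toy 4x4 input "image"
w = np.array([[1.0, 0.0], [0.0, -1.0]])        # toy 2x2 kernel
feat = conv_layer(x, w, b=0.0)                 # 3x3 feature map
```

In the method above, the set V would be formed by slicing the final feature maps at different spatial positions; this sketch shows only the per-layer computation.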
(7) creating ancient poetry text generation model
The keyword extraction network yields the set of N keywords K = {k_1, k_2, …, k_N}, which is combined with the visual feature vector set V = {v_1, v_2, …, v_B} by a convolutional neural network, where v_j carries the information of the jth part of the picture contained in each visual feature vector; the ancient poem is generated sentence by sentence: when generating the ith line l_i, all previously generated lines l_{1:i-1} ∈ {x_1, x_2, …, x_C} serve as inputs to the model, where l_{1:i-1} denotes the sequence formed by concatenating lines 1 through i-1 of the poem, x_j is the vector representation of the jth word, and C is the length of the poem sequence; the first sentence of the poem is generated from the keyword set K and the visual feature vector set V, and the remaining sentences are generated in the same way;
the encoder of the ancient poetry generation network is a bidirectional gated recurrent unit (GRU), which encodes the generated poem sequence l_{1:i-1} into hidden vectors H; the forward GRU encodes the sequence l_{1:i-1} in the forward direction to obtain the forward semantic hidden vector →h_j, and the backward GRU encodes l_{1:i-1} in the reverse direction to obtain the reverse semantic hidden vector ←h_j; the concatenation [→h_j; ←h_j] serves as the semantic vector of the sequence l_{1:i-1}; the forward semantic hidden vector →h_j, the reverse semantic hidden vector ←h_j, and the encoding vector h_j are determined by the following formulas:
→h_j = GRU(x_j, →h_{j-1})
←h_j = GRU(x_j, ←h_{j+1})
h_j = [→h_j; ←h_j]
where GRU() is the gated recurrent unit operation and [→h_j; ←h_j] denotes the concatenation of the forward semantic hidden vector →h_j and the reverse semantic hidden vector ←h_j;
the decoder of the ancient poetry generation network is a unidirectional gated recurrent unit, which decodes the visual feature information V and the preceding encoded information H into the next poem line l_i ∈ {y_1, y_2, …, y_G}; the decoder's gated recurrent unit cyclically updates an internal transition state s_t used for decoding y_t; after each update of s_t, a Softmax function computes the probability distribution over words for y_t, the word with the highest probability is selected as the output y_t, and the next line of the poem is generated word by word;
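A minimal numpy sketch of the bidirectional GRU encoding h_j = [→h_j; ←h_j] (the random weights are untrained placeholders, biases are omitted, and the two directions share weights for brevity; a real implementation would use a deep-learning framework):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, Wz, Wr, Wh):
    """One GRU step on the concatenation [h, x]."""
    hx = np.concatenate([h, x])
    z = sigmoid(Wz @ hx)                              # update gate
    r = sigmoid(Wr @ hx)                              # reset gate
    h_tilde = np.tanh(Wh @ np.concatenate([r * h, x]))  # candidate state
    return (1 - z) * h + z * h_tilde

def bigru_encode(xs, d, rng):
    """Encode word vectors into h_j = [forward_j ; backward_j]."""
    k = d + xs[0].shape[0]
    Wz, Wr, Wh = (rng.standard_normal((d, k)) * 0.1 for _ in range(3))
    fwd, h = [], np.zeros(d)
    for x in xs:                       # forward pass over the sequence
        h = gru_step(x, h, Wz, Wr, Wh)
        fwd.append(h)
    bwd, h = [], np.zeros(d)
    for x in reversed(xs):             # backward pass over the sequence
        h = gru_step(x, h, Wz, Wr, Wh)
        bwd.append(h)
    bwd.reverse()
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

rng = np.random.default_rng(0)
seq = [rng.standard_normal(8) for _ in range(5)]   # 5 toy word vectors
H = bigru_encode(seq, d=16, rng=rng)               # 5 vectors of size 32
```

The decoder would then run a unidirectional GRU over H and the visual features, taking the argmax of a Softmax at each step; only the encoder half is sketched here.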
(8) judging emotional tendency of ancient poems
The generated poem lines are output to an emotional tendency judgment network: emotion-word weights are assigned according to emotional strength by looking up an emotion dictionary, and the emotional tendency of the poem is judged by the weighted sum W determined by the following formula:
W = Σ_{i=1}^{N_p} w_{pi} − Σ_{j=1}^{N_n} w_{nj}
where N_p is the number of positive-emotion words, N_n is the number of negative-emotion words, w_{pi} is the weight of the ith positive-emotion word, and w_{nj} is the weight of the jth negative-emotion word;
the weighted result W is the judgment basis: W > 0 means the generated poem has a positive emotional tendency, W < 0 means a negative emotional tendency, and W = 0 means no emotional tendency;
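The weighted-sum judgment can be sketched as follows (the toy emotion dictionary and its weights are illustrative assumptions; positive words carry positive weights and negative words carry negative weights):

```python
def sentiment_tendency(words, lexicon):
    """Sum the emotion-dictionary weights of the words in a poem line;
    the sign of the total gives the emotional tendency."""
    score = sum(lexicon.get(w, 0.0) for w in words)  # unknown words weigh 0
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Toy emotion dictionary: weight magnitude reflects emotional strength.
lexicon = {"bright": 1.0, "joy": 2.0, "sorrow": -2.0, "cold": -0.5}
label = sentiment_tendency(["bright", "moon", "joy"], lexicon)
```

Here the positive weights (1.0 + 2.0) outweigh the absent negative ones, so the line is judged positive.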
(9) displaying the generated ancient poem
Displaying to the user the generated ancient poem and the judgment result of its emotional tendency.
2. The method for generating image ancient poetry based on the Faster R-CNN detection model as claimed in claim 1, wherein in step (2) the detail gray-level processing of the picture with the piecewise linear gray-level enhancement method is as follows: the gray level of the user-input image f(x, y) is in the range 0-32 levels, the gray level of the enhanced image h(x, y) is in the range 38-64 levels, and the enhanced image h(x, y) is determined by the following formula:
h(x, y) = (c/a)·f(x, y), for 0 ≤ f(x, y) < a
h(x, y) = ((d − c)/(b − a))·(f(x, y) − a) + c, for a ≤ f(x, y) ≤ b
h(x, y) = ((M_h − d)/(M_f − b))·(f(x, y) − b) + d, for b < f(x, y) ≤ M_f
where x represents the pixel coordinate along the picture length, y represents the pixel coordinate along the picture width, the intervals [a, b] and [c, d] are corresponding gray-level intervals of the original image and the enhanced image respectively, and M_f and M_h are the maximum gray levels of the original and enhanced images.
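A numpy sketch of the three-segment transform, assuming maximum gray levels of 32 (input) and 64 (output) and illustrative breakpoints a, b, c, d (these numeric choices are assumptions, not taken from the patent):

```python
import numpy as np

def piecewise_linear_enhance(f, a, b, c, d, f_max=32.0, h_max=64.0):
    """Map the gray interval [a, b] of the input onto [c, d] of the output,
    stretching contrast inside [a, b] and compressing it outside."""
    f = np.asarray(f, dtype=float)
    low = (c / a) * f                                   # [0, a) -> [0, c)
    mid = (d - c) / (b - a) * (f - a) + c               # [a, b] -> [c, d]
    high = (h_max - d) / (f_max - b) * (f - b) + d      # (b, f_max] -> (d, h_max]
    return np.where(f < a, low, np.where(f <= b, mid, high))

img = np.array([[4.0, 10.0], [20.0, 30.0]])             # toy gray levels
out = piecewise_linear_enhance(img, a=8, b=24, c=38, d=60)
```

With these breakpoints the middle segment has slope (60−38)/(24−8) = 1.375 > 1, so details inside [8, 24] are stretched while the outer segments are compressed.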
3. The method for generating image ancient poetry based on the Faster R-CNN detection model as claimed in claim 1, wherein in step (3) annotating the preprocessed pictures with the Pascal Visual Object Classes method comprises: image collection, image preprocessing, image annotation, data set segmentation, Faster R-CNN model training, and model export and storage; the loss function L({p_i}, {t_i}) of the Faster R-CNN model is determined from the classification loss function L_cls(p_i, p_i*) and the regression loss function L_reg(t_i, t_i*) by the following formulas:
L({p_i}, {t_i}) = (1/N_cls) Σ_i L_cls(p_i, p_i*) + λ·(1/N_reg) Σ_i p_i*·L_reg(t_i, t_i*)
L_cls(p_i, p_i*) = −log[p_i·p_i* + (1 − p_i)(1 − p_i*)]
L_reg(t_i, t_i*) = R(t_i − t_i*)
R = smooth_L1(x)
smooth_L1(x) = 0.5x² for |x| < 1, and |x| − 0.5 otherwise
where i is the index of an anchor box in the mini-batch descent; p_i is the predicted probability that anchor box i is a target; p_i* is the ground-truth box label, taking the value 0 or 1; t_i is the 4-dimensional parameterized coordinate vector of the predicted bounding box; t_i* is the coordinate vector of the ground-truth box corresponding to a positive anchor box; N_cls is the mini-batch normalization term and N_reg is the number of anchor box positions, here 2400; smooth_L1(x) is the smoothing function; λ is 10; the learning rate of the model is 0.0025, the number of iteration rounds is 12, the batch size of each training round is 2, and the backbone network is ResNet-50.
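A numpy sketch of the smooth-L1 regression term and the combined loss, with two toy anchors (all numeric inputs are illustrative assumptions):

```python
import numpy as np

def smooth_l1(x):
    """R = smooth_L1: 0.5*x^2 for |x| < 1, |x| - 0.5 otherwise."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) < 1, 0.5 * x ** 2, np.abs(x) - 0.5)

def rpn_loss(p, p_star, t, t_star, n_cls, n_reg, lam=10.0):
    """Combined classification (cross-entropy) and regression loss,
    normalized by N_cls and N_reg; regression counts only positive anchors."""
    p, p_star = np.asarray(p, float), np.asarray(p_star, float)
    l_cls = -(p_star * np.log(p) + (1 - p_star) * np.log(1 - p)).sum()
    diff = np.asarray(t, float) - np.asarray(t_star, float)
    l_reg = (p_star[:, None] * smooth_l1(diff)).sum()
    return l_cls / n_cls + lam * l_reg / n_reg

# One positive anchor (p* = 1) with a small box offset, one negative anchor.
loss = rpn_loss(p=[0.9, 0.2], p_star=[1, 0],
                t=[[0.1, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]],
                t_star=[[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]],
                n_cls=2, n_reg=2)
```

The quadratic region of smooth_L1 keeps gradients small near zero offset, while the linear region limits the influence of outlier boxes.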
CN202210273907.XA 2022-03-19 2022-03-19 Image ancient poem generation method based on Faster R-convolutional neural network detection model Withdrawn CN114662456A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210273907.XA CN114662456A (en) 2022-03-19 2022-03-19 Image ancient poem generation method based on Faster R-convolutional neural network detection model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210273907.XA CN114662456A (en) 2022-03-19 2022-03-19 Image ancient poem generation method based on Faster R-convolutional neural network detection model

Publications (1)

Publication Number Publication Date
CN114662456A true CN114662456A (en) 2022-06-24

Family

ID=82031805

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210273907.XA Withdrawn CN114662456A (en) 2022-03-19 2022-03-19 Image ancient poem generation method based on Faster R-convolutional neural network detection model

Country Status (1)

Country Link
CN (1) CN114662456A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115062179A (en) * 2022-07-06 2022-09-16 吴致远 Image-oriented end-to-end Chinese ancient poetry recommendation method based on deep learning
CN115080786A (en) * 2022-08-22 2022-09-20 科大讯飞股份有限公司 Picture poetry-based method, device and equipment and storage medium


Similar Documents

Publication Publication Date Title
CN111738003B (en) Named entity recognition model training method, named entity recognition method and medium
CN113254599A (en) Multi-label microblog text classification method based on semi-supervised learning
CN111985239A (en) Entity identification method and device, electronic equipment and storage medium
CN114662456A (en) Image ancient poem generation method based on Faster R-convolutional neural network detection model
CN113505200B (en) Sentence-level Chinese event detection method combined with document key information
CN112183058B (en) Poetry generation method and device based on BERT sentence vector input
CN111858878B (en) Method, system and storage medium for automatically extracting answer from natural language text
CN112861524A (en) Deep learning-based multilevel Chinese fine-grained emotion analysis method
CN112349294B (en) Voice processing method and device, computer readable medium and electronic equipment
CN112561718A (en) Case microblog evaluation object emotion tendency analysis method based on BilSTM weight sharing
CN113657115A (en) Multi-modal Mongolian emotion analysis method based on ironic recognition and fine-grained feature fusion
CN111145914B (en) Method and device for determining text entity of lung cancer clinical disease seed bank
CN113051887A (en) Method, system and device for extracting announcement information elements
CN111311364B (en) Commodity recommendation method and system based on multi-mode commodity comment analysis
CN113032541A (en) Answer extraction method based on bert and fusion sentence cluster retrieval
CN113094502A (en) Multi-granularity takeaway user comment sentiment analysis method
CN115408488A (en) Segmentation method and system for novel scene text
CN114298055B (en) Retrieval method and device based on multilevel semantic matching, computer equipment and storage medium
CN112329449B (en) Emotion analysis method based on emotion dictionary and Transformer
CN116644759B (en) Method and system for extracting aspect category and semantic polarity in sentence
CN116522165B (en) Public opinion text matching system and method based on twin structure
CN116757195B (en) Implicit emotion recognition method based on prompt learning
CN112651225A (en) Multi-item selection machine reading understanding method based on multi-stage maximum attention
CN115759102A (en) Chinese poetry wine culture named entity recognition method
CN116258147A (en) Multimode comment emotion analysis method and system based on heterogram convolution

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20220624