WO2023273173A1 - Method, apparatus and electronic device for target segmentation - Google Patents
Method, apparatus and electronic device for target segmentation
- Publication number: WO2023273173A1 (application PCT/CN2021/136548)
- Authority: WO (WIPO (PCT))
- Prior art keywords: feature map, frame, correlation matrix, target object, correlation
- Prior art date: 2021-06-30
Classifications
- G - PHYSICS; G06 - COMPUTING; CALCULATING OR COUNTING
- G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL; G06T7/00 - Image analysis; G06T7/10 - Segmentation; Edge detection
- G06F - ELECTRIC DIGITAL DATA PROCESSING; G06F18/00 - Pattern recognition; G06F18/20 - Analysing; G06F18/25 - Fusion techniques; G06F18/253 - Fusion techniques of extracted features
- G06T2207/00 - Indexing scheme for image analysis or image enhancement; G06T2207/10 - Image acquisition modality; G06T2207/10016 - Video; Image sequence
Definitions
- the present disclosure relates to the field of artificial intelligence, specifically to computer vision and deep learning technology, which can be used in smart city and intelligent traffic scenarios, and more particularly to a method, apparatus and electronic device for target segmentation.
- the present disclosure provides a method, device, electronic equipment and storage medium for object segmentation.
- a method for target segmentation including:
- the generating the feature map of the frame to be identified, the feature map of the reference frame of the target object and the feature map of the previous frame of the target object includes:
- the generating the first correlation matrix and the second correlation matrix according to the feature map of the frame to be identified, the feature map of the reference frame of the target object, and the feature map of the previous frame of the target object includes:
- the second correlation matrix is generated according to the feature map of the frame to be recognized and the feature map of the previous frame of the target object.
- the generating the first correlation matrix according to the feature map of the frame to be identified and the feature map of the target object reference frame includes:
- a reference value in each row of the second reference correlation matrix is generated, and the first correlation matrix is generated according to the reference value, wherein the reference value is greater than other values in the same row.
- the generating the second correlation matrix according to the feature map of the frame to be recognized and the feature map of the previous frame of the target object includes:
- the generating a first correlation feature map and a second correlation feature map according to the first correlation matrix, the second correlation matrix, the target object reference frame feature map, and the target object previous frame feature map includes:
- the second correlation matrix is multiplied point-to-point by the feature map of the previous frame of the target object to generate the second correlation feature map.
- the generating the target segmented image of the current frame according to the first correlation feature map, the second correlation feature map and the frame feature map to be identified includes:
- the generating a fusion feature map according to the first correlation feature map, the second correlation feature map and the frame feature map to be identified includes:
- a device for object segmentation including:
- a video frame generating module configured to generate a frame to be identified, a frame before the frame to be identified and a reference frame according to the video to be identified, the reference frame being the first frame of the video to be identified;
- a feature extraction module configured to input the frame to be identified, the previous frame and the reference frame into the encoding network, and generate a feature map of the frame to be identified, a feature map of the reference frame of the target object, and a feature map of the previous frame of the target object;
- a correlation matrix generating module configured to generate a first correlation matrix and a second correlation matrix according to the feature map of the frame to be identified, the feature map of the reference frame of the target object, and the feature map of the previous frame of the target object;
- a feature map generation module configured to generate a first correlation feature map and a second correlation feature map according to the first correlation matrix, the second correlation matrix, the target object reference frame feature map, and the target object previous frame feature map;
- a target segmentation module configured to generate a current frame target segmentation image according to the first correlation feature map, the second correlation feature map, and the frame-to-be-recognized feature map.
- the feature extraction module includes:
- a feature extraction submodule configured to extract features of the frame to be identified, the previous frame, and the reference frame, so as to generate a feature map of the frame to be identified, a feature map of the previous frame, and a feature map of the reference frame;
- a first mask submodule configured to generate a target object reference frame feature map according to the reference frame feature map and the target object mask of the reference frame;
- the second mask submodule is configured to generate the feature map of the previous frame of the target object according to the feature map of the previous frame and the mask of the target object of the previous frame.
- the correlation matrix generation module includes:
- a first correlation matrix generating submodule configured to generate the first correlation matrix according to the feature map of the frame to be identified and the feature map of the target object reference frame;
- the second correlation matrix generating submodule is configured to generate the second correlation matrix according to the feature map of the frame to be recognized and the feature map of the previous frame of the target object.
- the first correlation matrix generating submodule includes:
- a reference correlation matrix generating unit configured to generate a reference correlation matrix according to the feature map of the frame to be identified and the feature map of the reference frame of the target object;
- a second reference correlation matrix generating unit configured to normalize the reference correlation matrix to generate a second reference correlation matrix;
- the first correlation matrix generating unit is configured to generate a reference value in each row of the second reference correlation matrix, and generate the first correlation matrix according to the reference value, wherein the reference value is greater than other values in the same row.
- the second correlation matrix generating submodule includes:
- a previous frame correlation matrix generating unit configured to generate a previous frame correlation matrix according to the feature map of the frame to be recognized and the feature map of the previous frame of the target object;
- a second previous frame correlation matrix generating unit configured to normalize the previous frame correlation matrix to generate a second previous frame correlation matrix;
- a second correlation matrix generating unit configured to generate a reference value in each row of the second previous frame correlation matrix, and generate the second correlation matrix according to the reference value, wherein the reference value is greater than other values in the same row.
- the feature map generation module includes:
- the first correlation feature map generation submodule is used to multiply the first correlation matrix and the target object reference frame feature map point-to-point to generate the first correlation feature map;
- the second correlation feature map generation submodule is used to multiply the second correlation matrix with the frame feature map of the previous frame of the target object point-to-point to generate the second correlation feature map.
- the target segmentation module includes:
- a feature fusion submodule configured to generate a fusion feature map according to the first correlation feature map, the second correlation feature map, and the frame feature map to be identified;
- the decoding sub-module is used to input the fusion feature map into the decoding network, and generate the current frame target segmented image.
- the feature fusion submodule includes:
- a feature fusion unit configured to splice the first correlation feature map, the second correlation feature map, and the frame feature map to be recognized to generate the fusion feature map.
- an electronic device including:
- the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor, so that the at least one processor can perform the method according to any one of the first aspects.
- a non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause the computer to execute the method according to any one of the first aspects.
- a computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of the first aspects.
- the attention is focused on the target object, and the accuracy of identifying the target object is improved.
- FIG. 1 is a schematic flowchart of a method for object segmentation provided according to an embodiment of the present disclosure;
- FIG. 2 is a schematic flowchart of a method for object segmentation provided according to an embodiment of the present disclosure;
- FIG. 3 is a schematic flowchart of a method for object segmentation provided according to an embodiment of the present disclosure;
- FIG. 4 is a schematic flowchart of a method for object segmentation provided according to an embodiment of the present disclosure;
- FIG. 5 is a schematic flowchart of a method for object segmentation provided according to an embodiment of the present disclosure;
- FIG. 6 is a schematic flowchart of a method for object segmentation provided according to an embodiment of the present disclosure;
- FIG. 7 is a schematic flowchart of a method for object segmentation provided according to an embodiment of the present disclosure;
- FIG. 8 is a schematic structural diagram of an apparatus for object segmentation provided according to an embodiment of the present disclosure;
- FIG. 9 is a schematic structural diagram of an apparatus for object segmentation provided according to an embodiment of the present disclosure;
- FIG. 10 is a schematic structural diagram of an apparatus for object segmentation provided according to an embodiment of the present disclosure;
- FIG. 11 is a schematic structural diagram of an apparatus for object segmentation provided according to an embodiment of the present disclosure;
- FIG. 12 is a schematic structural diagram of an apparatus for object segmentation provided according to an embodiment of the present disclosure;
- FIG. 13 is a schematic structural diagram of an apparatus for object segmentation provided according to an embodiment of the present disclosure;
- FIG. 14 is a schematic structural diagram of an apparatus for object segmentation provided according to an embodiment of the present disclosure;
- FIG. 15 is a block diagram of an electronic device used to implement the method for object segmentation in an embodiment of the present disclosure;
- FIG. 16 is a schematic structural diagram of an apparatus for object segmentation provided according to an embodiment of the present disclosure.
- A common method is to generate a kind of instance attention by reading information from historical frames and extracting the feature vectors at all positions where the target object appears in those frames. However, this method sums the extracted target vectors, compressing a (c, h, w) feature into a (c, 1, 1) vector, and then adds the (c, 1, 1) vector into the network to assist the network in target segmentation.
- This can solve the problem of target occlusion to a certain extent, but when the extracted vectors are compressed into (c, 1, 1), all information about the target's position, shape, and the correlation between adjacent vectors is lost, so there is still much room for improvement; the compression step is sketched below.
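- The following is a minimal, illustrative PyTorch sketch of that prior-art compression step, assuming the target vectors are simply summed over spatial positions (the function name is hypothetical, not from the disclosure):

```python
import torch

def compress_instance_feature(target_feat: torch.Tensor) -> torch.Tensor:
    # Hypothetical sketch of the prior-art "instance attention" compression:
    # a (c, h, w) target feature is collapsed to (c, 1, 1), discarding all
    # position, shape, and neighbor-correlation information.
    c, h, w = target_feat.shape
    return target_feat.sum(dim=(1, 2)).reshape(c, 1, 1)  # (c, 1, 1)
```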
- Fig. 1 is a schematic flowchart of a method for object segmentation provided according to an embodiment of the present disclosure. As shown in Fig. 1, the described method for target segmentation comprises:
- Step 101 generating a frame to be recognized, a frame preceding the frame to be recognized and a reference frame according to the video to be recognized, the reference frame being the first frame of the video to be recognized;
- the present disclosure can be used in smart cities and smart transportation scenarios.
- Smart cities use information and communication technology to sense, analyze, and integrate various key information of the core system of urban operation.
- the construction of smart cities requires the realization of comprehensive perception, ubiquitous interconnection, pervasive computing and integrated applications through the new generation of information technology applications such as the Internet of Things and cloud computing represented by mobile technology.
- An important sensory information of a smart city is the video information obtained by surveillance cameras.
- the video information can be further mined.
- the video to be identified is collected by a camera, and one frame is selected as the frame to be identified.
- The present disclosure uses historical frames, namely the frame immediately preceding the frame to be recognized and a reference frame, to enhance the features of the target object in the frame to be recognized; the previous frame is the frame adjacent to and preceding the frame to be recognized, and the reference frame is the first frame of the video to be recognized.
- Step 102 inputting the frame to be recognized, the previous frame and the reference frame into the encoding network, and generating a feature map of the frame to be recognized, a feature map of the reference frame of the target object, and a feature map of the previous frame of the target object;
- the encoding network is an encoder in a neural network; it down-samples the frame to be identified, the previous frame and the reference frame to extract their high-dimensional features, that is, to generate the feature map of the frame to be identified, the feature map of the previous frame and the feature map of the reference frame.
- the present disclosure uses the target object mask corresponding to the previous frame and the reference frame to obtain the feature map of the target object reference frame and the feature map of the target object previous frame.
- Step 103 generating a first correlation matrix and a second correlation matrix according to the feature map of the frame to be identified, the feature map of the reference frame of the target object, and the feature map of the previous frame of the target object;
- In the correlation matrix, each element represents the correlation between a local feature vector in one feature map and a local feature vector in another feature map, and is usually represented by the dot product of the two local feature vectors.
- the size of the correlation matrix of two feature maps of size H*W*d is (H*W)*(H*W), where H is height, W is width, and d is the number of channels.
- Correlation is the basis for measuring the matching degree of features, and features will have different representations according to different tasks, usually based on semantic features of shape, color, and texture.
- the present disclosure uses correlation matrices to characterize the correlation between the pixels in the feature map of the target object reference frame and the pixels in the feature map of the frame to be recognized, and between the pixels in the feature map of the previous frame of the target object and the pixels in the feature map of the frame to be recognized, as sketched below.
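- A minimal sketch of such a correlation matrix using the dot-product formulation described above (PyTorch is assumed; the function name is illustrative, not from the disclosure):

```python
import torch

def correlation_matrix(feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
    # feat_a, feat_b: (d, H, W) feature maps with d channels.
    # Returns an (H*W, H*W) matrix whose element (i, j) is the dot product
    # of the i-th local vector of feat_a and the j-th local vector of feat_b.
    d, h, w = feat_a.shape
    a = feat_a.reshape(d, h * w)  # (d, H*W)
    b = feat_b.reshape(d, h * w)  # (d, H*W)
    return a.t() @ b              # (H*W, H*W)
```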
- Step 104 generating a first correlation feature map and a second correlation feature map according to the first correlation matrix, the second correlation matrix, the target object reference frame feature map, and the target object previous frame feature map;
- the first correlation matrix, the second correlation matrix and the feature map of the frame to be recognized can generate the target feature map of the frame to be recognized; according to the correlation matrices, the features of the feature map of the frame to be recognized can be strengthened to improve the detection accuracy of the target object.
- Step 105 generating a current frame target segmentation image according to the first correlation feature map, the second correlation feature map and the frame feature map to be recognized.
- the first correlation feature map and the second correlation feature map can be generated by point-to-point multiplication of the correlation matrices with the pixels in the corresponding feature maps. The first correlation feature map, the second correlation feature map and the feature map of the frame to be recognized are then spliced (concat) to strengthen the features of the pixels related to the target object and generate a fusion feature map.
- the target segmented image can be obtained by inputting the fused feature map into a decoder; the decoder up-samples and restores the target segmented image to the size of the frame to be detected, thereby obtaining the pixels belonging to the target object in the frame to be detected.
- Fig. 2 is a schematic flowchart of a method for object segmentation according to an embodiment of the present disclosure. As shown in Fig. 2, the described method for target segmentation comprises:
- Step 201 extracting the features of the frame to be recognized, the previous frame and the reference frame to generate a feature map of the frame to be recognized, a feature map of the previous frame and a feature map of the reference frame;
- the present disclosure utilizes a neural network to extract features of the frame to be recognized, the previous frame, and the reference frame.
- the methods for extracting features are well known and diverse, and are not the subject of protection of the present disclosure.
- a random down-sampling method is used for feature extraction, and the feature map of the frame to be recognized, the feature map of the previous frame and the feature map of the reference frame are generated.
- Step 202 generating a target object reference frame feature map according to the reference frame feature map and the target object mask of the reference frame;
- the mask of the target object in the reference frame has already been obtained through the target segmentation method; the target object reference frame feature map can be generated by multiplying the target object mask of the reference frame with the pixels of the reference frame feature map point by point.
- the feature map of the target object in the reference frame containing only the target object can be obtained, so as to facilitate the subsequent acquisition of the first correlation matrix.
- Step 203 generating the feature map of the previous frame of the target object according to the feature map of the previous frame and the mask of the target object in the previous frame.
- the mask of the target object in the previous frame has already been obtained through the target segmentation method; the feature map of the previous frame of the target object is generated by multiplying the target object mask of the previous frame with the pixels of the previous frame feature map point-to-point.
- the feature map of the target object in the previous frame containing only the target object can be acquired, so as to facilitate subsequent acquisition of the second correlation matrix.
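- As an illustrative sketch of this masking step (PyTorch assumed; names are hypothetical), the mask is broadcast over channels and multiplied point by point with the frame's feature map:

```python
import torch

def masked_feature_map(frame_feat: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    # frame_feat: (c, h, w) feature map of the reference or previous frame.
    # mask: (h, w) binary target-object mask, broadcast over the c channels.
    # Only pixels belonging to the target object remain non-zero.
    return frame_feat * mask.unsqueeze(0)  # (c, h, w)
```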
- Fig. 3 is a schematic flowchart of a method for object segmentation according to an embodiment of the present disclosure. As shown in Fig. 3, the described method for target segmentation comprises:
- Step 301 generating the first correlation matrix according to the feature map of the frame to be recognized and the feature map of the target object reference frame;
- the present disclosure generates the first correlation matrix according to the feature map of the frame to be recognized and the target object reference frame feature map, to characterize the correlation between the pixels in the feature map of the frame to be recognized and the pixels belonging to the target object in the target object reference frame feature map, which facilitates subsequent feature extraction.
- Step 302 generating the second correlation matrix according to the feature map of the frame to be recognized and the feature map of the previous frame of the target object.
- the present disclosure likewise generates the second correlation matrix according to the feature map of the frame to be recognized and the feature map of the previous frame of the target object, to characterize the correlation between the pixels in the feature map of the frame to be recognized and the pixels belonging to the target object in the feature map of the previous frame of the target object, which facilitates subsequent feature extraction.
- Fig. 4 is a schematic flowchart of a method for object segmentation according to an embodiment of the present disclosure. As shown in Fig. 4, the described method for target segmentation comprises:
- Step 401 generating a reference correlation matrix according to the feature map of the frame to be recognized and the feature map of the reference frame of the target object;
- a reference correlation matrix is generated according to the feature map of the frame to be recognized and the feature map of the reference frame of the target object, and there are various methods for generating the correlation matrix.
- the Euclidean distance between the feature vector corresponding to a pixel in the feature map of the frame to be recognized and the feature vector corresponding to a pixel in the target object reference frame feature map is calculated, and the Euclidean distance is used as the value of the corresponding element in the reference correlation matrix to generate the reference correlation matrix.
- Step 402 performing normalization processing on the reference correlation matrix to generate a second reference correlation matrix;
- the reference correlation matrix is normalized to reduce the error of subsequent target segmentation.
- the normalization processing is performed using a softmax function. After the normalization processing, a second reference correlation matrix is generated, and in any row of the second reference correlation matrix the sum of all elements is 1.
- Step 403 generating a reference value in each row of the second reference correlation matrix, and generating the first correlation matrix according to the reference value, wherein the reference value is greater than other values in the same row.
- the present disclosure only reserves the element with the largest value in each row of the second reference correlation matrix, and the value of the element with the largest value is the reference value.
- the second reference correlation matrix is an (h×w, N) matrix; after retaining the reference values, an (h×w, 1) matrix is generated, which is then reshaped to obtain the first correlation matrix of size (h, w).
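- A minimal sketch of Steps 402-403 (PyTorch assumed; the function name is illustrative): softmax-normalize each row, keep only the largest value per row, then reshape to (h, w):

```python
import torch

def first_correlation_matrix(ref_corr: torch.Tensor, h: int, w: int) -> torch.Tensor:
    # ref_corr: (h*w, N) reference correlation matrix - one row per pixel of
    # the frame to be recognized, N target-object positions in the reference frame.
    normed = torch.softmax(ref_corr, dim=1)  # each row now sums to 1
    ref_vals, _ = normed.max(dim=1)          # keep only the largest value per row
    return ref_vals.reshape(h, w)            # (h, w) first correlation matrix
```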
- Fig. 5 is a schematic flowchart of a method for object segmentation according to an embodiment of the present disclosure. As shown in Fig. 5, the described method for target segmentation comprises:
- Step 501 generating a previous frame correlation matrix according to the feature map of the frame to be recognized and the feature map of the previous frame of the target object;
- a correlation matrix of a previous frame is generated according to the feature map of the frame to be recognized and the feature map of the previous frame of the target object.
- There are various methods for generating the correlation matrix.
- the Euclidean distance between the feature vector corresponding to a pixel in the feature map of the frame to be recognized and the feature vector corresponding to a pixel in the feature map of the previous frame of the target object is calculated, and the Euclidean distance is used as the value of the corresponding element in the previous frame correlation matrix to generate the previous frame correlation matrix.
- Step 502 performing normalization processing on the correlation matrix of the previous frame to generate a second correlation matrix of the previous frame;
- the softmax function is used for the normalization processing. After the normalization process, the second previous frame correlation matrix is generated, and in any row of the second previous frame correlation matrix, the sum of all elements is 1.
- Step 503 generating a reference value in each row of the correlation matrix of the second previous frame, and generating the second correlation matrix according to the reference value, wherein the reference value is greater than other values in the same row.
- the present disclosure only retains the element with the largest value in each row in the correlation matrix of the second previous frame, and the value of the element with the largest value is the reference value.
- the second previous frame correlation matrix is an (h×w, N) matrix; after retaining the reference values, an (h×w, 1) matrix is generated, which is then reshaped to obtain the second correlation matrix of size (h, w).
- Fig. 6 is a schematic flowchart of a method for object segmentation according to an embodiment of the present disclosure. As shown in Fig. 6, the described method for target segmentation comprises:
- Step 601 multiplying the first correlation matrix point-to-point by the target object reference frame feature map to generate the first correlation feature map
- the present disclosure multiplies the first correlation matrix point-to-point with the pixels in the target object reference frame feature map to obtain the first correlation feature map.
- the size of the first correlation matrix is the same as that of the target object reference frame feature map.
- Step 602 multiplying the second correlation matrix point-to-point by the feature map of the previous frame of the target object to generate the second correlation feature map.
- the present disclosure multiplies the second correlation matrix with pixels in the feature map of the previous frame of the target object point-to-point to obtain the second correlation feature map.
- the size of the second correlation matrix is the same as that of the feature map of the previous frame of the target object.
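- A minimal sketch of this point-to-point weighting (PyTorch assumed; names are illustrative): the (h, w) correlation matrix is broadcast over channels and multiplied element-wise with the (c, h, w) target-object feature map:

```python
import torch

def correlation_feature_map(corr: torch.Tensor, obj_feat: torch.Tensor) -> torch.Tensor:
    # corr: (h, w) correlation matrix; obj_feat: (c, h, w) target-object
    # feature map of matching spatial size. Each pixel of the feature map is
    # weighted by its correlation with the frame to be recognized.
    return obj_feat * corr.unsqueeze(0)  # (c, h, w)
```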
- Fig. 7 is a schematic flowchart of a method for object segmentation according to an embodiment of the present disclosure. As shown in Fig. 7, the described method for target segmentation comprises:
- Step 701 generating a fusion feature map according to the first correlation feature map, the second correlation feature map and the frame feature map to be recognized;
- the present disclosure fuses the features in the first correlation feature map, the second correlation feature map, and the feature map of the frame to be recognized to generate a fusion feature map.
- the first correlation feature map, the second correlation feature map and the feature map of the frame to be recognized are spliced (concat), increasing the number of channels of each pixel, to generate the fused feature map.
- Step 702 inputting the fused feature map into the decoding network, and generating the target segmentation image of the current frame.
- the decoding network is used to up-sample the fused feature map to restore features, and the pixels belonging to the target object can be obtained from the target segmentation image.
- the generating a fusion feature map according to the first correlation feature map, the second correlation feature map and the frame feature map to be identified includes:
- the splicing concat can increase the dimension of the image, fuse features together, and facilitate subsequent target segmentation.
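- A minimal sketch of the fusion and decoding described above (PyTorch assumed; the decoder module is a placeholder, not the disclosure's exact network):

```python
import torch
import torch.nn as nn

def fuse_and_decode(corr_feat1: torch.Tensor, corr_feat2: torch.Tensor,
                    cur_feat: torch.Tensor, decoder: nn.Module) -> torch.Tensor:
    # corr_feat1, corr_feat2, cur_feat: (c, h, w) first/second correlation
    # feature maps and the feature map of the frame to be recognized.
    fused = torch.cat([corr_feat1, corr_feat2, cur_feat], dim=0)  # (3c, h, w)
    # The decoder up-samples the fused features back to the input frame size
    # and predicts the per-pixel target segmentation.
    return decoder(fused.unsqueeze(0))
```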
- Fig. 8 is a schematic structural diagram of an apparatus for object segmentation according to an embodiment of the present disclosure.
- the device 800 for target segmentation includes:
- the video frame generation module 810 is used to generate a frame to be recognized, a frame before the frame to be recognized and a reference frame according to the video to be recognized, and the reference frame is the first frame of the video to be recognized;
- the present disclosure can be used in smart cities and smart transportation scenarios.
- Smart cities use information and communication technology to sense, analyze, and integrate various key information of the core system of urban operation.
- the construction of smart cities requires the realization of comprehensive perception, ubiquitous interconnection, pervasive computing and integrated applications through the new generation of information technology applications such as the Internet of Things and cloud computing represented by mobile technology.
- An important sensory information of a smart city is the video information obtained by surveillance cameras.
- the video information can be further mined.
- the video to be identified is collected by a camera, and one frame is selected as the frame to be identified.
- The present disclosure uses historical frames, namely the frame immediately preceding the frame to be recognized and a reference frame, to enhance the features of the target object in the frame to be recognized; the previous frame is the frame adjacent to and preceding the frame to be recognized, and the reference frame is the first frame of the video to be recognized.
- a feature extraction module 820 configured to input the frame to be recognized, the previous frame and the reference frame into the encoding network, and generate a feature map of the frame to be recognized, a feature map of the reference frame of the target object, and a feature map of the previous frame of the target object;
- the encoding network is an encoder in a neural network; it down-samples the frame to be identified, the previous frame and the reference frame to extract their high-dimensional features, that is, to generate the feature map of the frame to be identified, the feature map of the previous frame and the feature map of the reference frame.
- the present disclosure uses the target object mask corresponding to the previous frame and the reference frame to obtain the feature map of the target object reference frame and the feature map of the target object previous frame.
- a correlation matrix generating module 830 configured to generate a first correlation matrix and a second correlation matrix according to the feature map of the frame to be identified, the feature map of the reference frame of the target object, and the feature map of the previous frame of the target object;
- In the correlation matrix, each element represents the correlation between a local feature vector in one feature map and a local feature vector in another feature map, and is usually represented by the dot product of the two local feature vectors.
- the size of the correlation matrix of two feature maps of size H*W*d is (H*W)*(H*W), where H is height, W is width, and d is the number of channels.
- Correlation is the basis for measuring the matching degree of features, and features will have different representations according to different tasks, usually based on semantic features of shape, color, and texture.
- the present disclosure uses correlation matrices to characterize the correlation between the pixels in the feature map of the target object reference frame and the pixels in the feature map of the frame to be recognized, and between the pixels in the feature map of the previous frame of the target object and the pixels in the feature map of the frame to be recognized.
- a feature map generation module 840 configured to generate a first correlation feature map and a second correlation feature map according to the first correlation matrix, the second correlation matrix, the target object reference frame feature map, and the target object previous frame feature map;
- the first correlation matrix, the second correlation matrix and the feature map of the frame to be recognized can generate the target feature map of the frame to be recognized; according to the correlation matrices, the features of the feature map of the frame to be recognized can be strengthened to improve the detection accuracy of the target object.
- the target segmentation module 850 is configured to generate a current frame target segmentation image according to the first correlation feature map, the second correlation feature map and the frame feature map to be recognized.
- the first correlation feature map and the second correlation feature map can be generated by point-to-point multiplication of the correlation matrices with the pixels in the corresponding feature maps. The first correlation feature map, the second correlation feature map and the feature map of the frame to be recognized are then spliced (concat) to strengthen the features of the pixels related to the target object and generate a fusion feature map.
- the target segmented image can be obtained by inputting the fused feature map into a decoder; the decoder up-samples and restores the target segmented image to the size of the frame to be detected, thereby obtaining the pixels belonging to the target object in the frame to be detected.
- Fig. 9 is a schematic structural diagram of an apparatus for object segmentation according to an embodiment of the present disclosure.
- the device 900 for object segmentation includes:
- a feature extraction submodule 910 configured to extract features of the frame to be identified, the previous frame, and the reference frame, so as to generate a feature map of the frame to be identified, a feature map of the previous frame, and a feature map of the reference frame;
- the present disclosure utilizes a neural network to extract features of the frame to be recognized, the previous frame, and the reference frame.
- the methods for extracting features are known and diverse, and are not protected by the present disclosure.
- a random down-sampling method is used for feature extraction, and the feature map of the frame to be recognized, the feature map of the previous frame and the feature map of the reference frame are generated.
- the first mask submodule 920 is configured to generate a target object reference frame feature map according to the reference frame feature map and the target object mask of the reference frame;
- the mask of the target object in the reference frame has already been obtained through the target segmentation method; the target object reference frame feature map can be generated by multiplying the target object mask of the reference frame with the pixels of the reference frame feature map point by point.
- the feature map of the target object in the reference frame containing only the target object can be obtained, so as to facilitate the subsequent acquisition of the first correlation matrix.
- the second mask submodule 930 is configured to generate the feature map of the previous frame of the target object according to the feature map of the previous frame and the mask of the target object of the previous frame.
- the mask of the target object in the previous frame has already been obtained through the target segmentation method; the feature map of the previous frame of the target object is generated by multiplying the target object mask of the previous frame with the pixels of the previous frame feature map point-to-point.
- the feature map of the target object in the previous frame containing only the target object can be acquired, so as to facilitate subsequent acquisition of the second correlation matrix.
- Fig. 10 is a schematic structural diagram of an apparatus for object segmentation according to an embodiment of the present disclosure.
- the device 1000 for target segmentation includes:
- the first correlation matrix generation sub-module 1010 is configured to generate the first correlation matrix according to the feature map of the frame to be identified and the feature map of the target object reference frame;
- the present disclosure generates the first correlation matrix according to the feature map of the frame to be recognized and the target object reference frame feature map, to characterize the correlation between the pixels in the feature map of the frame to be recognized and the pixels belonging to the target object in the target object reference frame feature map, which facilitates subsequent feature extraction.
- the second correlation matrix generation sub-module 1020 is configured to generate the second correlation matrix according to the feature map of the frame to be recognized and the feature map of the previous frame of the target object.
- the present disclosure likewise generates the second correlation matrix according to the feature map of the frame to be recognized and the feature map of the previous frame of the target object, to characterize the correlation between the pixels in the feature map of the frame to be recognized and the pixels belonging to the target object in the feature map of the previous frame of the target object, which facilitates subsequent feature extraction.
- Fig. 11 is a schematic structural diagram of an apparatus for object segmentation according to an embodiment of the present disclosure. As shown in Figure 11, the device 1100 for object segmentation includes:
- a reference correlation matrix generating unit 1110 configured to generate a reference correlation matrix according to the feature map of the frame to be identified and the feature map of the reference frame of the target object;
- a reference correlation matrix is generated according to the feature map of the frame to be recognized and the feature map of the reference frame of the target object, and there are various methods for generating the correlation matrix.
- the Euclidean distance between the feature vector corresponding to a pixel in the feature map of the frame to be recognized and the feature vector corresponding to a pixel in the target object reference frame feature map is calculated, and the Euclidean distance is used as the value of the corresponding element in the reference correlation matrix to generate the reference correlation matrix.
- the second reference correlation matrix generating unit 1120 is configured to perform normalization processing on the reference correlation matrix to generate a second reference correlation matrix;
- the reference correlation matrix is normalized to reduce the error of subsequent target segmentation.
- the normalization processing is performed using a softmax function. After the normalization processing, a second reference correlation matrix is generated, and in any row of the second reference correlation matrix the sum of all elements is 1.
- the first correlation matrix generating unit 1130 is configured to generate a reference value in each row of the second reference correlation matrix, and generate the first correlation matrix according to the reference value, wherein the reference value is greater than other values in the same row.
- the present disclosure only reserves the element with the largest value in each row of the second reference correlation matrix, and the value of the element with the largest value is the reference value.
- the second reference correlation matrix is an (h×w, N) matrix; after retaining the reference values, an (h×w, 1) matrix is generated, which is then reshaped to obtain the first correlation matrix of size (h, w).
- Fig. 12 is a schematic structural diagram of an apparatus for object segmentation according to an embodiment of the present disclosure. As shown in Figure 12, the device 1200 for object segmentation includes:
- a previous frame correlation matrix generating unit 1210 configured to generate a previous frame correlation matrix according to the feature map of the frame to be recognized and the feature map of the previous frame of the target object;
- a correlation matrix of a previous frame is generated according to the feature map of the frame to be recognized and the feature map of the previous frame of the target object.
- There are various methods for generating the correlation matrix.
- the Euclidean distance between the feature vector corresponding to a pixel in the feature map of the frame to be recognized and the feature vector corresponding to a pixel in the feature map of the previous frame of the target object is calculated, and the Euclidean distance is used as the value of the corresponding element in the previous frame correlation matrix to generate the previous frame correlation matrix.
- the second previous frame correlation matrix generating unit 1220 is configured to perform normalization processing on the previous frame correlation matrix to generate a second previous frame correlation matrix;
- the softmax function is used for the normalization processing.
- after the normalization processing, the second previous frame correlation matrix is generated, and in any row of the second previous frame correlation matrix the sum of all elements is 1.
- the second correlation matrix generating unit is configured to generate a reference value in each row of the second previous frame correlation matrix, and generate the second correlation matrix according to the reference value, wherein the reference value is greater than other values in the same row.
- the present disclosure only retains the element with the largest value in each row in the correlation matrix of the second previous frame, and the value of the element with the largest value is the reference value.
- the second previous frame correlation matrix is an (h×w, N) matrix; after retaining the reference values, an (h×w, 1) matrix is generated, which is then reshaped to obtain the second correlation matrix of size (h, w).
- Fig. 13 is a schematic structural diagram of an apparatus for object segmentation according to an embodiment of the present disclosure. As shown in Figure 13, the device 1300 for object segmentation includes:
- the first correlation feature map generation sub-module 1310 is used to multiply the first correlation matrix and the target object reference frame feature map point-to-point to generate the first correlation feature map;
- the present disclosure multiplies the first correlation matrix point-to-point with the pixels in the target object reference frame feature map to obtain the first correlation feature map.
- the size of the first correlation matrix is the same as that of the target object reference frame feature map.
- the second correlation feature map generation sub-module 1320 is configured to multiply the second correlation matrix with the frame feature map of the previous frame of the target object point-to-point to generate the second correlation feature map.
- the present disclosure multiplies the second correlation matrix with pixels in the feature map of the previous frame of the target object point-to-point to obtain the second correlation feature map.
- the size of the second correlation matrix is the same as that of the feature map of the previous frame of the target object.
- Fig. 14 is a schematic structural diagram of an apparatus for object segmentation according to an embodiment of the present disclosure.
- the device 1400 for object segmentation includes:
- a feature fusion submodule 1410 configured to generate a fusion feature map according to the first correlation feature map, the second correlation feature map, and the frame feature map to be identified;
- the present disclosure fuses the features in the first correlation feature map, the second correlation feature map, and the feature map of the frame to be recognized to generate a fusion feature map.
- the first correlation feature map, the second correlation feature map and the feature map of the frame to be recognized are spliced (concat), increasing the number of channels of each pixel, to generate the fused feature map.
- the decoding sub-module 1420 is configured to input the fused feature map into the decoding network, and generate the target segmentation image of the current frame.
- the feature fusion submodule includes:
- a feature fusion unit configured to splice the first correlation feature map, the second correlation feature map, and the frame feature map to be recognized to generate the fusion feature map.
- the splicing concat can increase the dimension of the image, fuse features together, and facilitate subsequent target segmentation.
- Fig. 16 is a schematic structural diagram of a target segmentation device provided according to an embodiment of the present disclosure.
- Three frames of images, the first frame ref_im, the previous frame pre_im and the current frame cur_im, are input into the network; through the feature extraction network, the vector maps of the first frame, the previous frame and the current frame are obtained, denoted ref_emb, pre_emb and cur_emb respectively, and their sizes are all (c, h, w), where c is the number of channels, h is the height and w is the width.
- the vector maps ref_e and pre_e corresponding to the pixel positions of the target object are respectively extracted from the vector map of the first frame and the vector map of the previous frame.
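- A compact, illustrative PyTorch sketch of the flow just described, reusing the names from the text (cur_emb is the (c, h, w) vector map of the current frame; ref_e and pre_e are the (c, h, w) target-object vector maps; the decoder is assumed given and hypothetical):

```python
import torch

def segment_current_frame(cur_emb, ref_e, pre_e, decoder):
    c, h, w = cur_emb.shape
    cur = cur_emb.reshape(c, h * w)

    def corr_weights(obj_emb):
        obj = obj_emb.reshape(c, h * w)
        corr = torch.softmax(cur.t() @ obj, dim=1)  # (h*w, h*w), rows sum to 1
        vals, _ = corr.max(dim=1)                   # largest value per row
        return vals.reshape(1, h, w)                # (h, w) correlation matrix

    feat1 = corr_weights(ref_e) * ref_e             # first correlation feature map
    feat2 = corr_weights(pre_e) * pre_e             # second correlation feature map
    fused = torch.cat([feat1, feat2, cur_emb], dim=0)  # (3c, h, w) fusion
    return decoder(fused.unsqueeze(0))              # current-frame segmentation
```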
- the present disclosure also provides an electronic device, a readable storage medium, and a computer program product.
- FIG. 15 shows a schematic block diagram of an example electronic device 1500 that may be used to implement embodiments of the present disclosure.
- Electronic device is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers.
- Electronic devices may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smart phones, wearable devices, and other similar computing devices.
- the components shown herein, their connections and relationships, and their functions, are by way of example only, and are not intended to limit implementations of the disclosure described and/or claimed herein.
- the device 1500 includes a computing unit 1501 that can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 1502 or loaded from a storage unit 1508 into a random-access memory (RAM) 1503. The RAM 1503 can also store various programs and data necessary for the operation of the device 1500.
- the computing unit 1501, ROM 1502, and RAM 1503 are connected to each other through a bus 1504.
- An input/output (I/O) interface 1505 is also connected to the bus 1504 .
- multiple components in the device 1500 are connected to the I/O interface 1505, including: an input unit 1506, such as a keyboard, a mouse, etc.; an output unit 1507, such as various types of displays, speakers, etc.; a storage unit 1508, such as a magnetic disk, an optical disk, etc.; and a communication unit 1509, such as a network card, a modem, a wireless communication transceiver, and the like.
- the communication unit 1509 allows the device 1500 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks.
- the computing unit 1501 may be any of various general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 1501 include, but are not limited to, central processing units (CPUs), graphics processing units (GPUs), various dedicated artificial intelligence (AI) computing chips, various computing units that run machine-learning model algorithms, digital signal processors (DSPs), and any suitable processor, controller, microcontroller, etc.
- the computing unit 1501 executes various methods and processes described above, such as the object segmentation method.
- the object segmentation method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 1508 .
- part or all of the computer program may be loaded and/or installed on the device 1500 via the ROM 1502 and/or the communication unit 1509.
- when the computer program is loaded into the RAM 1503 and executed by the computing unit 1501, one or more steps of the object segmentation method described above can be performed.
- the computing unit 1501 may be configured to execute the object segmentation method in any other appropriate manner (for example, by means of firmware).
- Various implementations of the systems and techniques described above can be implemented in digital electronic circuit systems, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof.
- the programmable processor can be a special-purpose or general-purpose programmable processor that can receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.
- Program codes for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or other programmable data processing device, so that when the program codes are executed by the processor or controller, the functions/actions specified in the flowcharts and/or block diagrams are implemented.
- the program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
- a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device.
- a machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
- a machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing.
- machine-readable storage media would include one or more wire-based electrical connections, portable computer discs, hard drives, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), optical fiber, compact disk read only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.
- to provide interaction with a user, the systems and techniques described herein can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which the user can provide input to the computer.
- Other kinds of devices can also be used to provide interaction with the user; for example, the feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user can be received in any form (including acoustic input, speech input, or tactile input).
- the systems and techniques described herein can be implemented in a computing system that includes back-end components (e.g., as a data server), or a computing system that includes middleware components (e.g., an application server), or a computing system that includes front-end components (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with embodiments of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components.
- the components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include: local area networks (LANs), wide area networks (WANs), the Internet, and blockchain networks.
- a computer system may include clients and servers.
- Clients and servers are generally remote from each other and typically interact through a communication network.
- The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
- The server can be a cloud server, also known as a cloud computing server or cloud host, which is a host product in the cloud computing service system that overcomes the drawbacks of difficult management and weak business scalability found in traditional physical hosts and VPS ("Virtual Private Server") services.
- the server can also be a server of a distributed system, or a server combined with a blockchain.
- It should be understood that steps may be reordered, added, or deleted using the various forms of flow shown above.
- Each step described in the present disclosure may be executed in parallel, sequentially, or in a different order; as long as the desired result of the technical solution disclosed in the present disclosure can be achieved, no limitation is imposed herein.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Image Analysis (AREA)
Abstract
A target segmentation method and apparatus, and an electronic device, relating to the field of artificial intelligence, specifically to computer vision and deep learning technology, and applicable in particular to smart city and intelligent transportation scenarios. The specific implementation scheme is: generating, from a video to be recognized, a frame to be recognized, a previous frame of the frame to be recognized, and a reference frame; inputting them into an encoding network to generate a feature map of the frame to be recognized, a target-object reference-frame feature map, and a target-object previous-frame feature map; generating a first correlation matrix and a second correlation matrix; generating a first correlation feature map and a second correlation feature map; and generating a target segmentation image of the current frame from the feature map of the frame to be recognized. Embodiments of the present disclosure can detect the target in the frame to be recognized. By obtaining the correlation matrices between the feature map of the frame to be recognized and the reference-frame and previous-frame feature maps that contain only the target object, attention is focused on the target object, improving the accuracy of recognizing the target object.
Description
Cross-Reference to Related Applications
The present disclosure is based on, and claims priority to, Chinese Patent Application No. 202110736166.X filed on June 30, 2021, the entire contents of which are incorporated herein by reference.
The present disclosure relates to the field of artificial intelligence, specifically to computer vision and deep learning technology, which can be used in particular in smart city and intelligent transportation scenarios, and especially relates to a target segmentation method and apparatus and an electronic device.
With the development and application of artificial intelligence technologies, more and more fields show a strong demand for intelligent and automated techniques, and the short-video field is one of them. In the short-video field, video target segmentation methods have very promising applications: removing a specified target from a video, blurring the background, and the like all rely heavily on video target segmentation. The development of video target segmentation methods is therefore of great significance to intelligent short-video processing, special-effects processing, and so on.
However, existing video target segmentation methods detect target objects with low accuracy, and there is currently a lack of video target segmentation methods that can detect target objects precisely.
Summary
The present disclosure provides a method, an apparatus, an electronic device, and a storage medium for target segmentation.
According to a first aspect of the present disclosure, a target segmentation method is provided, including:
generating, from a video to be recognized, a frame to be recognized, a previous frame of the frame to be recognized, and a reference frame, the reference frame being the first frame of the video to be recognized;
inputting the frame to be recognized, the previous frame, and the reference frame into an encoding network, and generating a feature map of the frame to be recognized, a target-object reference-frame feature map, and a target-object previous-frame feature map;
generating a first correlation matrix and a second correlation matrix from the feature map of the frame to be recognized, the target-object reference-frame feature map, and the target-object previous-frame feature map;
generating a first correlation feature map and a second correlation feature map from the first correlation matrix, the second correlation matrix, the target-object reference-frame feature map, and the target-object previous-frame feature map;
generating a target segmentation image of the current frame from the first correlation feature map, the second correlation feature map, and the feature map of the frame to be recognized.
Optionally, generating the feature map of the frame to be recognized, the target-object reference-frame feature map, and the target-object previous-frame feature map includes:
extracting features of the frame to be recognized, the previous frame, and the reference frame to generate the feature map of the frame to be recognized, a previous-frame feature map, and a reference-frame feature map;
generating the target-object reference-frame feature map from the reference-frame feature map and a target-object mask of the reference frame;
generating the target-object previous-frame feature map from the previous-frame feature map and a target-object mask of the previous frame.
Optionally, generating the first correlation matrix and the second correlation matrix from the feature map of the frame to be recognized, the target-object reference-frame feature map, and the target-object previous-frame feature map includes:
generating the first correlation matrix from the feature map of the frame to be recognized and the target-object reference-frame feature map;
generating the second correlation matrix from the feature map of the frame to be recognized and the target-object previous-frame feature map.
Optionally, generating the first correlation matrix from the feature map of the frame to be recognized and the target-object reference-frame feature map includes:
generating a reference correlation matrix from the feature map of the frame to be recognized and the target-object reference-frame feature map;
normalizing the reference correlation matrix to generate a second reference correlation matrix;
generating a reference value in each row of the second reference correlation matrix, and generating the first correlation matrix from the reference values, where the reference value is greater than the other values in the same row.
Optionally, generating the second correlation matrix from the feature map of the frame to be recognized and the target-object previous-frame feature map includes:
generating a previous-frame correlation matrix from the feature map of the frame to be recognized and the target-object previous-frame feature map;
normalizing the previous-frame correlation matrix to generate a second previous-frame correlation matrix;
generating a reference value in each row of the second previous-frame correlation matrix, and generating the second correlation matrix from the reference values, where the reference value is greater than the other values in the same row.
Optionally, generating the first correlation feature map and the second correlation feature map from the first correlation matrix, the second correlation matrix, the target-object reference-frame feature map, and the target-object previous-frame feature map includes:
pointwise-multiplying the first correlation matrix with the target-object reference-frame feature map to generate the first correlation feature map;
pointwise-multiplying the second correlation matrix with the target-object previous-frame feature map to generate the second correlation feature map.
Optionally, generating the target segmentation image of the current frame from the first correlation feature map, the second correlation feature map, and the feature map of the frame to be recognized includes:
generating a fused feature map from the first correlation feature map, the second correlation feature map, and the feature map of the frame to be recognized;
inputting the fused feature map into a decoding network, and generating the target segmentation image of the current frame.
Optionally, generating the fused feature map from the first correlation feature map, the second correlation feature map, and the feature map of the frame to be recognized includes:
concatenating the first correlation feature map, the second correlation feature map, and the feature map of the frame to be recognized to generate the fused feature map.
According to a second aspect of the present disclosure, a target segmentation apparatus is provided, including:
a video frame generation module configured to generate, from a video to be recognized, a frame to be recognized, a previous frame of the frame to be recognized, and a reference frame, the reference frame being the first frame of the video to be recognized;
a feature extraction module configured to input the frame to be recognized, the previous frame, and the reference frame into an encoding network, and generate a feature map of the frame to be recognized, a target-object reference-frame feature map, and a target-object previous-frame feature map;
a correlation matrix generation module configured to generate a first correlation matrix and a second correlation matrix from the feature map of the frame to be recognized, the target-object reference-frame feature map, and the target-object previous-frame feature map;
a feature map generation module configured to generate a first correlation feature map and a second correlation feature map from the first correlation matrix, the second correlation matrix, the target-object reference-frame feature map, and the target-object previous-frame feature map;
a target segmentation module configured to generate a target segmentation image of the current frame from the first correlation feature map, the second correlation feature map, and the feature map of the frame to be recognized.
Optionally, the feature extraction module includes:
a feature extraction submodule configured to extract features of the frame to be recognized, the previous frame, and the reference frame to generate the feature map of the frame to be recognized, a previous-frame feature map, and a reference-frame feature map;
a first mask submodule configured to generate the target-object reference-frame feature map from the reference-frame feature map and a target-object mask of the reference frame;
a second mask submodule configured to generate the target-object previous-frame feature map from the previous-frame feature map and a target-object mask of the previous frame.
Optionally, the correlation matrix generation module includes:
a first correlation matrix generation submodule configured to generate the first correlation matrix from the feature map of the frame to be recognized and the target-object reference-frame feature map;
a second correlation matrix generation submodule configured to generate the second correlation matrix from the feature map of the frame to be recognized and the target-object previous-frame feature map.
Optionally, the first correlation matrix generation submodule includes:
a reference correlation matrix generation unit configured to generate a reference correlation matrix from the feature map of the frame to be recognized and the target-object reference-frame feature map;
a second reference correlation matrix generation unit configured to normalize the reference correlation matrix to generate a second reference correlation matrix;
a first correlation matrix generation unit configured to generate a reference value in each row of the second reference correlation matrix and generate the first correlation matrix from the reference values, where the reference value is greater than the other values in the same row.
Optionally, the second correlation matrix generation submodule includes:
a previous-frame correlation matrix generation unit configured to generate a previous-frame correlation matrix from the feature map of the frame to be recognized and the target-object previous-frame feature map;
a second previous-frame correlation matrix generation unit configured to normalize the previous-frame correlation matrix to generate a second previous-frame correlation matrix;
a second correlation matrix generation unit configured to generate a reference value in each row of the second previous-frame correlation matrix and generate the second correlation matrix from the reference values, where the reference value is greater than the other values in the same row.
Optionally, the feature map generation module includes:
a first correlation feature map generation submodule configured to pointwise-multiply the first correlation matrix with the target-object reference-frame feature map to generate the first correlation feature map;
a second correlation feature map generation submodule configured to pointwise-multiply the second correlation matrix with the target-object previous-frame feature map to generate the second correlation feature map.
Optionally, the target segmentation module includes:
a feature fusion submodule configured to generate a fused feature map from the first correlation feature map, the second correlation feature map, and the feature map of the frame to be recognized;
a decoding submodule configured to input the fused feature map into a decoding network and generate the target segmentation image of the current frame.
Optionally, the feature fusion submodule includes:
a feature fusion unit configured to concatenate the first correlation feature map, the second correlation feature map, and the feature map of the frame to be recognized to generate the fused feature map.
According to a third aspect of the present disclosure, an electronic device is provided, including:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the method of any one of the first aspect.
According to a fourth aspect of the present disclosure, a non-transitory computer-readable storage medium storing computer instructions is provided, wherein the computer instructions are used to cause the computer to perform the method of any one of the first aspect.
According to a fifth aspect of the present disclosure, a computer program product is provided, including a computer program that, when executed by a processor, implements the method of any one of the first aspect.
Embodiments of the present disclosure have the following beneficial effects:
the correlation matrices between the feature map of the frame to be recognized and the reference-frame and previous-frame feature maps that contain only the target object are obtained, so attention is focused on the target object, improving the accuracy of recognizing the target object.
It should be understood that the content described in this section is not intended to identify key or important features of the embodiments of the present disclosure, nor is it intended to limit the scope of the present disclosure. Other features of the present disclosure will become easy to understand through the following description.
The accompanying drawings are used for a better understanding of the solution and do not constitute a limitation of the present disclosure, in which:
FIG. 1 is a schematic flowchart of a target segmentation method provided according to an embodiment of the present disclosure;
FIG. 2 is a schematic flowchart of a target segmentation method provided according to an embodiment of the present disclosure;
FIG. 3 is a schematic flowchart of a target segmentation method provided according to an embodiment of the present disclosure;
FIG. 4 is a schematic flowchart of a target segmentation method provided according to an embodiment of the present disclosure;
FIG. 5 is a schematic flowchart of a target segmentation method provided according to an embodiment of the present disclosure;
FIG. 6 is a schematic flowchart of a target segmentation method provided according to an embodiment of the present disclosure;
FIG. 7 is a schematic flowchart of a target segmentation method provided according to an embodiment of the present disclosure;
FIG. 8 is a schematic structural diagram of a target segmentation apparatus provided according to an embodiment of the present disclosure;
FIG. 9 is a schematic structural diagram of a target segmentation apparatus provided according to an embodiment of the present disclosure;
FIG. 10 is a schematic structural diagram of a target segmentation apparatus provided according to an embodiment of the present disclosure;
FIG. 11 is a schematic structural diagram of a target segmentation apparatus provided according to an embodiment of the present disclosure;
FIG. 12 is a schematic structural diagram of a target segmentation apparatus provided according to an embodiment of the present disclosure;
FIG. 13 is a schematic structural diagram of a target segmentation apparatus provided according to an embodiment of the present disclosure;
FIG. 14 is a schematic structural diagram of a target segmentation apparatus provided according to an embodiment of the present disclosure;
FIG. 15 is a block diagram of an electronic device used to implement the target segmentation method of an embodiment of the present disclosure;
FIG. 16 is a schematic structural diagram of a target segmentation apparatus provided according to an embodiment of the present disclosure.
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, including various details of the embodiments to aid understanding; they should be regarded as merely exemplary. Therefore, those of ordinary skill in the art should recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present disclosure. Likewise, descriptions of well-known functions and structures are omitted from the following description for clarity and conciseness.
With the development and application of artificial intelligence technologies, more and more fields show a strong demand for intelligent and automated techniques, and the short-video field is one of them. In the short-video field, video target segmentation methods have very promising applications: removing a specified target from a video, blurring the background, and the like all rely heavily on video target segmentation. The development of video target segmentation methods is therefore of great significance to intelligent short-video processing, special-effects processing, and so on.
In current video target segmentation methods, one problem is rather hard to solve: occlusion of the specified target in the video. When the target reappears after being occluded, it is easily segmented incorrectly. Among the common existing solutions, there is no particularly mature method for handling this occlusion problem.
A common approach reads the information of historical frames and extracts the vectors of all positions where the target object appears in those frames to generate an instance attention. However, this approach sums the extracted target vectors, compressing a (c, h, w) vector into a (c, 1, 1) vector, which is then added into the network to assist target segmentation. This can alleviate the target occlusion problem to some extent, but once the extracted vectors are compressed into (c, 1, 1), all information about the target's position, shape, and the correlations between neighboring vectors is lost, so there is still much room for improvement.
FIG. 1 is a schematic flowchart of a target segmentation method provided according to an embodiment of the present disclosure. As shown in FIG. 1, the target segmentation method includes:
Step 101: generating, from a video to be recognized, a frame to be recognized, a previous frame of the frame to be recognized, and a reference frame, the reference frame being the first frame of the video to be recognized;
The present disclosure can be used in smart city and intelligent transportation scenarios. A smart city uses information and communication technologies to sense, analyze, and integrate the key information of the core systems that run a city. Smart city construction requires comprehensive perception, ubiquitous interconnection, pervasive computing, and converged applications through new-generation information technologies such as the Internet of Things, represented by mobile technology, and cloud computing. One important kind of perception information in a smart city is the video captured by surveillance cameras.
This embodiment can further mine the video information. First, the video to be recognized is captured by a camera, and one of its frames is selected as the frame to be recognized, as sketched below. The present disclosure uses historical frames, namely the previous frame of the frame to be recognized and the reference frame, to enhance the features of the target object in the frame to be recognized; the previous frame is the frame immediately adjacent to and preceding the frame to be recognized, and the reference frame is the first frame of the video to be recognized.
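As a concrete illustration of this frame selection, the following is a minimal sketch; the OpenCV-based reader, the function name, and the frame-index convention are illustrative assumptions and not part of the disclosure.

```python
import cv2

def select_frames(video_path: str, t: int):
    """Return (reference, previous, current) frames for time step t >= 1.

    A minimal sketch: the reference frame is the first frame of the
    video, the previous frame is frame t - 1, and frame t is the frame
    to be recognized.
    """
    cap = cv2.VideoCapture(video_path)
    frames = []
    for _ in range(t + 1):
        ok, frame = cap.read()
        if not ok:
            raise ValueError(f"video has fewer than {t + 1} frames")
        frames.append(frame)
    cap.release()
    return frames[0], frames[t - 1], frames[t]
```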
Step 102: inputting the frame to be recognized, the previous frame, and the reference frame into an encoding network, and generating a feature map of the frame to be recognized, a target-object reference-frame feature map, and a target-object previous-frame feature map;
The encoding network is the encoder part of a neural network; it downsamples the frame to be recognized, the previous frame, and the reference frame to extract their high-dimensional features, thereby generating the feature map of the frame to be recognized, the previous-frame feature map, and the reference-frame feature map.
Meanwhile, in order to obtain the correlation matrices later, the present disclosure uses the target-object masks corresponding to the previous frame and the reference frame to obtain the target-object reference-frame feature map and the target-object previous-frame feature map.
Step 103: generating a first correlation matrix and a second correlation matrix from the feature map of the frame to be recognized, the target-object reference-frame feature map, and the target-object previous-frame feature map;
A correlation matrix is a paradigm in which each element represents the correlation between a local feature vector in one feature map and a local feature vector in another feature map, usually expressed as the dot product of the two local feature vectors. The correlation matrix of two feature maps of size H*W*d has size (H*W)*(H*W), where H is the height, W is the width, and d is the number of channels. Correlation is the basis for measuring how well features match; features have different representations for different tasks and are usually semantic features based on shape, color, and texture.
The present disclosure uses the correlation matrices to characterize the correlation of the pixels of the target-object reference-frame feature map and of the target-object previous-frame feature map with the pixels of the feature map of the frame to be recognized. The stronger the correlation between the feature vector of a pixel in the feature map of the frame to be recognized and the feature vectors of pixels in the target-object reference-frame and previous-frame feature maps, the more likely that pixel belongs to the target object.
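The dot-product form of the correlation matrix described above can be sketched as follows for two feature maps of shape (d, H, W); NumPy and the function name are assumptions, and note that the embodiment detailed later uses Euclidean distance rather than the dot product.

```python
import numpy as np

def correlation_matrix(feat_a: np.ndarray, feat_b: np.ndarray) -> np.ndarray:
    """Dot-product correlation of two feature maps of shape (d, H, W).

    Element [i, j] is the dot product of local feature vector i of
    feat_a with local feature vector j of feat_b, so the result has
    shape (H*W, H*W), matching the size stated above.
    """
    d, h, w = feat_a.shape
    a = feat_a.reshape(d, h * w).T   # (H*W, d): one row per pixel position
    b = feat_b.reshape(d, h * w)     # (d, H*W)
    return a @ b                     # (H*W, H*W)
```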
Step 104: generating a first correlation feature map and a second correlation feature map from the first correlation matrix, the second correlation matrix, the target-object reference-frame feature map, and the target-object previous-frame feature map;
From the first correlation matrix, the second correlation matrix, and the feature map of the frame to be recognized, the target feature map of the frame to be recognized can be generated; the correlation matrices strengthen the features of the feature map of the frame to be recognized, improving the detection accuracy of the target object.
Step 105: generating a target segmentation image of the current frame from the first correlation feature map, the second correlation feature map, and the feature map of the frame to be recognized.
Pointwise-multiplying the first correlation matrix and the second correlation matrix, respectively, with the corresponding target-object feature maps generates the first correlation feature map and the second correlation feature map (see steps 601 and 602 below). The first correlation feature map, the second correlation feature map, and the feature map of the frame to be recognized are then concatenated (concat), strengthening the features of the pixels related to the target object, to generate a fused feature map.
Inputting the fused feature map into a decoder yields the target segmentation image. The decoder performs upsampling, restoring the target segmentation image to the size of the frame to be recognized, so that the pixels belonging to the target object in that frame can be obtained.
FIG. 2 is a schematic flowchart of a target segmentation method provided according to an embodiment of the present disclosure. As shown in FIG. 2, the target segmentation method includes:
Step 201: extracting features of the frame to be recognized, the previous frame, and the reference frame to generate the feature map of the frame to be recognized, the previous-frame feature map, and the reference-frame feature map;
The present disclosure uses a neural network to extract the features of the frame to be recognized, the previous frame, and the reference frame. Feature extraction methods are well known and varied, and are not part of what the present disclosure protects.
In one possible embodiment, random downsampling is used for feature extraction to generate the feature map of the frame to be recognized, the previous-frame feature map, and the reference-frame feature map.
Step 202: generating the target-object reference-frame feature map from the reference-frame feature map and the target-object mask of the reference frame;
The mask of the target object in the reference frame has already been obtained by the target segmentation method. Pointwise-multiplying the target-object mask of the reference frame with the pixels of the reference-frame feature map generates the target-object reference-frame feature map. This step obtains a reference-frame target-object feature map that contains only the target object, which facilitates the subsequent computation of the first correlation matrix.
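A minimal sketch of this masking step, assuming NumPy arrays and that the mask has already been resized to the feature-map resolution:

```python
import numpy as np

def mask_feature_map(feat: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Pointwise-multiply a (c, h, w) feature map by an (h, w) binary
    target-object mask, zeroing every pixel outside the target object."""
    return feat * mask[None, :, :]   # broadcast the mask over the channel axis
```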
Step 203: generating the target-object previous-frame feature map from the previous-frame feature map and the target-object mask of the previous frame.
The mask of the target object in the previous frame has already been obtained by the target segmentation method. Pointwise-multiplying the target-object mask of the previous frame with the pixels of the previous-frame feature map generates the target-object previous-frame feature map. This step obtains a previous-frame target-object feature map that contains only the target object, which facilitates the subsequent computation of the second correlation matrix.
FIG. 3 is a schematic flowchart of a target segmentation method provided according to an embodiment of the present disclosure. As shown in FIG. 3, the target segmentation method includes:
Step 301: generating the first correlation matrix from the feature map of the frame to be recognized and the target-object reference-frame feature map;
The present disclosure generates the first correlation matrix from the feature map of the frame to be recognized and the target-object reference-frame feature map to characterize the correlation between pixels of the feature map of the frame to be recognized and the target-object pixels of the target-object reference-frame feature map, facilitating subsequent feature extraction.
Step 302: generating the second correlation matrix from the feature map of the frame to be recognized and the target-object previous-frame feature map.
Likewise, the present disclosure generates the second correlation matrix from the feature map of the frame to be recognized and the target-object previous-frame feature map to characterize the correlation between pixels of the feature map of the frame to be recognized and the target-object pixels of the target-object previous-frame feature map, facilitating subsequent feature extraction.
FIG. 4 is a schematic flowchart of a target segmentation method provided according to an embodiment of the present disclosure. As shown in FIG. 4, the target segmentation method includes:
Step 401: generating a reference correlation matrix from the feature map of the frame to be recognized and the target-object reference-frame feature map;
First, the reference correlation matrix is generated from the feature map of the frame to be recognized and the target-object reference-frame feature map; the correlation matrix can be generated in various ways. In one possible embodiment, the Euclidean distance between the feature vector of each pixel in the feature map of the frame to be recognized and the feature vector of each pixel in the target-object reference-frame feature map is computed, and these Euclidean distances are used as the element values of the reference correlation matrix.
Step 402: normalizing the reference correlation matrix to generate a second reference correlation matrix;
The reference correlation matrix is normalized to reduce errors in subsequent target segmentation. There are many normalization methods; in one possible embodiment, the softmax function is used. After normalization, the second reference correlation matrix is generated, in which the elements of every row sum to 1.
Step 403: generating a reference value in each row of the second reference correlation matrix, and generating the first correlation matrix from the reference values, where the reference value is greater than the other values in the same row.
To discard weakly correlated pixels, the present disclosure keeps only the largest element in each row of the second reference correlation matrix; the value of that largest element is the reference value. In one possible embodiment, the second reference correlation matrix is an (h×w, N) matrix; after keeping the reference values, an (h×w, 1) matrix is generated and then reshaped to obtain the (h, w) first correlation matrix.
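Steps 401 to 403 can be sketched end to end as follows, assuming NumPy inputs of shape (c, h, w). Negating the distances before the softmax (so that closer vectors receive larger weights) is an assumption, since the disclosure does not fix a sign convention.

```python
import numpy as np

def first_correlation_matrix(cur_feat: np.ndarray, ref_feat: np.ndarray) -> np.ndarray:
    """Steps 401-403: Euclidean-distance correlation, softmax row
    normalization, per-row maximum, reshape to (h, w)."""
    c, h, w = cur_feat.shape
    cur = cur_feat.reshape(c, h * w).T   # (h*w, c): one vector per pixel
    ref = ref_feat.reshape(c, h * w).T   # (N, c), here with N = h*w

    # Step 401: pairwise Euclidean distances form the reference correlation matrix
    dists = np.linalg.norm(cur[:, None, :] - ref[None, :, :], axis=-1)  # (h*w, N)

    # Step 402: softmax over each row, so every row of the second
    # reference correlation matrix sums to 1 (distances negated, see above)
    scores = -dists
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    exp = np.exp(scores)
    second_ref = exp / exp.sum(axis=1, keepdims=True)

    # Step 403: keep the largest value (the reference value) in each row,
    # giving an (h*w, 1) matrix, then reshape it to (h, w)
    return second_ref.max(axis=1).reshape(h, w)
```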
FIG. 5 is a schematic flowchart of a target segmentation method provided according to an embodiment of the present disclosure. As shown in FIG. 5, the target segmentation method includes:
Step 501: generating a previous-frame correlation matrix from the feature map of the frame to be recognized and the target-object previous-frame feature map;
First, the previous-frame correlation matrix is generated from the feature map of the frame to be recognized and the target-object previous-frame feature map; the correlation matrix can be generated in various ways. In one possible embodiment, the Euclidean distance between the feature vector of each pixel in the feature map of the frame to be recognized and the feature vector of each pixel in the target-object previous-frame feature map is computed, and these Euclidean distances are used as the element values of the previous-frame correlation matrix.
Step 502: normalizing the previous-frame correlation matrix to generate a second previous-frame correlation matrix;
The previous-frame correlation matrix is normalized to reduce errors in subsequent target segmentation. There are many normalization methods; in one possible embodiment, the softmax function is used. After normalization, the second previous-frame correlation matrix is generated, in which the elements of every row sum to 1.
Step 503: generating a reference value in each row of the second previous-frame correlation matrix, and generating the second correlation matrix from the reference values, where the reference value is greater than the other values in the same row.
To discard weakly correlated pixels, the present disclosure keeps only the largest element in each row of the second previous-frame correlation matrix; the value of that largest element is the reference value. In one possible embodiment, the second previous-frame correlation matrix is an (h×w, N) matrix; after keeping the reference values, an (h×w, 1) matrix is generated and then reshaped to obtain the (h, w) second correlation matrix.
FIG. 6 is a schematic flowchart of a target segmentation method provided according to an embodiment of the present disclosure. As shown in FIG. 6, the target segmentation method includes:
Step 601: pointwise-multiplying the first correlation matrix with the target-object reference-frame feature map to generate the first correlation feature map;
To strengthen the features in the target-object reference-frame feature map, the present disclosure pointwise-multiplies the first correlation matrix with the pixels of the target-object reference-frame feature map to obtain the first correlation feature map. The first correlation matrix has the same size as the target-object reference-frame feature map.
Step 602: pointwise-multiplying the second correlation matrix with the target-object previous-frame feature map to generate the second correlation feature map.
To strengthen the features in the target-object previous-frame feature map, the present disclosure pointwise-multiplies the second correlation matrix with the pixels of the target-object previous-frame feature map to obtain the second correlation feature map. The second correlation matrix has the same size as the target-object previous-frame feature map.
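A sketch of steps 601 and 602, assuming the (h, w) correlation matrix from the earlier sketch and a (c, h, w) target-object feature map; broadcasting over the channel axis is an implementation assumption.

```python
import numpy as np

def correlation_feature_map(corr: np.ndarray, obj_feat: np.ndarray) -> np.ndarray:
    """Pointwise-multiply an (h, w) correlation matrix with a (c, h, w)
    target-object feature map, weighting each pixel by its correlation
    and yielding a correlation feature map of the same size."""
    return obj_feat * corr[None, :, :]
```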
FIG. 7 is a schematic flowchart of a target segmentation method provided according to an embodiment of the present disclosure. As shown in FIG. 7, the target segmentation method includes:
Step 701: generating a fused feature map from the first correlation feature map, the second correlation feature map, and the feature map of the frame to be recognized;
Also to strengthen the features of the target object, the present disclosure fuses the features of the first correlation feature map, the second correlation feature map, and the feature map of the frame to be recognized to generate a fused feature map. There are various fusion methods; in one possible embodiment, the first correlation feature map, the second correlation feature map, and the feature map of the frame to be recognized are concatenated (concat), increasing the number of channels of each pixel, to generate the fused feature map.
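The concat fusion can be sketched as follows; with three (c, h, w) inputs, the fused map has 3c channels per pixel, which is the channel increase mentioned above.

```python
import numpy as np

def fuse(corr_feat_1: np.ndarray, corr_feat_2: np.ndarray,
         cur_feat: np.ndarray) -> np.ndarray:
    """Concatenate the two correlation feature maps and the feature map
    of the frame to be recognized along the channel axis:
    three (c, h, w) maps -> one (3c, h, w) fused feature map."""
    return np.concatenate([corr_feat_1, corr_feat_2, cur_feat], axis=0)
```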
Step 702: inputting the fused feature map into a decoding network, and generating the target segmentation image of the current frame.
The decoding network upsamples the fused feature map to recover the features; the pixels belonging to the target object can then be obtained from the target segmentation image.
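As a stand-in for the decoding network, nearest-neighbour upsampling illustrates the effect of restoring the prediction to the original frame size; a real decoder would use learned up-convolutions, so this is only an illustrative assumption.

```python
import numpy as np

def upsample(pred: np.ndarray, scale: int) -> np.ndarray:
    """Nearest-neighbour upsampling of an (n, h, w) prediction to
    (n, h*scale, w*scale), mimicking the decoder's size restoration."""
    return pred.repeat(scale, axis=1).repeat(scale, axis=2)
```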
Optionally, generating the fused feature map from the first correlation feature map, the second correlation feature map, and the feature map of the frame to be recognized includes:
concatenating the first correlation feature map, the second correlation feature map, and the feature map of the frame to be recognized to generate the fused feature map.
The concatenation (concat) increases the dimensionality of the image and fuses the features together, facilitating subsequent target segmentation.
FIG. 8 is a schematic structural diagram of a target segmentation apparatus provided according to an embodiment of the present disclosure. As shown in FIG. 8, the target segmentation apparatus 800 includes:
a video frame generation module 810 configured to generate, from a video to be recognized, a frame to be recognized, a previous frame of the frame to be recognized, and a reference frame, the reference frame being the first frame of the video to be recognized;
The present disclosure can be used in smart city and intelligent transportation scenarios. A smart city uses information and communication technologies to sense, analyze, and integrate the key information of the core systems that run a city. Smart city construction requires comprehensive perception, ubiquitous interconnection, pervasive computing, and converged applications through new-generation information technologies such as the Internet of Things, represented by mobile technology, and cloud computing. One important kind of perception information in a smart city is the video captured by surveillance cameras.
This embodiment can further mine the video information. First, the video to be recognized is captured by a camera, and one of its frames is selected as the frame to be recognized. The present disclosure uses historical frames, namely the previous frame of the frame to be recognized and the reference frame, to enhance the features of the target object in the frame to be recognized; the previous frame is the frame immediately adjacent to and preceding the frame to be recognized, and the reference frame is the first frame of the video to be recognized.
a feature extraction module 820 configured to input the frame to be recognized, the previous frame, and the reference frame into an encoding network, and generate a feature map of the frame to be recognized, a target-object reference-frame feature map, and a target-object previous-frame feature map;
The encoding network is the encoder part of a neural network; it downsamples the frame to be recognized, the previous frame, and the reference frame to extract their high-dimensional features, thereby generating the feature map of the frame to be recognized, the previous-frame feature map, and the reference-frame feature map.
Meanwhile, in order to obtain the correlation matrices later, the present disclosure uses the target-object masks corresponding to the previous frame and the reference frame to obtain the target-object reference-frame feature map and the target-object previous-frame feature map.
a correlation matrix generation module 830 configured to generate a first correlation matrix and a second correlation matrix from the feature map of the frame to be recognized, the target-object reference-frame feature map, and the target-object previous-frame feature map;
A correlation matrix is a paradigm in which each element represents the correlation between a local feature vector in one feature map and a local feature vector in another feature map, usually expressed as the dot product of the two local feature vectors. The correlation matrix of two feature maps of size H*W*d has size (H*W)*(H*W), where H is the height, W is the width, and d is the number of channels. Correlation is the basis for measuring how well features match; features have different representations for different tasks and are usually semantic features based on shape, color, and texture.
The present disclosure uses the correlation matrices to characterize the correlation of the pixels of the target-object reference-frame feature map and of the target-object previous-frame feature map with the pixels of the feature map of the frame to be recognized. The stronger the correlation between the feature vector of a pixel in the feature map of the frame to be recognized and the feature vectors of pixels in the target-object reference-frame and previous-frame feature maps, the more likely that pixel belongs to the target object.
a feature map generation module 840 configured to generate a first correlation feature map and a second correlation feature map from the first correlation matrix, the second correlation matrix, the target-object reference-frame feature map, and the target-object previous-frame feature map;
From the first correlation matrix, the second correlation matrix, and the feature map of the frame to be recognized, the target feature map of the frame to be recognized can be generated; the correlation matrices strengthen the features of the feature map of the frame to be recognized, improving the detection accuracy of the target object.
a target segmentation module 850 configured to generate a target segmentation image of the current frame from the first correlation feature map, the second correlation feature map, and the feature map of the frame to be recognized.
Pointwise-multiplying the first correlation matrix and the second correlation matrix, respectively, with the corresponding target-object feature maps generates the first correlation feature map and the second correlation feature map. The first correlation feature map, the second correlation feature map, and the feature map of the frame to be recognized are then concatenated (concat), strengthening the features of the pixels related to the target object, to generate a fused feature map.
Inputting the fused feature map into a decoder yields the target segmentation image. The decoder performs upsampling, restoring the target segmentation image to the size of the frame to be recognized, so that the pixels belonging to the target object in that frame can be obtained.
FIG. 9 is a schematic structural diagram of a target segmentation apparatus provided according to an embodiment of the present disclosure. As shown in FIG. 9, the target segmentation apparatus 900 includes:
a feature extraction submodule 910 configured to extract features of the frame to be recognized, the previous frame, and the reference frame to generate the feature map of the frame to be recognized, the previous-frame feature map, and the reference-frame feature map;
The present disclosure uses a neural network to extract the features of the frame to be recognized, the previous frame, and the reference frame. Feature extraction methods are well known and varied, and are not part of what the present disclosure protects.
In one possible embodiment, random downsampling is used for feature extraction to generate the feature map of the frame to be recognized, the previous-frame feature map, and the reference-frame feature map.
a first mask submodule 920 configured to generate the target-object reference-frame feature map from the reference-frame feature map and the target-object mask of the reference frame;
The mask of the target object in the reference frame has already been obtained by the target segmentation method. Pointwise-multiplying the target-object mask of the reference frame with the pixels of the reference-frame feature map generates the target-object reference-frame feature map. This step obtains a reference-frame target-object feature map that contains only the target object, which facilitates the subsequent computation of the first correlation matrix.
a second mask submodule 930 configured to generate the target-object previous-frame feature map from the previous-frame feature map and the target-object mask of the previous frame.
The mask of the target object in the previous frame has already been obtained by the target segmentation method. Pointwise-multiplying the target-object mask of the previous frame with the pixels of the previous-frame feature map generates the target-object previous-frame feature map. This step obtains a previous-frame target-object feature map that contains only the target object, which facilitates the subsequent computation of the second correlation matrix.
FIG. 10 is a schematic structural diagram of a target segmentation apparatus provided according to an embodiment of the present disclosure. As shown in FIG. 10, the target segmentation apparatus 1000 includes:
a first correlation matrix generation submodule 1010 configured to generate the first correlation matrix from the feature map of the frame to be recognized and the target-object reference-frame feature map;
The present disclosure generates the first correlation matrix from the feature map of the frame to be recognized and the target-object reference-frame feature map to characterize the correlation between pixels of the feature map of the frame to be recognized and the target-object pixels of the target-object reference-frame feature map, facilitating subsequent feature extraction.
a second correlation matrix generation submodule 1020 configured to generate the second correlation matrix from the feature map of the frame to be recognized and the target-object previous-frame feature map.
Likewise, the present disclosure generates the second correlation matrix from the feature map of the frame to be recognized and the target-object previous-frame feature map to characterize the correlation between pixels of the feature map of the frame to be recognized and the target-object pixels of the target-object previous-frame feature map, facilitating subsequent feature extraction.
FIG. 11 is a schematic structural diagram of a target segmentation apparatus provided according to an embodiment of the present disclosure. As shown in FIG. 11, the target segmentation apparatus 1100 includes:
a reference correlation matrix generation unit 1110 configured to generate a reference correlation matrix from the feature map of the frame to be recognized and the target-object reference-frame feature map;
First, the reference correlation matrix is generated from the feature map of the frame to be recognized and the target-object reference-frame feature map; the correlation matrix can be generated in various ways. In one possible embodiment, the Euclidean distance between the feature vector of each pixel in the feature map of the frame to be recognized and the feature vector of each pixel in the target-object reference-frame feature map is computed, and these Euclidean distances are used as the element values of the reference correlation matrix.
a second reference correlation matrix generation unit 1120 configured to normalize the reference correlation matrix to generate a second reference correlation matrix;
The reference correlation matrix is normalized to reduce errors in subsequent target segmentation. There are many normalization methods; in one possible embodiment, the softmax function is used. After normalization, the second reference correlation matrix is generated, in which the elements of every row sum to 1.
a first correlation matrix generation unit 1130 configured to generate a reference value in each row of the second reference correlation matrix and generate the first correlation matrix from the reference values, where the reference value is greater than the other values in the same row.
To discard weakly correlated pixels, the present disclosure keeps only the largest element in each row of the second reference correlation matrix; the value of that largest element is the reference value. In one possible embodiment, the second reference correlation matrix is an (h×w, N) matrix; after keeping the reference values, an (h×w, 1) matrix is generated and then reshaped to obtain the (h, w) first correlation matrix.
FIG. 12 is a schematic structural diagram of a target segmentation apparatus provided according to an embodiment of the present disclosure. As shown in FIG. 12, the target segmentation apparatus 1200 includes:
a previous-frame correlation matrix generation unit 1210 configured to generate a previous-frame correlation matrix from the feature map of the frame to be recognized and the target-object previous-frame feature map;
First, the previous-frame correlation matrix is generated from the feature map of the frame to be recognized and the target-object previous-frame feature map; the correlation matrix can be generated in various ways. In one possible embodiment, the Euclidean distance between the feature vector of each pixel in the feature map of the frame to be recognized and the feature vector of each pixel in the target-object previous-frame feature map is computed, and these Euclidean distances are used as the element values of the previous-frame correlation matrix.
a second previous-frame correlation matrix generation unit 1220 configured to normalize the previous-frame correlation matrix to generate a second previous-frame correlation matrix;
The previous-frame correlation matrix is normalized to reduce errors in subsequent target segmentation. There are many normalization methods; in one possible embodiment, the softmax function is used. After normalization, the second previous-frame correlation matrix is generated, in which the elements of every row sum to 1.
a second correlation matrix generation unit 1230 configured to generate a reference value in each row of the second previous-frame correlation matrix and generate the second correlation matrix from the reference values, where the reference value is greater than the other values in the same row.
To discard weakly correlated pixels, the present disclosure keeps only the largest element in each row of the second previous-frame correlation matrix; the value of that largest element is the reference value. In one possible embodiment, the second previous-frame correlation matrix is an (h×w, N) matrix; after keeping the reference values, an (h×w, 1) matrix is generated and then reshaped to obtain the (h, w) second correlation matrix.
FIG. 13 is a schematic structural diagram of a target segmentation apparatus provided according to an embodiment of the present disclosure. As shown in FIG. 13, the target segmentation apparatus 1300 includes:
a first correlation feature map generation submodule 1310 configured to pointwise-multiply the first correlation matrix with the target-object reference-frame feature map to generate the first correlation feature map;
To strengthen the features in the target-object reference-frame feature map, the present disclosure pointwise-multiplies the first correlation matrix with the pixels of the target-object reference-frame feature map to obtain the first correlation feature map. The first correlation matrix has the same size as the target-object reference-frame feature map.
a second correlation feature map generation submodule 1320 configured to pointwise-multiply the second correlation matrix with the target-object previous-frame feature map to generate the second correlation feature map.
To strengthen the features in the target-object previous-frame feature map, the present disclosure pointwise-multiplies the second correlation matrix with the pixels of the target-object previous-frame feature map to obtain the second correlation feature map. The second correlation matrix has the same size as the target-object previous-frame feature map.
FIG. 14 is a schematic structural diagram of a target segmentation apparatus provided according to an embodiment of the present disclosure. As shown in FIG. 14, the target segmentation apparatus 1400 includes:
a feature fusion submodule 1410 configured to generate a fused feature map from the first correlation feature map, the second correlation feature map, and the feature map of the frame to be recognized;
Also to strengthen the features of the target object, the present disclosure fuses the features of the first correlation feature map, the second correlation feature map, and the feature map of the frame to be recognized to generate a fused feature map. There are various fusion methods; in one possible embodiment, the first correlation feature map, the second correlation feature map, and the feature map of the frame to be recognized are concatenated (concat), increasing the number of channels of each pixel, to generate the fused feature map.
a decoding submodule 1420 configured to input the fused feature map into a decoding network and generate the target segmentation image of the current frame.
The decoding network upsamples the fused feature map to recover the features; the pixels belonging to the target object can then be obtained from the target segmentation image.
Optionally, the feature fusion submodule includes:
a feature fusion unit configured to concatenate the first correlation feature map, the second correlation feature map, and the feature map of the frame to be recognized to generate the fused feature map.
The concatenation (concat) increases the dimensionality of the image and fuses the features together, facilitating subsequent target segmentation.
FIG. 16 is a schematic structural diagram of a target segmentation apparatus provided according to an embodiment of the present disclosure. As shown in FIG. 16, three frames — the first frame ref_im, the previous frame pre_im, and the current frame cur_im — are all input into the network. Through the feature extraction network, the vector maps of the first frame, the previous frame, and the current frame are obtained, denoted ref_emb, pre_emb, and cur_emb respectively, each of size (c, h, w), where c is the number of channels, h the height, and w the width.
Then, using the target-object mask ref_m of the first frame and the target-object mask pre_m of the previous frame, the vector maps ref_e and pre_e at the pixel positions corresponding to the target object are extracted from the first-frame and previous-frame vector maps, respectively.
The correlation matrices of the current-frame vector map with respect to the first frame and the previous frame are computed separately, and softmax yields, for every pixel position of the current frame, a normalized correlation representation with respect to all pixel positions of the first frame and the previous frame. The maximum of each row of the normalized correlation matrix is taken to build a 1×(h×w) matrix, which is then restored to an h×w matrix, namely cur_ref and cur_pre.
The first-frame and previous-frame vector maps are then updated according to cur_ref and cur_pre (by pointwise multiplication, per steps 601 and 602) to obtain ref_e1 and pre_e1.
Finally, ref_e1 and pre_e1 are concatenated (concat) with cur_emb and input into the decoding network to obtain the target segmentation image, from which the pixels belonging to the target object can be obtained.
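Putting the pieces together, the FIG. 16 pipeline can be sketched end to end as follows, reusing `first_correlation_matrix` from the sketch above. The `decoder` callable is a placeholder for the decoding network, and modelling the update of the first-frame and previous-frame vector maps as the pointwise multiplication of steps 601 and 602 is an assumption.

```python
import numpy as np

def segment_current_frame(ref_emb, pre_emb, cur_emb, ref_m, pre_m, decoder):
    """End-to-end sketch of the FIG. 16 pipeline.

    ref_emb, pre_emb, cur_emb: (c, h, w) vector maps of the first,
    previous and current frames; ref_m, pre_m: (h, w) object masks at
    feature-map resolution; decoder: placeholder decoding network.
    """
    # Keep only the target-object vectors of the first and previous frames
    ref_e = ref_emb * ref_m[None]
    pre_e = pre_emb * pre_m[None]

    # Normalized correlation of the current frame with each historical frame,
    # reduced to per-pixel maxima (cur_ref, cur_pre), each of shape (h, w)
    cur_ref = first_correlation_matrix(cur_emb, ref_e)
    cur_pre = first_correlation_matrix(cur_emb, pre_e)

    # Update the historical vector maps with their correlation weights
    ref_e1 = ref_e * cur_ref[None]
    pre_e1 = pre_e * cur_pre[None]

    # Concat fusion with the current-frame vector map, then decode
    fused = np.concatenate([ref_e1, pre_e1, cur_emb], axis=0)  # (3c, h, w)
    return decoder(fused)
```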
According to embodiments of the present disclosure, the present disclosure further provides an electronic device, a readable storage medium, and a computer program product.
FIG. 15 shows a schematic block diagram of an example electronic device 1500 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. Electronic devices may also represent various forms of mobile devices, such as personal digital processing devices, cellular phones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are merely examples and are not intended to limit the implementations of the present disclosure described and/or claimed herein.
As shown in FIG. 15, the device 1500 includes a computing unit 1501, which can perform various appropriate actions and processing according to a computer program stored in a read-only memory (ROM) 1502 or a computer program loaded from a storage unit 1508 into a random access memory (RAM) 1503. The RAM 1503 can also store various programs and data required for the operation of the device 1500. The computing unit 1501, the ROM 1502, and the RAM 1503 are connected to each other through a bus 1504. An input/output (I/O) interface 1505 is also connected to the bus 1504.
Multiple components of the device 1500 are connected to the I/O interface 1505, including: an input unit 1506 such as a keyboard or mouse; an output unit 1507 such as various types of displays and speakers; a storage unit 1508 such as a magnetic disk or optical disc; and a communication unit 1509 such as a network card, modem, or wireless communication transceiver. The communication unit 1509 allows the device 1500 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks.
The computing unit 1501 may be any of various general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 1501 include, but are not limited to, central processing units (CPUs), graphics processing units (GPUs), various dedicated artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, digital signal processors (DSPs), and any appropriate processors, controllers, microcontrollers, and the like. The computing unit 1501 performs the methods and processing described above, such as the target segmentation method. For example, in some embodiments, the target segmentation method may be implemented as a computer software program tangibly contained in a machine-readable medium, such as the storage unit 1508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 1500 via the ROM 1502 and/or the communication unit 1509. When the computer program is loaded into the RAM 1503 and executed by the computing unit 1501, one or more steps of the target segmentation method described above can be performed. Alternatively, in other embodiments, the computing unit 1501 may be configured to perform the target segmentation method in any other suitable manner (for example, by means of firmware).
The various implementations of the systems and techniques described herein above can be realized in digital electronic circuit systems, integrated circuit systems, field-programmable gate arrays (FPGA), application-specific integrated circuits (ASIC), application-specific standard products (ASSP), systems on a chip (SOC), complex programmable logic devices (CPLD), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus, so that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide interaction with a user, the systems and techniques described herein can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which the user can provide input to the computer. Other kinds of devices can also be used to provide interaction with the user; for example, the feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user can be received in any form (including acoustic input, speech input, or tactile input).
The systems and techniques described herein can be implemented in a computing system that includes back-end components (e.g., as a data server), or a computing system that includes middleware components (e.g., an application server), or a computing system that includes front-end components (e.g., a user computer having a graphical user interface or web browser through which a user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks (LAN), wide area networks (WAN), the Internet, and blockchain networks.
A computer system may include clients and servers. Clients and servers are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also known as a cloud computing server or cloud host, which is a host product in the cloud computing service system that overcomes the drawbacks of difficult management and weak business scalability found in traditional physical hosts and VPS ("Virtual Private Server") services. The server can also be a server of a distributed system, or a server combined with a blockchain.
It should be understood that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in a different order; as long as the desired result of the technical solution disclosed in the present disclosure can be achieved, no limitation is imposed herein.
The specific implementations described above do not constitute a limitation on the protection scope of the present disclosure. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions may be made according to design requirements and other factors. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present disclosure shall be included in the protection scope of the present disclosure.
Claims (19)
- A target segmentation method, comprising: generating, from a video to be recognized, a frame to be recognized, a previous frame of the frame to be recognized, and a reference frame, the reference frame being the first frame of the video to be recognized; inputting the frame to be recognized, the previous frame, and the reference frame into an encoding network, and generating a feature map of the frame to be recognized, a target-object reference-frame feature map, and a target-object previous-frame feature map; generating a first correlation matrix and a second correlation matrix from the feature map of the frame to be recognized, the target-object reference-frame feature map, and the target-object previous-frame feature map; generating a first correlation feature map and a second correlation feature map from the first correlation matrix, the second correlation matrix, the target-object reference-frame feature map, and the target-object previous-frame feature map; and generating a target segmentation image of the current frame from the first correlation feature map, the second correlation feature map, and the feature map of the frame to be recognized.
- The method according to claim 1, wherein generating the feature map of the frame to be recognized, the target-object reference-frame feature map, and the target-object previous-frame feature map comprises: extracting features of the frame to be recognized, the previous frame, and the reference frame to generate the feature map of the frame to be recognized, a previous-frame feature map, and a reference-frame feature map; generating the target-object reference-frame feature map from the reference-frame feature map and a target-object mask of the reference frame; and generating the target-object previous-frame feature map from the previous-frame feature map and a target-object mask of the previous frame.
- The method according to claim 1, wherein generating the first correlation matrix and the second correlation matrix from the feature map of the frame to be recognized, the target-object reference-frame feature map, and the target-object previous-frame feature map comprises: generating the first correlation matrix from the feature map of the frame to be recognized and the target-object reference-frame feature map; and generating the second correlation matrix from the feature map of the frame to be recognized and the target-object previous-frame feature map.
- The method according to claim 3, wherein generating the first correlation matrix from the feature map of the frame to be recognized and the target-object reference-frame feature map comprises: generating a reference correlation matrix from the feature map of the frame to be recognized and the target-object reference-frame feature map; normalizing the reference correlation matrix to generate a second reference correlation matrix; and generating a reference value in each row of the second reference correlation matrix and generating the first correlation matrix from the reference values, wherein the reference value is greater than the other values in the same row.
- The method according to claim 3, wherein generating the second correlation matrix from the feature map of the frame to be recognized and the target-object previous-frame feature map comprises: generating a previous-frame correlation matrix from the feature map of the frame to be recognized and the target-object previous-frame feature map; normalizing the previous-frame correlation matrix to generate a second previous-frame correlation matrix; and generating a reference value in each row of the second previous-frame correlation matrix and generating the second correlation matrix from the reference values, wherein the reference value is greater than the other values in the same row.
- The method according to claim 1, wherein generating the first correlation feature map and the second correlation feature map from the first correlation matrix, the second correlation matrix, the target-object reference-frame feature map, and the target-object previous-frame feature map comprises: pointwise-multiplying the first correlation matrix with the target-object reference-frame feature map to generate the first correlation feature map; and pointwise-multiplying the second correlation matrix with the target-object previous-frame feature map to generate the second correlation feature map.
- The method according to claim 1, wherein generating the target segmentation image of the current frame from the first correlation feature map, the second correlation feature map, and the feature map of the frame to be recognized comprises: generating a fused feature map from the first correlation feature map, the second correlation feature map, and the feature map of the frame to be recognized; and inputting the fused feature map into a decoding network and generating the target segmentation image of the current frame.
- The method according to claim 7, wherein generating the fused feature map from the first correlation feature map, the second correlation feature map, and the feature map of the frame to be recognized comprises: concatenating the first correlation feature map, the second correlation feature map, and the feature map of the frame to be recognized to generate the fused feature map.
- A target segmentation apparatus, comprising: a video frame generation module configured to generate, from a video to be recognized, a frame to be recognized, a previous frame of the frame to be recognized, and a reference frame, the reference frame being the first frame of the video to be recognized; a feature extraction module configured to input the frame to be recognized, the previous frame, and the reference frame into an encoding network and generate a feature map of the frame to be recognized, a target-object reference-frame feature map, and a target-object previous-frame feature map; a correlation matrix generation module configured to generate a first correlation matrix and a second correlation matrix from the feature map of the frame to be recognized, the target-object reference-frame feature map, and the target-object previous-frame feature map; a feature map generation module configured to generate a first correlation feature map and a second correlation feature map from the first correlation matrix, the second correlation matrix, the target-object reference-frame feature map, and the target-object previous-frame feature map; and a target segmentation module configured to generate a target segmentation image of the current frame from the first correlation feature map, the second correlation feature map, and the feature map of the frame to be recognized.
- The apparatus according to claim 9, wherein the feature extraction module comprises: a feature extraction submodule configured to extract features of the frame to be recognized, the previous frame, and the reference frame to generate the feature map of the frame to be recognized, a previous-frame feature map, and a reference-frame feature map; a first mask submodule configured to generate the target-object reference-frame feature map from the reference-frame feature map and a target-object mask of the reference frame; and a second mask submodule configured to generate the target-object previous-frame feature map from the previous-frame feature map and a target-object mask of the previous frame.
- The apparatus according to claim 9, wherein the correlation matrix generation module comprises: a first correlation matrix generation submodule configured to generate the first correlation matrix from the feature map of the frame to be recognized and the target-object reference-frame feature map; and a second correlation matrix generation submodule configured to generate the second correlation matrix from the feature map of the frame to be recognized and the target-object previous-frame feature map.
- The apparatus according to claim 11, wherein the first correlation matrix generation submodule comprises: a reference correlation matrix generation unit configured to generate a reference correlation matrix from the feature map of the frame to be recognized and the target-object reference-frame feature map; a second reference correlation matrix generation unit configured to normalize the reference correlation matrix to generate a second reference correlation matrix; and a first correlation matrix generation unit configured to generate a reference value in each row of the second reference correlation matrix and generate the first correlation matrix from the reference values, wherein the reference value is greater than the other values in the same row.
- The apparatus according to claim 11, wherein the second correlation matrix generation submodule comprises: a previous-frame correlation matrix generation unit configured to generate a previous-frame correlation matrix from the feature map of the frame to be recognized and the target-object previous-frame feature map; a second previous-frame correlation matrix generation unit configured to normalize the previous-frame correlation matrix to generate a second previous-frame correlation matrix; and a second correlation matrix generation unit configured to generate a reference value in each row of the second previous-frame correlation matrix and generate the second correlation matrix from the reference values, wherein the reference value is greater than the other values in the same row.
- The apparatus according to claim 9, wherein the feature map generation module comprises: a first correlation feature map generation submodule configured to pointwise-multiply the first correlation matrix with the target-object reference-frame feature map to generate the first correlation feature map; and a second correlation feature map generation submodule configured to pointwise-multiply the second correlation matrix with the target-object previous-frame feature map to generate the second correlation feature map.
- The apparatus according to claim 9, wherein the target segmentation module comprises: a feature fusion submodule configured to generate a fused feature map from the first correlation feature map, the second correlation feature map, and the feature map of the frame to be recognized; and a decoding submodule configured to input the fused feature map into a decoding network and generate the target segmentation image of the current frame.
- The apparatus according to claim 15, wherein the feature fusion submodule comprises: a feature fusion unit configured to concatenate the first correlation feature map, the second correlation feature map, and the feature map of the frame to be recognized to generate the fused feature map.
- An electronic device, comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the method of any one of claims 1-8.
- A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause the computer to perform the method of any one of claims 1-8.
- A computer program product, comprising a computer program that, when executed by a processor, implements the method of any one of claims 1-8.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020227030785A KR20220129093A (ko) | 2021-06-30 | 2021-12-08 | 타겟 분할 방법, 장치 및 전자 기기 |
JP2022581655A JP7372487B2 (ja) | 2021-06-30 | 2021-12-08 | オブジェクトセグメンテーション方法、オブジェクトセグメンテーション装置及び電子機器 |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110736166.X | 2021-06-30 | ||
CN202110736166.XA CN113570606B (zh) | 2021-06-30 | 2021-06-30 | 目标分割的方法、装置及电子设备 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023273173A1 true WO2023273173A1 (zh) | 2023-01-05 |
Family
ID=78163240
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/136548 WO2023273173A1 (zh) | 2021-06-30 | 2021-12-08 | 目标分割的方法、装置及电子设备 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN113570606B (zh) |
WO (1) | WO2023273173A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116543147A (zh) * | 2023-03-10 | 2023-08-04 | 武汉库柏特科技有限公司 | 一种颈动脉超声图像分割方法、装置、设备及存储介质 |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113570606B (zh) * | 2021-06-30 | 2023-09-05 | 北京百度网讯科技有限公司 | 目标分割的方法、装置及电子设备 |
CN116962715A (zh) * | 2022-03-31 | 2023-10-27 | 华为技术有限公司 | 编码方法、装置、存储介质及计算机程序产品 |
CN114648446A (zh) * | 2022-03-31 | 2022-06-21 | 网银在线(北京)科技有限公司 | 视频处理方法和装置 |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109685805A (zh) * | 2019-01-09 | 2019-04-26 | 银河水滴科技(北京)有限公司 | 一种图像分割方法及装置 |
CN112070044A (zh) * | 2020-09-15 | 2020-12-11 | 北京深睿博联科技有限责任公司 | 一种视频物体分类方法及装置 |
CN112950640A (zh) * | 2021-02-23 | 2021-06-11 | Oppo广东移动通信有限公司 | 视频人像分割方法、装置、电子设备及存储介质 |
CN113570606A (zh) * | 2021-06-30 | 2021-10-29 | 北京百度网讯科技有限公司 | 目标分割的方法、装置及电子设备 |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10671855B2 (en) * | 2018-04-10 | 2020-06-02 | Adobe Inc. | Video object segmentation by reference-guided mask propagation |
CN111210446B (zh) * | 2020-01-08 | 2022-07-29 | 中国科学技术大学 | 一种视频目标分割方法、装置和设备 |
- 2021
  - 2021-06-30 CN CN202110736166.XA patent/CN113570606B/zh active Active
  - 2021-12-08 WO PCT/CN2021/136548 patent/WO2023273173A1/zh active Application Filing
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109685805A (zh) * | 2019-01-09 | 2019-04-26 | 银河水滴科技(北京)有限公司 | 一种图像分割方法及装置 |
CN112070044A (zh) * | 2020-09-15 | 2020-12-11 | 北京深睿博联科技有限责任公司 | 一种视频物体分类方法及装置 |
CN112950640A (zh) * | 2021-02-23 | 2021-06-11 | Oppo广东移动通信有限公司 | 视频人像分割方法、装置、电子设备及存储介质 |
CN113570606A (zh) * | 2021-06-30 | 2021-10-29 | 北京百度网讯科技有限公司 | 目标分割的方法、装置及电子设备 |
Non-Patent Citations (1)
Title |
---|
WANG ZIQIN; XU JUN; LIU LI; ZHU FAN; SHAO LING: "RANet: Ranking Attention Network for Fast Video Object Segmentation", 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), IEEE, 27 October 2019 (2019-10-27), pages 3977 - 3986, XP033723818, DOI: 10.1109/ICCV.2019.00408 * |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116543147A (zh) * | 2023-03-10 | 2023-08-04 | 武汉库柏特科技有限公司 | 一种颈动脉超声图像分割方法、装置、设备及存储介质 |
Also Published As
Publication number | Publication date |
---|---|
CN113570606A (zh) | 2021-10-29 |
CN113570606B (zh) | 2023-09-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2023273173A1 (zh) | 目标分割的方法、装置及电子设备 | |
WO2023015941A1 (zh) | 文本检测模型的训练方法和检测文本方法、装置和设备 | |
EP3876197A2 (en) | Portrait extracting method and apparatus, electronic device and storage medium | |
CN113570610B (zh) | 采用语义分割模型对视频进行目标分割的方法、装置 | |
CN114550177A (zh) | 图像处理的方法、文本识别方法及装置 | |
CN113343826A (zh) | 人脸活体检测模型的训练方法、人脸活体检测方法及装置 | |
WO2022218012A1 (zh) | 特征提取方法、装置、设备、存储介质以及程序产品 | |
CN115578735B (zh) | 文本检测方法和文本检测模型的训练方法、装置 | |
CN113591566A (zh) | 图像识别模型的训练方法、装置、电子设备和存储介质 | |
WO2022247343A1 (zh) | 识别模型训练方法、识别方法、装置、设备及存储介质 | |
CN113177449B (zh) | 人脸识别的方法、装置、计算机设备及存储介质 | |
WO2023005253A1 (zh) | 文本识别模型框架的训练方法、装置及系统 | |
CN114429637B (zh) | 一种文档分类方法、装置、设备及存储介质 | |
CN114022887B (zh) | 文本识别模型训练及文本识别方法、装置、电子设备 | |
US20230047748A1 (en) | Method of fusing image, and method of training image fusion model | |
US20230115765A1 (en) | Method and apparatus of transferring image, and method and apparatus of training image transfer model | |
JP7403673B2 (ja) | モデルトレーニング方法、歩行者再識別方法、装置および電子機器 | |
JP7372487B2 (ja) | オブジェクトセグメンテーション方法、オブジェクトセグメンテーション装置及び電子機器 | |
CN113553905A (zh) | 图像识别方法、装置及系统 | |
CN113361536A (zh) | 图像语义分割模型训练、图像语义分割方法及相关装置 | |
CN113326766A (zh) | 文本检测模型的训练方法及装置、文本检测方法及装置 | |
CN113177483A (zh) | 视频目标分割方法、装置、设备以及存储介质 | |
CN114093006A (zh) | 活体人脸检测模型的训练方法、装置、设备以及存储介质 | |
CN113903071A (zh) | 人脸识别方法、装置、电子设备和存储介质 | |
CN114463862A (zh) | 人脸识别方法、装置、电子设备及存储介质 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
ENP | Entry into the national phase |
Ref document number: 2022581655 Country of ref document: JP Kind code of ref document: A |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 21948099 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 21948099 Country of ref document: EP Kind code of ref document: A1 |