WO2022252881A1 - Image processing method and apparatus, readable medium, and electronic device - Google Patents

Image processing method and apparatus, readable medium, and electronic device

Info

Publication number
WO2022252881A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
target
encoder
unlabeled
preset
Prior art date
Application number
PCT/CN2022/089240
Other languages
English (en)
Chinese (zh)
Inventor
佘琪
张天远
李映虹
肖梅峰
王长虎
Original Assignee
北京有竹居网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京有竹居网络技术有限公司
Publication of WO2022252881A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning

Definitions

  • the present disclosure relates to the technical field of image processing, and in particular, to an image processing method, device, readable medium, and electronic equipment.
  • the neural classification network needs to rely on a large amount of labeled data for training, but in practice, due to the high cost of data collection and labeling, it is usually difficult to obtain a large amount of labeled data for training the neural classification network. When labeled data is scarce, the accuracy of judging whether an image contains high-risk information is reduced.
  • the present disclosure provides an image processing method, the method comprising:
  • the image detection model includes a target convolution layer and a processing layer
  • the target convolution layer is used to extract the features of the target image
  • the processing layer is used to determine the image detection information according to the features of the target image.
  • the image detection information is used to indicate whether the target image is an abnormal image
  • the target convolutional layer is the convolutional layer in the first encoder
  • the first encoder is obtained by using an unlabeled sample set to train a preset network, and the processing layer is obtained by using a labeled sample set to train a preset neural network layer
  • when the image detection information indicates that the target image is an abnormal image, abnormal image processing is performed on the target image.
  • the present disclosure provides an image processing device, the device comprising:
  • an obtaining module, configured to obtain the target image in the video to be processed;
  • a determining module configured to determine image detection information corresponding to the target image through a pre-trained image detection model according to the target image;
  • the image detection model includes a target convolution layer and a processing layer
  • the target convolution layer is used to extract the features of the target image
  • the processing layer is used to determine the image detection information according to the features of the target image.
  • the image detection information is used to indicate whether the target image is an abnormal image
  • the target convolutional layer is the convolutional layer in the first encoder
  • the first encoder is obtained by using an unlabeled sample set to train a preset network, and the processing layer is obtained by using a labeled sample set to train a preset neural network layer
  • a processing module configured to perform abnormal image processing on the target image when the image detection information indicates that the target image is an abnormal image.
  • the present disclosure provides a computer-readable medium on which a computer program is stored, and when the program is executed by a processing device, the steps of the method described in the first aspect of the present disclosure are implemented.
  • an electronic device, including:
  • a storage device on which a computer program is stored;
  • a processing device configured to execute the computer program in the storage device to implement the steps of the method described in the first aspect of the present disclosure.
  • the present disclosure first acquires the target image in the video to be processed, and determines the image detection information corresponding to the target image through the pre-trained image detection model according to the target image, wherein the image detection model includes the target convolution layer and the processing layer, the target convolution layer is used to extract the features of the target image, the processing layer is used to determine the image detection information according to the features of the target image, the image detection information is used to indicate whether the target image is an abnormal image, the target convolution layer is the convolutional layer in the first encoder, the first encoder is obtained by using the unlabeled sample set to train the preset network, and the processing layer is obtained by using the labeled sample set to train the preset neural network layer; when the image detection information indicates that the target image is an abnormal image, abnormal image processing is performed on the target image.
  • the present disclosure trains the image detection model based on the unlabeled sample set combined with the labeled sample set, and does not need to rely on a large amount of labeled data; even in the absence of a large amount of labeled data, an accurate image detection model can be obtained, which improves the accuracy of the image detection information determined by the image detection model, so that target images detected as abnormal images can be processed in a timely manner.
  • Fig. 1 is a flowchart of an image processing method shown according to an exemplary embodiment
  • Fig. 2 is a flow chart of step 102 according to the embodiment shown in Fig. 1;
  • Fig. 3 is a flow chart showing a training image detection model according to an exemplary embodiment
  • Fig. 4 is a flow chart of step 202 according to the embodiment shown in Fig. 3;
  • Fig. 5 is a flow chart showing another training image detection model according to an exemplary embodiment
  • Fig. 6 is a block diagram of an image processing device according to an exemplary embodiment
  • Fig. 7 is a block diagram of a determination module according to the embodiment shown in Fig. 1;
  • Fig. 8 is a block diagram of an electronic device according to an exemplary embodiment.
  • the term “comprise” and its variations are open-ended, i.e., “including but not limited to”.
  • the term “based on” is “based at least in part on”.
  • the term “one embodiment” means “at least one embodiment”; the term “another embodiment” means “at least one further embodiment”; the term “some embodiments” means “at least some embodiments.” Relevant definitions of other terms will be given in the description below.
  • the application scenario may be any scenario where it is necessary to detect whether an image is an abnormal image.
  • the scenario may be a scenario in which video content security detection is performed on a video by detecting whether an image in a video is an abnormal image.
  • abnormal images can be images containing high-risk information such as pornography, drug abuse, violence, cults, and various vulgarities.
  • the images may be individual images (such as user avatars on the Internet) or video frames in videos uploaded by users (such as videos on short video platforms or social media), which is not specifically limited in the present disclosure.
  • Fig. 1 is a flowchart of an image processing method according to an exemplary embodiment. As shown in Figure 1, the method may include the following steps:
  • Step 101 acquiring a target image in a video to be processed.
  • video content security detection can be used to detect the content of videos on the network to determine whether there is high-risk information in the video.
  • the video that needs content detection can be taken as the video to be processed, and the target image is selected from the video to be processed.
  • the method of selecting the target image may be: according to preset rules, frame extraction is performed on the video to be processed, and the extracted video frame in the video to be processed is used as the target image.
  • the preset rules may include at least one of the following:
  • the specified moment may be a time point set in advance in the video to be processed, or may be the time points corresponding to the division points obtained after evenly dividing the video to be processed; in the latter case this is equivalent to uniform frame extraction from the video to be processed, with the uniformly extracted video frames used as target images.
  • the key frame is the video frame containing the most information in the video to be processed (i.e., the most complete frame picture in the video to be processed). For example, an I frame (English: Intra-coded picture) in the video to be processed may be selected as a key frame, as sketched below.
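  • As a non-limiting illustration, key frame (I frame) selection could be implemented as follows; the use of the PyAV library and the Pillow image conversion are assumptions of this sketch, since the disclosure does not prescribe a specific tool:

```python
# Illustrative sketch: decode only the key (I) frames of a video.
# Assumes the PyAV library (pip install av) and Pillow for image conversion.
import av

def extract_key_frames(video_path):
    key_frames = []
    with av.open(video_path) as container:
        stream = container.streams.video[0]
        # Ask the decoder to skip everything that is not a key frame.
        stream.codec_context.skip_frame = "NONKEY"
        for frame in container.decode(stream):
            key_frames.append(frame.to_image())  # one PIL.Image per key frame
    return key_frames
```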
  • the similarity between the scene transition frame and its adjacent video frame is less than or equal to the preset similarity threshold, where the adjacent video frame is the video frame that is adjacent to the scene transition frame in the video to be processed and located before the scene transition frame.
  • the scene transition frame is used to indicate that the video enters a new scene.
  • for example, a video scene extraction algorithm can be used to select scene transition frames according to the similarity between video frames in the video to be processed: if the similarity between a previous video frame and the next video frame in the video to be processed is less than or equal to the preset similarity threshold, the latter video frame is taken as a scene transition frame, as sketched below.
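  • The following non-limiting sketch combines the uniform-extraction rule and the scene-transition rule described above; OpenCV, the histogram-correlation similarity measure, and the 0.7 threshold are assumptions of the example, since the disclosure only requires some similarity measure and a preset threshold:

```python
# Illustrative sketch: select target images by uniform frame extraction plus
# scene-transition detection. Assumes OpenCV (pip install opencv-python).
import cv2

def select_target_images(video_path, n_uniform=5, sim_threshold=0.7):
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    # Rule: video frames at the time points that evenly divide the video.
    uniform_ids = {int(i * total / (n_uniform + 1)) for i in range(1, n_uniform + 1)}
    targets, prev_hist = [], None
    for idx in range(total):
        ok, frame = cap.read()
        if not ok:
            break
        hist = cv2.calcHist([frame], [0, 1, 2], None, [8, 8, 8],
                            [0, 256, 0, 256, 0, 256])
        hist = cv2.normalize(hist, hist).flatten()
        # Rule: scene transition frame, i.e. its similarity to the frame
        # before it is at or below the preset similarity threshold.
        is_transition = (prev_hist is not None and
                         cv2.compareHist(prev_hist, hist,
                                         cv2.HISTCMP_CORREL) <= sim_threshold)
        if idx in uniform_ids or is_transition:
            targets.append(frame)
        prev_hist = hist
    cap.release()
    return targets
```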
  • Step 102 according to the target image, determine image detection information corresponding to the target image through a pre-trained image detection model.
  • the image detection model includes a target convolution layer and a processing layer
  • the target convolution layer is used to extract the features of the target image
  • the processing layer is used to determine the image detection information according to the features of the target image
  • the image detection information is used to indicate whether the target image is an abnormal image
  • the target convolutional layer is the convolutional layer in the first encoder
  • the first encoder is obtained by using the unlabeled sample set to train the preset network
  • the processing layer is obtained by using the labeled sample set to train the preset neural network layer.
  • a model for detecting whether an image is an abnormal image can be trained in advance by combining unsupervised (or semi-supervised) and supervised methods.
  • for example, a large amount of unlabeled data on the network (that is, a large number of images of various types) can be used as an unlabeled sample set to train the preset network including the first encoder, and after training, the convolutional layer in the first encoder (which may be all or part of the convolutional layers in the first encoder) is used as the target convolutional layer.
  • the first encoder is used to perform feature extraction on images in the unlabeled sample set, and the preset network may be obtained by training on the unlabeled sample set by means of contrastive learning (English: Contrastive learning), for example.
  • the manually annotated data can be used as a labeled sample set, and supervised fine-tuning of the target convolutional layer is performed on the labeled sample set to obtain an image detection model.
  • the image detection model may output image detection information corresponding to the image according to the input image, indicating whether the image is an abnormal image.
  • for example, the preset neural network layer can be trained using the labeled sample set, and after its training is completed, the trained preset neural network layer is used as the processing layer; afterwards, the target convolutional layer and the processing layer are spliced, and the spliced neural network is used as the image detection model, as sketched below.
  • the processing layer is used for outputting image detection information corresponding to an input image according to image features of the input image.
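  • As a non-limiting illustration, the splicing of the target convolutional layer and the processing layer could look as follows; PyTorch, the ResNet-50 first encoder, and the MLP sizes are assumptions of this sketch, not requirements of the disclosure:

```python
# Illustrative sketch: splice the target convolutional layer (the first
# encoder's convolutional stages) with a processing layer (a small MLP).
# Assumes PyTorch and torchvision; the architecture choices are examples.
import torch
import torch.nn as nn
from torchvision.models import resnet50

first_encoder = resnet50()  # in practice, pre-trained by contrastive learning

# Target convolutional layer: everything up to (not including) the fc head.
target_conv = nn.Sequential(*list(first_encoder.children())[:-1])

# Processing layer: a preset neural network layer producing one logit per
# image type (here: one normal type plus six abnormal types, as an example).
processing_layer = nn.Sequential(
    nn.Flatten(),
    nn.Linear(2048, 512),
    nn.ReLU(inplace=True),
    nn.Linear(512, 7),
)

image_detection_model = nn.Sequential(target_conv, processing_layer)
logits = image_detection_model(torch.randn(1, 3, 224, 224))  # smoke test
print(logits.shape)  # torch.Size([1, 7])
```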
  • by combining unsupervised (or semi-supervised) and supervised methods to train the image detection model, a large amount of unlabeled data on the network can be used efficiently to obtain an accurate image detection model without increasing the labeling cost, and the performance of the image detection model can be improved, thereby improving the accuracy of the image detection information determined by the image detection model.
  • the target image can be input into the image detection model to obtain image detection information output by the image detection model.
  • the image detection information may be any information that can indicate whether the target image is an abnormal image.
  • the image detection information may be the image type of the target image, and the image type may include a normal image type and an abnormal image type.
  • when the image type is an abnormal image type, the image detection information indicates that the target image is an abnormal image.
  • abnormal image types may include, for example, pornography, drug abuse, violence, cult, dark, and vulgar types, and in this case the processing layer is used to perform an image type classification task, whose output can be mapped to image detection information as sketched below.
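  • A non-limiting illustration of turning the classification output into image detection information follows; it reuses image_detection_model from the previous sketch, and the type names are assumptions for the example:

```python
# Illustrative sketch: map classifier logits to image detection information.
# Reuses image_detection_model from the previous sketch; type names assumed.
import torch

IMAGE_TYPES = ["normal", "pornography", "drug_abuse", "violence",
               "cult", "dark", "vulgar"]

def detect(model, image_tensor):
    """image_tensor: a (3, H, W) float tensor for a single target image."""
    model.eval()
    with torch.no_grad():
        pred = model(image_tensor.unsqueeze(0)).argmax(dim=1).item()
    image_type = IMAGE_TYPES[pred]
    # The image detection information: the image type, plus whether the
    # target image is an abnormal image.
    return {"image_type": image_type, "abnormal": image_type != "normal"}
```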
  • the image detection information may be an image of an abnormal object in the target image that can reflect that the target image is an abnormal image.
  • when such an abnormal object image is detected, the image detection information indicates that the target image is an abnormal image; in this case, the processing layer is used to perform the task of detecting abnormal object images or the task of segmenting abnormal object images.
  • Step 103 when the image detection information indicates that the target image is an abnormal image, perform abnormal image processing on the target image.
  • abnormal image processing may be performed on the target image.
  • abnormal image processing may include, for example, deleting the target image, blurring the target image, or marking the target image, as sketched below.
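  • The following non-limiting sketch illustrates these three kinds of abnormal image processing; OpenCV and the concrete blur kernel and marking text are assumptions of the example:

```python
# Illustrative sketch: delete, blur, or mark a target image detected as
# an abnormal image. Assumes OpenCV; parameters are example values.
import cv2

def process_abnormal_image(frame, action="blur"):
    if action == "delete":
        return None  # drop the target image entirely
    if action == "blur":
        return cv2.GaussianBlur(frame, (51, 51), 0)  # strong Gaussian blur
    if action == "mark":
        marked = frame.copy()
        cv2.putText(marked, "FLAGGED", (10, 30),
                    cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 0, 255), 2)
        return marked
    raise ValueError(f"unknown action: {action}")
```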
  • the target image can also be combined with text information and audio information in the video to be processed to further judge whether the target image is an abnormal image, thereby improving the accuracy of judging whether the target image is an abnormal image.
  • the disclosure first obtains the target image in the video to be processed, and determines the image detection information corresponding to the target image through the pre-trained image detection model according to the target image, wherein the image detection model includes the target convolutional layer and The processing layer, the target convolution layer is used to extract the characteristics of the target image, the processing layer is used to determine the image detection information according to the characteristics of the target image, the image detection information is used to indicate whether the target image is an abnormal image, and the target convolution layer is the first encoding
  • the convolutional layer in the encoder the first encoder is obtained by using the unlabeled sample set to train the preset network
  • the processing layer is obtained by using the labeled sample set to train the preset neural network layer
  • the image detection information indicates When the target image is an abnormal image, abnormal image processing is performed on the target image.
  • This disclosure is based on the unlabeled sample set, combined with the labeled sample set to train the image detection model, and does not need to rely on a large amount of labeled data to train the image detection model, even in the absence of a large amount of labeled data.
  • An accurate image detection model can be obtained, and the accuracy of image detection information determined by the image detection model is improved, thereby timely processing target images detected as abnormal images.
  • Fig. 2 is a flow chart of step 102 according to the embodiment shown in Fig. 1.
  • step 102 may include the following steps:
  • Step 1021, preprocessing the target image to obtain a processed target image, where the preprocessing includes at least one of grayscale processing, geometric transformation processing, and image enhancement processing.
  • Step 1022, the processed target image is used as an input of the image detection model to obtain the image detection information output by the image detection model.
  • the target image may be preprocessed to obtain the processed target image.
  • grayscale processing, geometric transformation processing, and image enhancement processing may be performed sequentially on the target image to obtain the processed target image.
  • Preprocessing can eliminate irrelevant information in the target image, restore relevant information, enhance the detectability of relevant information, and simplify the data to the greatest extent, so as to ensure the accuracy of image detection information determined by the image detection model.
  • the processed target image can be input into the image detection model to obtain image detection information output by the image detection model.
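  • As a non-limiting illustration, the preprocessing could be implemented as follows; OpenCV and the concrete transforms (resize as the geometric transformation, histogram equalization as the image enhancement) are assumptions of the sketch, and in practice the result must match the input format the image detection model expects:

```python
# Illustrative sketch: grayscale processing, geometric transformation, and
# image enhancement applied in sequence. Assumes OpenCV; transform choices
# are examples of the three preprocessing categories named above.
import cv2

def preprocess(frame, size=(224, 224)):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  # grayscale processing
    resized = cv2.resize(gray, size)                # geometric transformation
    enhanced = cv2.equalizeHist(resized)            # image enhancement
    return enhanced
```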
  • Fig. 3 is a flowchart showing a training image detection model according to an exemplary embodiment.
  • the preset network includes a first encoder and a second encoder, and the image detection model is trained in the following way:
  • Step 201 obtaining an unlabeled sample set and a labeled sample set.
  • the unlabeled sample set includes unlabeled image samples
  • the labeled sample set includes labeled image samples and image detection information samples corresponding to the labeled image samples.
  • an unlabeled sample set including unlabeled image samples may be obtained.
  • a large number of videos can be collected from the Internet (such as a short video platform), and then 5 video frames are sampled at equal intervals for each video, and an unlabeled sample set is constructed using the 5 video frames sampled for each video.
  • the unlabeled sample set can also be packaged into the arnold_dataset format and uploaded to HDFS (Hadoop Distributed File System) to facilitate access and reading on the arnold cluster.
  • a labeled sample set including labeled image samples and image detection information samples corresponding to the labeled image samples may be acquired.
  • for example, the image detection information sample (i.e., the image type) corresponding to a labeled image sample may be “violence”.
  • Step 202 Train the preset network according to the unlabeled sample set, and determine the target convolutional layer from the convolutional layers of the first encoder included in the trained preset network.
  • Step 203, the preset neural network layer is trained according to the labeled sample set, and the trained preset neural network layer is used as the processing layer.
  • the preset network is trained by contrastive learning
  • the preset network includes the first encoder and a second encoder.
  • the preset network can be trained according to the unlabeled sample set, and after the training of the preset network is completed, the convolutional layer in the first encoder is used as the target convolutional layer.
  • the preset neural network layer can be trained according to the labeled sample set, and after the training of the preset neural network layer is completed, the trained preset neural network layer is used as a processing layer.
  • the preset neural network layer can be any decoder structure, or an MLP (Multi-Layer Perceptron), which is not specifically limited in the present disclosure.
  • the processing layer can be concatenated after the target convolutional layer to obtain the image detection model.
  • FIG. 4 is a flow chart of step 202 according to the embodiment shown in FIG. 3. As shown in FIG. 4, there are multiple unlabeled image samples, and step 202 may include the following steps:
  • Step 2021 selecting a first number of unlabeled image samples from a plurality of unlabeled image samples.
  • Step 2022, for each unlabeled image sample in the first number of unlabeled image samples, after performing data augmentation processing on the unlabeled image sample, input it into the first encoder and the second encoder respectively, to obtain a first feature vector output by the first encoder and a second feature vector output by the second encoder.
  • for example, a first number of unlabeled image samples (the first number may be, for example, 256) may be selected from the plurality of unlabeled image samples included in the unlabeled sample set to form a batch.
  • then, random data augmentation transformations can be applied to each unlabeled image sample to obtain a corresponding positive sample pair (two augmented views of the same sample), which are respectively input into the first encoder and the second encoder to obtain the first feature vector output by the first encoder and the second feature vector output by the second encoder.
  • Step 2023 update the first encoder and the second encoder according to the first feature vector and the second feature vector corresponding to each unlabeled image sample.
  • Steps 2021 to 2023 are executed in a loop until the number of loop executions reaches a preset number of times.
  • for example, the second feature vector corresponding to each unlabeled image sample can be sequentially input into the preset queue, and the preset queue is updated accordingly.
  • the number of elements of the preset queue is the second number, and the second number is less than or equal to the first number
  • the preset queue can be a memory queue (English: Memory Bank) with a fixed size of 65536, and the preset queue adopts a FIFO (English: First In First Out) update method.
  • the first encoder can be updated according to the first feature vector corresponding to each unlabeled image sample and the preset queue; for example, a contrastive loss function (English: contrastive loss) can be calculated from the first feature vector corresponding to each unlabeled image sample and the second number of elements in the preset queue, the gradient of the first encoder can then be computed, and the first encoder can be updated by stochastic gradient descent.
  • after the update of the first encoder is completed, the second encoder can be updated according to the updated first encoder (that is, the second encoder does not compute gradients during training; instead, each time the first encoder is updated, the second encoder is updated accordingly). A minimal sketch of one such training step is given below.
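  • The following non-limiting sketch shows one contrastive pre-training step in the spirit described above (queue of second feature vectors, contrastive loss, stochastic gradient descent for the first encoder, and a momentum-style update for the second encoder); PyTorch, the temperature, and the momentum coefficient are assumptions of the example, while the queue size of 65536 and the FIFO rule follow the description above:

```python
# Illustrative sketch: one training step of the preset network with a first
# (query) encoder f_q, a second (key) encoder f_k, and a preset FIFO queue.
# Assumes PyTorch; temperature and momentum values are example choices.
import torch
import torch.nn.functional as F

def train_step(f_q, f_k, queue, views_a, views_b, optimizer,
               momentum=0.999, temperature=0.07):
    """views_a / views_b: the two augmented views of one batch (positive pairs).
    queue: (K, C) tensor of past second feature vectors, e.g. K = 65536."""
    q = F.normalize(f_q(views_a), dim=1)        # first feature vectors
    with torch.no_grad():                       # second encoder: no gradients
        k = F.normalize(f_k(views_b), dim=1)    # second feature vectors
    # Contrastive loss: the positive logit pairs each q with its own k;
    # the negative logits pair q with the elements of the preset queue.
    l_pos = torch.einsum("nc,nc->n", q, k).unsqueeze(1)
    l_neg = torch.einsum("nc,kc->nk", q, queue)
    logits = torch.cat([l_pos, l_neg], dim=1) / temperature
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)
    loss = F.cross_entropy(logits, labels)
    optimizer.zero_grad()
    loss.backward()      # gradient for the first encoder only
    optimizer.step()     # e.g. stochastic gradient descent
    with torch.no_grad():
        # Update the second encoder from the updated first encoder.
        for p_q, p_k in zip(f_q.parameters(), f_k.parameters()):
            p_k.mul_(momentum).add_(p_q, alpha=1.0 - momentum)
        # FIFO update of the preset queue: drop oldest, append newest keys.
        queue = torch.cat([queue[k.size(0):], k], dim=0)
    return loss.item(), queue
```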
  • Fig. 5 is a flowchart showing another training image detection model according to an exemplary embodiment.
  • the preset network includes a first encoder and a second encoder, and the image detection model is trained in the following way:
  • Step 201 obtaining an unlabeled sample set and a labeled sample set.
  • the unlabeled sample set includes unlabeled image samples
  • the labeled sample set includes labeled image samples and image detection information samples corresponding to the labeled image samples.
  • Step 202 Train the preset network according to the unlabeled sample set, and determine the target convolutional layer from the convolutional layers of the first encoder included in the trained preset network.
  • then, the preset model is trained according to the labeled sample set, where the preset model includes the target convolutional layer and the preset neural network layer, and the preset neural network layer included in the trained preset model is used as the processing layer.
  • an unlabeled sample set including unlabeled image samples may be obtained, and an annotated sample set including annotated image samples and image detection information samples corresponding to the annotated image samples may be obtained.
  • in the case where the preset network is trained by contrastive learning, after the unlabeled sample set is obtained, the preset network can be trained according to the unlabeled sample set, and after the training of the preset network is completed, the convolutional layer in the first encoder is used as the target convolutional layer.
  • the target convolutional layer and the preset neural network layer can be taken as a whole as the preset model, and the preset model is trained according to the labeled sample set (that is, the entire neural network composed of the target convolutional layer and the preset neural network layer is trained jointly); after the training of the preset model is completed, the preset neural network layer included in the trained preset model is used as the processing layer, as sketched below.
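  • A non-limiting sketch of this joint supervised stage follows; PyTorch and the optimizer settings are assumptions of the example, and labeled_set stands for any dataset yielding (labeled image sample, image detection information sample) pairs:

```python
# Illustrative sketch: jointly fine-tune the preset model (target
# convolutional layer + preset neural network layer) on the labeled set.
# Assumes PyTorch; hyperparameters are example values.
import torch.nn as nn
from torch.optim import SGD
from torch.utils.data import DataLoader

def fine_tune(preset_model: nn.Module, labeled_set, epochs=10):
    loader = DataLoader(labeled_set, batch_size=64, shuffle=True)
    optimizer = SGD(preset_model.parameters(), lr=0.01, momentum=0.9)
    criterion = nn.CrossEntropyLoss()
    preset_model.train()
    for _ in range(epochs):
        for images, labels in loader:  # image samples + detection info samples
            loss = criterion(preset_model(images), labels)
            optimizer.zero_grad()
            loss.backward()            # gradients flow through both layers
            optimizer.step()
    # The preset neural network layer of the trained preset model is then
    # used as the processing layer of the image detection model.
    return preset_model
```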
  • the present disclosure first acquires the target image in the video to be processed, and determines the image detection information corresponding to the target image through the pre-trained image detection model according to the target image, wherein the image detection model includes the target convolution layer and the processing layer, the target convolution layer is used to extract the features of the target image, the processing layer is used to determine the image detection information according to the features of the target image, the image detection information is used to indicate whether the target image is an abnormal image, the target convolution layer is the convolutional layer in the first encoder, the first encoder is obtained by using the unlabeled sample set to train the preset network, and the processing layer is obtained by using the labeled sample set to train the preset neural network layer; when the image detection information indicates that the target image is an abnormal image, abnormal image processing is performed on the target image.
  • the present disclosure trains the image detection model based on the unlabeled sample set combined with the labeled sample set, and does not need to rely on a large amount of labeled data; even in the absence of a large amount of labeled data, an accurate image detection model can be obtained, which improves the accuracy of the image detection information determined by the image detection model, so that target images detected as abnormal images can be processed in a timely manner.
  • Fig. 6 is a block diagram of an image processing device according to an exemplary embodiment. As shown in Fig. 6, the device 300 includes:
  • the acquiring module 301 is configured to acquire target images in the video to be processed.
  • the determining module 302 is configured to determine image detection information corresponding to the target image through a pre-trained image detection model according to the target image.
  • the image detection model includes a target convolution layer and a processing layer
  • the target convolution layer is used to extract the features of the target image
  • the processing layer is used to determine the image detection information according to the features of the target image
  • the image detection information is used to indicate whether the target image is an abnormal image
  • the target convolutional layer is the convolutional layer in the first encoder
  • the first encoder is obtained by using the unlabeled sample set to train the preset network
  • the processing layer is obtained by using the labeled sample set to train the preset neural network layer.
  • the processing module 303 is configured to perform abnormal image processing on the target image when the image detection information indicates that the target image is an abnormal image.
  • the obtaining module 301 is used for:
  • according to preset rules, frame extraction is performed on the video to be processed, and the extracted video frames in the video to be processed are used as target images.
  • the preset rules include at least one of the following:
  • the key frame is the video frame containing the most information in the video to be processed.
  • the similarity between the scene transition frame and its adjacent video frame is less than or equal to the preset similarity threshold, where the adjacent video frame is the video frame that is adjacent to the scene transition frame in the video to be processed and located before the scene transition frame.
  • Fig. 7 is a block diagram of a determining module according to the embodiment shown in Fig. 1 .
  • the determining module 302 includes:
  • the preprocessing sub-module 3021 is used to preprocess the target image to obtain the processed target image.
  • the preprocessing includes at least one of grayscale processing, geometric transformation processing and image enhancement processing.
  • the determining sub-module 3022 is configured to use the processed target image as an input of the image detection model to obtain image detection information output by the image detection model.
  • the preset network includes a first encoder and a second encoder
  • the determination module 302 is used to train an image detection model in the following manner:
  • the unlabeled sample set includes unlabeled image samples
  • the labeled sample set includes labeled image samples and image detection information samples corresponding to the labeled image samples.
  • the preset network is trained according to the unlabeled sample set, and the target convolution layer is determined from the convolutional layers of the first encoder included in the trained preset network.
  • the preset neural network layer is trained according to the labeled sample set, and the trained preset neural network layer is used as a processing layer.
  • the preset network includes a first encoder and a second encoder
  • the determination module 302 is used to train an image detection model in the following manner:
  • the unlabeled sample set includes unlabeled image samples
  • the labeled sample set includes labeled image samples and image detection information samples corresponding to the labeled image samples.
  • the preset network is trained according to the unlabeled sample set, and the target convolution layer is determined from the convolutional layers of the first encoder included in the trained preset network.
  • the preset model is trained according to the labeled sample set, where the preset model includes a target convolution layer and a preset neural network layer, and the preset neural network layer included in the trained preset model is used as a processing layer.
  • the determination module 302 is used for:
  • a first number of unlabeled image samples are selected from the plurality of unlabeled image samples.
  • for each unlabeled image sample in the first number of unlabeled image samples, after data augmentation processing is performed on the unlabeled image sample, it is input into the first encoder and the second encoder respectively, to obtain the first feature vector output by the first encoder and the second feature vector output by the second encoder.
  • the first encoder and the second encoder are updated according to the first feature vector and the second feature vector corresponding to each unlabeled image sample.
  • the steps from selecting a first number of unlabeled image samples from the plurality of unlabeled image samples to updating the first encoder and the second encoder according to the first feature vector and the second feature vector corresponding to each unlabeled image sample are executed in a loop, until the number of loop executions reaches a preset number of times.
  • the determination module 302 is used for:
  • the second feature vector corresponding to each unlabeled image sample is sequentially input into the preset queue, and the preset queue is updated.
  • the number of elements in the preset queue is a second number, and the second number is less than or equal to the first number.
  • the first encoder is updated according to the first feature vector and the preset queue corresponding to each unlabeled image sample.
  • the second encoder is updated according to the updated first encoder.
  • the disclosure first obtains the target image in the video to be processed, and determines the image detection information corresponding to the target image through the pre-trained image detection model according to the target image, wherein the image detection model includes the target convolutional layer and The processing layer, the target convolution layer is used to extract the characteristics of the target image, the processing layer is used to determine the image detection information according to the characteristics of the target image, the image detection information is used to indicate whether the target image is an abnormal image, and the target convolution layer is the first encoding
  • the convolutional layer in the encoder the first encoder is obtained by using the unlabeled sample set to train the preset network
  • the processing layer is obtained by using the labeled sample set to train the preset neural network layer
  • the image detection information indicates When the target image is an abnormal image, abnormal image processing is performed on the target image.
  • This disclosure is based on the unlabeled sample set, combined with the labeled sample set to train the image detection model, and does not need to rely on a large amount of labeled data to train the image detection model, even in the absence of a large amount of labeled data.
  • An accurate image detection model can be obtained, and the accuracy of image detection information determined by the image detection model is improved, thereby timely processing target images detected as abnormal images.
  • Referring now to FIG. 8, it shows a schematic structural diagram of an electronic device 400 (such as the terminal device or server in FIG. 1) suitable for implementing the embodiments of the present disclosure.
  • the terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and vehicle-mounted terminals (such as car navigation terminals), as well as fixed terminals such as digital TVs and desktop computers.
  • the electronic device shown in FIG. 8 is only an example, and should not limit the functions and scope of use of the embodiments of the present disclosure.
  • an electronic device 400 may include a processing device (such as a central processing unit, a graphics processing unit, etc.) 401, which may execute various appropriate actions and processes according to a program stored in a read-only memory (ROM) 402 or a program loaded from a storage device 408 into a random access memory (RAM) 403.
  • in the RAM 403, various programs and data necessary for the operation of the electronic device 400 are also stored.
  • the processing device 401, the ROM 402, and the RAM 403 are connected to each other through a bus 404.
  • An input/output (I/O) interface 405 is also connected to bus 404 .
  • the following devices can be connected to the I/O interface 405: input devices 406 including, for example, a touch screen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, etc.; output devices 407 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, etc.; storage devices 408 including, for example, a magnetic tape, a hard disk, etc.; and a communication device 409.
  • the communication means 409 may allow the electronic device 400 to perform wireless or wired communication with other devices to exchange data. While FIG. 8 shows electronic device 400 having various means, it should be understood that implementing or having all of the means shown is not a requirement. More or fewer means may alternatively be implemented or provided.
  • embodiments of the present disclosure include a computer program product including a computer program carried on a non-transitory computer readable medium, the computer program including program code for executing the method shown in the flowchart.
  • the computer program may be downloaded and installed from a network via communication means 409, or from storage means 408, or from ROM 402.
  • when the computer program is executed by the processing device 401, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are executed.
  • the above-mentioned computer-readable medium in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium or any combination of the above two.
  • a computer readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave carrying computer-readable program code therein. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable medium may be transmitted by any appropriate medium, including but not limited to wires, optical cables, RF (radio frequency), etc., or any suitable combination of the above.
  • the client and the server can communicate using any currently known or future developed network protocol such as HTTP (HyperText Transfer Protocol), and can be interconnected with digital data communication (e.g., a communication network) in any form or medium.
  • examples of communication networks include local area networks (“LAN”), wide area networks (“WAN”), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed networks.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device, or may exist independently without being incorporated into the electronic device.
  • the above-mentioned computer-readable medium carries one or more programs, and when the above-mentioned one or more programs are executed by the electronic device, the electronic device: acquires a target image in the video to be processed; according to the target image, through pre-training An image detection model for determining the image detection information corresponding to the target image; wherein, the image detection model includes a target convolution layer and a processing layer, the target convolution layer is used to extract features of the target image, the The processing layer is used to determine the image detection information according to the characteristics of the target image, and the image detection information is used to indicate whether the target image is an abnormal image; the target convolution layer is the convolution in the first encoder Layer, the first encoder is obtained by using the unlabeled sample set to train the preset network, and the processing layer is obtained by using the labeled sample set to train the preset neural network layer; when the image detection information indicates When the target image is an abnormal image, abnormal image processing is performed on the target image.
  • Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages or combinations thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as “C” or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, connected via the Internet using an Internet service provider).
  • each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or may be implemented by a combination of dedicated hardware and computer instructions.
  • the modules involved in the embodiments described in the present disclosure may be implemented by software or by hardware. Wherein, the name of the module does not limit the module itself under certain circumstances, for example, the obtaining module can also be described as "a module for obtaining the target image in the video to be processed".
  • for example, and without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and so on.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • a machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • a machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing.
  • machine-readable storage media would include one or more wire-based electrical connections, portable computer discs, hard drives, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), optical fiber, compact disk read only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.
  • Example 1 provides an image processing method, including: acquiring a target image in a video to be processed; and determining, according to the target image, image detection information corresponding to the target image through a pre-trained image detection model; wherein the image detection model includes a target convolutional layer and a processing layer, the target convolutional layer is used to extract the features of the target image, the processing layer is used to determine the image detection information according to the features of the target image, and the image detection information is used to indicate whether the target image is an abnormal image; the target convolutional layer is the convolutional layer in the first encoder, the first encoder is obtained by training a preset network with an unlabeled sample set, and the processing layer is obtained by training a preset neural network layer with a labeled sample set; and when the image detection information indicates that the target image is an abnormal image, abnormal image processing is performed on the target image.
  • Example 2 provides the method of Example 1, wherein the acquisition of the target image in the video to be processed includes: performing frame extraction on the video to be processed according to preset rules, and using the extracted video frames in the video to be processed as target images; wherein the preset rules include at least one of the following: selecting a video frame corresponding to a specified moment in the video to be processed; selecting a key frame in the video to be processed, the key frame being the video frame containing the most information in the video to be processed; and selecting a scene transition frame in the video to be processed, the similarity between the scene transition frame and its adjacent video frame being less than or equal to a preset similarity threshold, where the adjacent video frame is a video frame adjacent to the scene transition frame and located before the scene transition frame in the video to be processed.
  • Example 3 provides the method of Example 1, wherein determining the image detection information corresponding to the target image through the pre-trained image detection model according to the target image includes: preprocessing the target image to obtain a processed target image, the preprocessing including at least one of grayscale processing, geometric transformation processing, and image enhancement processing; and using the processed target image as an input of the image detection model to obtain the image detection information output by the image detection model.
  • Example 4 provides the method of Example 1, the preset network includes a first encoder and a second encoder, and the image detection model is trained in the following manner: obtaining The unlabeled sample set and the labeled sample set; the unlabeled sample set includes unlabeled image samples, and the labeled sample set includes labeled image samples and image detection information samples corresponding to the labeled image samples; according to the The unlabeled sample set trains the preset network, and determines the target convolutional layer from the convolutional layer of the first encoder included in the trained preset network; according to the labeled samples The set trains the preset neural network layer, and uses the trained preset neural network layer as the processing layer.
  • Example 5 provides the method of Example 1, the preset network includes a first encoder and a second encoder, and the image detection model is obtained by training in the following manner: obtaining the unlabeled sample set and the labeled sample set, where the unlabeled sample set includes unlabeled image samples, and the labeled sample set includes labeled image samples and image detection information samples corresponding to the labeled image samples; training the preset network according to the unlabeled sample set, and determining the target convolutional layer from the convolutional layers of the first encoder included in the trained preset network; and training the preset model according to the labeled sample set, where the preset model includes the target convolutional layer and the preset neural network layer, and the preset neural network layer included in the trained preset model is used as the processing layer.
  • Example 6 provides the method of Example 4 or 5, wherein there are multiple unlabeled image samples, and training the preset network according to the unlabeled sample set includes: selecting a first number of unlabeled image samples from the plurality of unlabeled image samples; for each unlabeled image sample in the first number of unlabeled image samples, after performing data augmentation processing on the unlabeled image sample, inputting it into the first encoder and the second encoder respectively, to obtain a first feature vector output by the first encoder and a second feature vector output by the second encoder; updating the first encoder and the second encoder according to the first feature vector and the second feature vector corresponding to each unlabeled image sample; and executing in a loop the steps from selecting a first number of unlabeled image samples from the plurality of unlabeled image samples to updating the first encoder and the second encoder according to the first feature vector and the second feature vector corresponding to each unlabeled image sample, until the number of loop executions reaches a preset number of times.
  • Example 7 provides the method of Example 6, wherein updating the first encoder and the second encoder according to the first feature vector and the second feature vector corresponding to each unlabeled image sample includes: sequentially inputting the second feature vector corresponding to each unlabeled image sample into a preset queue and updating the preset queue, where the number of elements of the preset queue is a second number and the second number is less than or equal to the first number; updating the first encoder according to the first feature vector corresponding to each unlabeled image sample and the preset queue; and after the update of the first encoder is completed, updating the second encoder according to the updated first encoder.
  • Example 8 provides an image processing device, the device including: an obtaining module, configured to obtain a target image in a video to be processed; a determining module, configured to determine, according to the target image, image detection information corresponding to the target image through a pre-trained image detection model; wherein the image detection model includes a target convolutional layer and a processing layer, the target convolutional layer is used to extract the features of the target image, the processing layer is used to determine the image detection information according to the features of the target image, and the image detection information is used to indicate whether the target image is an abnormal image; the target convolutional layer is the convolutional layer in the first encoder, the first encoder is obtained by using the unlabeled sample set to train the preset network, and the processing layer is obtained by using the labeled sample set to train the preset neural network layer; and a processing module, configured to perform abnormal image processing on the target image when the image detection information indicates that the target image is an abnormal image.
  • Example 9 provides a computer-readable medium on which a computer program is stored, and when the program is executed by a processing device, the steps of the method described in any one of Example 1 to Example 7 are implemented.
  • Example 10 provides an electronic device, including: a storage device on which a computer program is stored; and a processing device configured to execute the computer program in the storage device to implement the steps of the method described in any one of Example 1 to Example 7.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Quality & Reliability (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure relates to an image processing method and apparatus, a readable medium, and an electronic device, and relates to the technical field of image processing. The method comprises: acquiring a target image in a video to be processed; determining, according to the target image and by means of a pre-trained image detection model, image detection information corresponding to the target image, the image detection model comprising a target convolution layer and a processing layer, the target convolution layer being used to extract features of the target image, the processing layer being used to determine the image detection information according to the features of the target image, the image detection information being used to indicate whether the target image is an abnormal image, the target convolution layer being a convolution layer in a first encoder, the first encoder being obtained by training a preset network using an unlabeled sample set, and the processing layer being obtained by training a preset neural network layer using a labeled sample set; and when the image detection information indicates that the target image is an abnormal image, performing abnormal image processing on the target image.
PCT/CN2022/089240 2021-06-03 2022-04-26 Procédé et appareil de traitement d'image, support lisible et dispositif électronique WO2022252881A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110621440.9 2021-06-03
CN202110621440.9A CN113222983A (zh) 2021-06-03 2021-06-03 图像处理方法、装置、可读介质和电子设备

Publications (1)

Publication Number Publication Date
WO2022252881A1 true WO2022252881A1 (fr) 2022-12-08

Family

ID=77082817

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/089240 WO2022252881A1 (fr) 2021-06-03 2022-04-26 Procédé et appareil de traitement d'image, support lisible et dispositif électronique

Country Status (2)

Country Link
CN (1) CN113222983A (fr)
WO (1) WO2022252881A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117292211A (zh) * 2023-11-27 2023-12-26 潍坊市海洋发展研究院 水质标注图像发送方法、装置、电子设备和计算机可读介质

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113222983A (zh) * 2021-06-03 2021-08-06 北京有竹居网络技术有限公司 图像处理方法、装置、可读介质和电子设备
CN113723341B (zh) * 2021-09-08 2023-09-01 北京有竹居网络技术有限公司 视频的识别方法、装置、可读介质和电子设备
CN113723344A (zh) * 2021-09-08 2021-11-30 北京有竹居网络技术有限公司 视频的识别方法、装置、可读介质和电子设备
CN113849645B (zh) * 2021-09-28 2024-06-04 平安科技(深圳)有限公司 邮件分类模型训练方法、装置、设备及存储介质
CN114627102B (zh) * 2022-03-31 2024-02-13 苏州浪潮智能科技有限公司 一种图像异常检测方法、装置、系统及可读存储介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9424494B1 (en) * 2016-01-28 2016-08-23 International Business Machines Corporation Pure convolutional neural network localization
CN108288078A (zh) * 2017-12-07 2018-07-17 腾讯科技(深圳)有限公司 一种图像中字符识别方法、装置和介质
CN109831392A (zh) * 2019-03-04 2019-05-31 中国科学技术大学 半监督网络流量分类方法
CN113222983A (zh) * 2021-06-03 2021-08-06 北京有竹居网络技术有限公司 图像处理方法、装置、可读介质和电子设备

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109389180A (zh) * 2018-10-30 2019-02-26 国网四川省电力公司广元供电公司 一款基于深度学习的电力设备图像识别方法及巡查机器人
CN111008643B (zh) * 2019-10-29 2024-03-19 平安科技(深圳)有限公司 基于半监督学习的图片分类方法、装置和计算机设备
CN112232384A (zh) * 2020-09-27 2021-01-15 北京迈格威科技有限公司 模型训练方法、图像特征提取方法、目标检测方法和装置
CN112580581A (zh) * 2020-12-28 2021-03-30 英特灵达信息技术(深圳)有限公司 目标检测方法、装置及电子设备

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9424494B1 (en) * 2016-01-28 2016-08-23 International Business Machines Corporation Pure convolutional neural network localization
CN108288078A (zh) * 2017-12-07 2018-07-17 腾讯科技(深圳)有限公司 一种图像中字符识别方法、装置和介质
CN109831392A (zh) * 2019-03-04 2019-05-31 中国科学技术大学 半监督网络流量分类方法
CN113222983A (zh) * 2021-06-03 2021-08-06 北京有竹居网络技术有限公司 图像处理方法、装置、可读介质和电子设备

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117292211A (zh) * 2023-11-27 2023-12-26 潍坊市海洋发展研究院 水质标注图像发送方法、装置、电子设备和计算机可读介质
CN117292211B (zh) * 2023-11-27 2024-02-27 潍坊市海洋发展研究院 水质标注图像发送方法、装置、电子设备和计算机可读介质

Also Published As

Publication number Publication date
CN113222983A (zh) 2021-08-06

Similar Documents

Publication Publication Date Title
WO2022252881A1 (fr) Procédé et appareil de traitement d'image, support lisible et dispositif électronique
US20230394671A1 (en) Image segmentation method and apparatus, and device, and storage medium
WO2022105779A1 (fr) Procédé de traitement d'image, procédé d'entraînement de modèle, appareil, support et dispositif
WO2023035877A1 (fr) Procédé et appareil de reconnaissance vidéo, support lisible et dispositif électronique
WO2022171036A1 (fr) Procédé de suivi de cible vidéo, appareil de suivi de cible vidéo, support de stockage et dispositif électronique
WO2023035896A1 (fr) Procédé et appareil de reconnaissance vidéo, support lisible, et dispositif électronique
WO2022105622A1 (fr) Procédé et appareil de segmentation d'image, support lisible et dispositif électronique
CN113449070A (zh) 多模态数据检索方法、装置、介质及电子设备
CN115294501A (zh) 视频识别方法、视频识别模型训练方法、介质及电子设备
CN113140012B (zh) 图像处理方法、装置、介质及电子设备
CN113033707B (zh) 视频分类方法、装置、可读介质及电子设备
CN111783632B (zh) 针对视频流的人脸检测方法、装置、电子设备及存储介质
CN111311609B (zh) 一种图像分割方法、装置、电子设备及存储介质
WO2023130925A1 (fr) Procédé et appareil de reconnaissance de police, support lisible et dispositif électronique
WO2023065895A1 (fr) Procédé et appareil de reconnaissance textuelle, support lisible et dispositif électronique
WO2023016290A1 (fr) Procédé et appareil de classification de vidéo, support lisible et dispositif électronique
WO2023030426A1 (fr) Procédé et appareil de reconnaissance de polype, support et dispositif
CN110852242A (zh) 基于多尺度网络的水印识别方法、装置、设备及存储介质
CN113033682B (zh) 视频分类方法、装置、可读介质、电子设备
CN113033552B (zh) 文本识别方法、装置和电子设备
CN112418233B (zh) 图像处理方法、装置、可读介质及电子设备
CN113033680B (zh) 视频分类方法、装置、可读介质及电子设备
CN111737575B (zh) 内容分发方法、装置、可读介质及电子设备
CN114004313A (zh) 故障gpu的预测方法、装置、电子设备及存储介质
CN111898658A (zh) 图像分类方法、装置和电子设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22814928

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22814928

Country of ref document: EP

Kind code of ref document: A1