CN116863311A - Image recognition method, model training method, device, equipment and storage medium - Google Patents

Info

Publication number
CN116863311A
CN116863311A (application CN202210305631.9A)
Authority
CN
China
Prior art keywords
image
sample
target
definition
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210305631.9A
Other languages
Chinese (zh)
Inventor
宋少鸿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202210305631.9A priority Critical patent/CN116863311A/en
Publication of CN116863311A publication Critical patent/CN116863311A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods


Abstract

The embodiment of the application discloses an image recognition method, a model training method, a device, equipment and a storage medium, which can be applied in scenarios such as cloud technology. An image to be identified is acquired; color channel processing is performed on the image to be identified to obtain a target color image set; definition recognition is performed on the target color image set through a target model to obtain a target definition label of the image to be identified, wherein the target model is obtained by training on a sample color image set corresponding to a sample image and a definition sample label, the definition sample label is obtained by encoding a definition sample score, and the definition sample score has a mapping relation with a constant quality coefficient of the sample image; and weighting processing is performed on the target definition label to obtain a target definition score of the image to be identified. In this way, labor and time costs are saved, and the efficiency of definition recognition is improved.

Description

Image recognition method, model training method, device, equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to an image recognition method, a model training method, a device, equipment, and a storage medium.
Background
With the development of information technology, people obtain information in increasingly diverse forms, such as videos, text, and images. However, the definition (sharpness) of media such as videos and images directly affects the user experience. To identify the definition of an image, the related art collects a large number of video or image samples and manually scores their definition.
In the research and practice of the prior art, the inventor found that scoring the definition of videos or images by manual labeling requires a great deal of labor cost and considerable time cost, which reduces the efficiency of definition recognition.
Disclosure of Invention
The embodiment of the application provides an image recognition method, a model training method, a device, equipment and a storage medium, which can save labor and time costs and improve the efficiency of identifying the definition of video frames or images.
The embodiment of the application provides an image identification method, which comprises the following steps:
collecting an image to be identified;
performing color channel processing on the image to be identified to obtain a target color image set;
Performing definition recognition on the target color image set through a target model to obtain a target definition label of the image to be recognized, wherein the target model is obtained by training a sample color image set corresponding to a sample image and a definition sample label, and the definition sample label has a mapping relation with a constant quality coefficient of the sample image;
and carrying out weighting processing on the target definition label to obtain a target definition score of the image to be identified.
In addition, the embodiment of the application provides an image recognition model training method, which comprises the following steps:
acquiring a sample image and identifying a constant quality coefficient of the sample image;
determining a definition sample score of the sample image based on a preset definition relation table and a constant quality coefficient of the sample image;
coding the definition sample score to obtain a definition sample label corresponding to the definition sample score;
and acquiring a sample image sub-block set corresponding to the sample image, and training a preset model according to the sample image sub-block set and the definition sample label to obtain a trained target model.
Accordingly, an embodiment of the present application provides an image recognition apparatus, including:
the acquisition unit is used for acquiring the image to be identified;
the processing unit is used for performing color channel processing on the image to be identified to obtain a target color image set;
the identification unit is used for carrying out definition identification on the target color image set through a target model to obtain a target definition label of the image to be identified, wherein the target model is obtained by training a sample color image set corresponding to a sample image and a definition sample label, and the definition sample label has a mapping relation with a constant quality coefficient of the sample image;
and the weighting unit is used for carrying out weighting processing on the target definition label to obtain the target definition score of the image to be identified.
In some embodiments, the target color image set includes a target color image corresponding to each color channel, and the image recognition apparatus further includes a decomposition unit configured to:
decomposing each target color image to obtain a target image sub-block set corresponding to the image to be identified;
the identifying unit is further configured to perform definition recognition on the target image sub-block set through the target model.
In some embodiments, the decomposition unit is further configured to:
performing downsampling processing on the target color image corresponding to each color channel to obtain a processed target color thumbnail;
cutting each target color thumbnail to obtain an image sub-block set corresponding to each target color thumbnail;
and merging the image sub-block sets corresponding to the processed target color thumbnails to obtain the target image sub-block set corresponding to the image to be identified.
Correspondingly, the embodiment of the application also provides an image recognition model training device, which comprises:
an acquisition unit configured to acquire a sample image and identify a constant quality coefficient of the sample image;
the determining unit is used for determining a definition sample score of the sample image based on a preset definition relation table and a constant quality coefficient of the sample image;
the coding unit is used for coding the definition sample score to obtain a definition sample label corresponding to the definition sample score;
the training unit is used for acquiring a sample image sub-block set corresponding to the sample image, and training a preset model according to the sample image sub-block set and the definition sample label to obtain a trained target model.
In some embodiments, the acquiring unit is further configured to:
acquiring a target video and determining a plurality of constant quality coefficients of the target video during encoding;
based on each constant quality coefficient, carrying out coding processing on the target video to obtain a sample sub-video corresponding to each constant quality coefficient;
and acquiring a sample video frame from the sample sub-video, and determining the sample video frame as a sample image.
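The multi-CRF encoding step above can be sketched as follows. This is a minimal illustration assuming the standard ffmpeg CLI with the libx264 encoder; the CRF list and output file names are hypothetical, and only command construction is shown (no encoding is executed).

```python
# Sketch of encoding one target video at several Constant Rate Factor (CRF)
# values to produce a sample sub-video per coefficient. The -crf and -c:v
# flags are standard ffmpeg/libx264 options; the concrete CRF values and
# file naming are illustrative assumptions, not taken from the patent.

def build_crf_commands(source, crf_values):
    """Return one ffmpeg command (as an argv list) per constant quality coefficient."""
    commands = []
    for crf in crf_values:
        out = f"sample_crf{crf}.mp4"
        commands.append([
            "ffmpeg", "-i", source,
            "-c:v", "libx264",   # H.264 encoder that supports CRF mode
            "-crf", str(crf),    # lower CRF = higher quality / definition
            out,
        ])
    return commands

cmds = build_crf_commands("target_video.mp4", [18, 23, 28, 33, 38])
```

Each resulting sub-video would then be a source of sample video frames whose definition corresponds to its encoding coefficient.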
In some embodiments, the acquiring unit is further configured to:
determining the attribute type of the video to be acquired;
determining a target application according to the attribute type, and determining a target display window of the target application;
and acquiring a target video corresponding to the target duration from the target display window.
In some embodiments, the acquiring unit is further configured to:
carrying out framing processing on the sample sub-video to obtain a plurality of video sub-frames corresponding to the sample sub-video;
classifying the plurality of video sub-frames to obtain a video sub-frame set corresponding to each category;
and selecting the sample video frame from the video sub-frame set corresponding to each category.
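The framing, classification, and selection steps above might be sketched as follows; the classification criterion (here a hypothetical brightness bucket) and the choice of the first frame of each category are illustrative assumptions.

```python
# Group video sub-frames by category, then select a sample video frame
# from each category's sub-frame set.
from collections import defaultdict

def sample_frames_per_class(frames, classify):
    """frames: list of frame objects; classify: frame -> category label."""
    by_class = defaultdict(list)
    for frame in frames:
        by_class[classify(frame)].append(frame)
    # pick the first frame of each category as its sample video frame
    return {label: group[0] for label, group in by_class.items()}

# toy example: frames are mean-brightness values, bucketed by a threshold
frames = [12, 200, 15, 180, 90]
samples = sample_frames_per_class(frames, lambda f: "bright" if f > 128 else "dark")
```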
In some embodiments, the image recognition model training apparatus further includes a building unit configured to:
Obtaining a constant quality coefficient range value;
determining the number of definition score grades, and dividing the constant quality coefficient range value according to the number of definition score grades to obtain a constant quality coefficient sub-range corresponding to each definition score grade;
and establishing a mapping relation between each definition score grade and a corresponding constant quality coefficient sub-range, and generating the preset definition relation table according to the mapping relation.
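The construction of the preset definition relation table described above can be sketched as follows; the CRF range of 0 to 51 (the libx264 range) and the choice of 10 score grades are illustrative assumptions.

```python
# Divide the constant quality coefficient range evenly into one sub-range
# per definition score grade, mapping low CRF (high quality) to high score.

def build_relation_table(crf_min, crf_max, levels):
    """Map each score grade (levels .. 1) to a CRF sub-range (lo, hi)."""
    width = (crf_max - crf_min) / levels
    table = {}
    for i in range(levels):
        lo = crf_min + i * width
        hi = crf_min + (i + 1) * width
        table[levels - i] = (lo, hi)   # lower CRF sub-range -> higher score
    return table

def score_for_crf(table, crf):
    """Look up the definition sample score for a constant quality coefficient."""
    for score, (lo, hi) in table.items():
        if lo <= crf < hi:
            return score
    return min(table)  # crf at or past the upper bound falls in the lowest grade

table = build_relation_table(0, 51, 10)
```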
In some embodiments, the training unit is further configured to:
inputting the sample image sub-block set into a preset model to obtain a definition prediction label;
acquiring a label difference value between the definition prediction label and the definition sample label, and adjusting network parameters of the preset model according to the label difference value;
and performing iterative training on the adjusted preset model until the label difference value converges to obtain a trained target model.
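The iterate-until-convergence loop above can be illustrated with a toy one-parameter model trained by gradient descent; a real implementation would use a convolutional network (such as MobileNet) with an appropriate loss, so everything below is a hypothetical stand-in for the preset model.

```python
# Toy version of: predict a label, measure the label difference value,
# adjust the network parameter, and iterate until the difference converges.

def train_until_converged(samples, labels, lr=0.1, tol=1e-6, max_iter=10000):
    w = 0.0                      # single parameter of the toy "preset model"
    prev_loss = float("inf")
    for _ in range(max_iter):
        preds = [w * x for x in samples]
        # label difference value: mean squared error between prediction and label
        loss = sum((p - y) ** 2 for p, y in zip(preds, labels)) / len(samples)
        if abs(prev_loss - loss) < tol:   # label difference has converged
            break
        # adjust the parameter along the gradient of the difference
        grad = sum(2 * (p - y) * x
                   for p, y, x in zip(preds, labels, samples)) / len(samples)
        w -= lr * grad
        prev_loss = loss
    return w

# samples generated by the relation y = 2x, so training should recover w = 2
w = train_until_converged([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```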
In addition, the embodiment of the application also provides a computer device, which comprises a processor and a memory, wherein the memory stores a computer program, and the processor is used for running the computer program in the memory to realize the steps in any image recognition method provided by the embodiment of the application and/or realize the steps in any image recognition model training method provided by the embodiment of the application.
In addition, the embodiment of the application further provides a computer readable storage medium, and the computer readable storage medium stores a plurality of instructions, and the instructions are suitable for being loaded by a processor to execute the steps in any image recognition method provided by the embodiment of the application and/or implement the steps in any image recognition model training method provided by the embodiment of the application.
In addition, the embodiment of the application also provides a computer program product, which comprises computer instructions, wherein the computer instructions are used for realizing the steps in any image recognition method provided by the embodiment of the application and/or realizing the steps in any image recognition model training method provided by the embodiment of the application when being executed.
The embodiment of the application can collect an image to be identified; perform color channel processing on the image to be identified to obtain a target color image set; perform definition recognition on the target color image set through a target model to obtain a target definition label of the image to be identified, wherein the target model is obtained by training on a sample color image set corresponding to a sample image and a definition sample label, and the definition sample label has a mapping relation with a constant quality coefficient of the sample image; and perform weighting processing on the target definition label to obtain a target definition score of the image to be identified. In this way, color channel decomposition can be performed on the image to be identified so that it is represented by different color values, yielding a color image set containing the target color image corresponding to each color value; image definition recognition is then performed by the trained target model based on the color image set, and weighting processing is performed on the recognized target definition label to obtain the target definition score of the image to be identified. This saves labor and time costs and improves efficiency in video or image definition recognition.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic view of a scene of an image recognition system according to an embodiment of the present application;
fig. 2 is a schematic flow chart of steps of an image recognition method according to an embodiment of the present application;
FIG. 3 is a flowchart illustrating steps of a training method for an image recognition model according to an embodiment of the present application;
FIG. 4 is a flowchart illustrating a further step of the image recognition method according to the embodiment of the present application;
FIG. 5 is a block flow diagram of an image recognition method according to an embodiment of the present application;
FIG. 6 is a graph showing the relationship between the constant quality coefficient and the sharpness score according to the embodiment of the present application;
fig. 7 is a schematic view of a scenario of an image recognition method according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of an image recognition device according to an embodiment of the present application;
FIG. 9 is a schematic structural diagram of an image recognition model training device according to an embodiment of the present application;
Fig. 10 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to fall within the scope of the application.
The embodiment of the application provides an image recognition method, a model training method, a device, equipment and a storage medium. Specifically, the embodiment of the application will be described from the perspective of an image recognition apparatus and/or an image recognition model training apparatus, and the apparatus may be integrated in a computer device, where the computer device may be a server or a user terminal. The server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs, big data, and artificial intelligence platforms. The user terminal may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, a smart home appliance, a vehicle-mounted terminal, an intelligent voice interaction device, an aircraft, and the like.
The image recognition method provided by the embodiment of the application can be applied to various scenes including image definition recognition, such as cloud technology, artificial intelligence, intelligent traffic, auxiliary driving and the like. The following examples are provided to illustrate the application:
for example, referring to fig. 1, a schematic view of a scene of an image recognition system according to an embodiment of the present application is provided. The scenario includes a terminal or a server.
The terminal or the server can collect the image to be identified; performing color channel processing on an image to be identified to obtain a target color image set; performing definition recognition on the target color image set through a target model to obtain a target definition label of an image to be recognized, wherein the target model is obtained by training a sample color image set corresponding to a sample image and a definition sample label, and the definition sample label and a constant quality coefficient of the sample image have a mapping relation; and weighting the target definition label to obtain a target definition score of the image to be identified. And/or acquiring a sample image and identifying a constant quality coefficient of the sample image; determining a definition sample score of the sample image based on a preset definition relation table and a constant quality coefficient of the sample image; coding the definition sample score to obtain a definition sample label corresponding to the definition sample score; and acquiring a sample image sub-block set corresponding to the sample image, and training a preset model according to the sample image sub-block set and the definition sample label to obtain a trained target model.
The image recognition process may include: the method comprises the steps of collecting images to be identified, processing color channels, decomposing, identifying definition, calculating definition scores, training models and the like.
Each of these is described in detail below. It should be noted that the description order of the following embodiments is not intended as a limitation on the preferred order of the embodiments.
In the embodiments of the present application, description will be made in terms of an image recognition apparatus, which may be integrated in a computer device such as a terminal or a server in particular. Referring to fig. 2, fig. 2 is a schematic step flow diagram of an image recognition method according to an embodiment of the present application, where the embodiment of the present application takes that an image recognition device is specifically integrated on a server or a terminal, and when a processor on the terminal or the server executes a program corresponding to the image recognition method, the specific flow is as follows:
101. and collecting an image to be identified.
The image to be identified is an image whose definition needs to be identified. It may be a directly received picture, a picture generated by the computer device, or a video frame extracted from a video.
In order to perform definition recognition on an image, the embodiment of the application may first collect an image to be identified: for example, by receiving a picture sent by another device (such as a terminal or a server), by collecting a picture generated by the computer device (such as by an application program), or by extracting one or more video frames from a video that the device has received or recorded. This facilitates the subsequent definition recognition of the image to be identified and the calculation of its definition score.
By the method, the image to be identified can be acquired so as to facilitate subsequent definition identification of the image, and the definition score of the image to be identified can be determined.
102. And carrying out color channel processing on the image to be identified to obtain a target color image set.
The color channel processing refers to representing the image by color values, that is, representing the image to be identified by different color values. For example, the image to be identified may be represented by RGB (the three primary colors red, green, and blue) color values to obtain target color images for the three R, G, and B color channels.
The target color image set includes target color images corresponding to each color channel, i.e. target color images including a plurality of color channels, such as target color images including R, G, B three color channels.
In order to improve the efficiency of image definition recognition, the embodiment of the application can perform definition recognition through the target model. Specifically, before definition recognition is performed through the target model, the image to be identified needs to be preprocessed, and the preprocessed result serves as the input data of the target model. The preprocessing process may include color channel processing and decomposition processing.
In order to obtain image data of an input model, after obtaining an image to be identified, the embodiment of the application can perform color channel processing on the image to be identified so as to represent the image to be identified through color values, and obtain an image corresponding to each color channel (each color value), namely, a target color image corresponding to each color channel. For example, the image to be identified is represented by RGB color values, and a color image corresponding to the R color value, a color image corresponding to the G color value, and a color image corresponding to the B color value are obtained respectively. Therefore, the image to be identified is represented through the plurality of color values, so that multidimensional color value data of the image to be identified is obtained, and the data is convenient to be used for the subsequent image definition identification.
In this way, the image to be identified can be split into a plurality of color channels to obtain the target color image corresponding to each color value, which together form the input data (features) of the model, so that definition recognition can subsequently be performed on the image to be identified.
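The color channel processing above can be sketched in pure Python; the nested-list image representation is an illustrative assumption (a real system would more likely use an image library).

```python
# Split an image of (R, G, B) pixels into one single-channel target color
# image per channel, forming the target color image set.

def split_channels(image):
    """image: list of rows, each row a list of (r, g, b) tuples.
    Returns {'R': ..., 'G': ..., 'B': ...} single-channel images."""
    channels = {}
    for idx, name in enumerate("RGB"):
        channels[name] = [[pixel[idx] for pixel in row] for row in image]
    return channels

image = [[(255, 0, 0), (0, 255, 0)],
         [(0, 0, 255), (10, 20, 30)]]
target_color_image_set = split_channels(image)
```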
103. And carrying out definition recognition on the target color image set through the target model to obtain a target definition label of the image to be recognized.
In order to improve the recognition efficiency of the definition of the image to be recognized, after the target color image set corresponding to the image to be recognized is obtained, the target color image set is input into the trained target model, so that the definition recognition is performed through the target model, and the target definition label of the image to be recognized is obtained. Therefore, the time cost and the labor cost for manually evaluating the definition are reduced, and the recognition efficiency in recognizing the image definition is improved.
The target model is obtained by training a sample image sub-block set and a definition sample label corresponding to the sample image and is used for carrying out definition recognition on the preprocessed image block so as to determine the definition label of the image to be recognized. For example, the target model may be a convolutional neural network model, such as MobileNet, which may be trained from pre-processed sub-blocks of sample images and corresponding sharpness sample tags.
The target definition tag is output data of the target model, which may specifically be a vector-form data tag, for example, the target definition tag may be a vector of 1×10.
The definition sample label may be obtained by encoding a definition sample score, for example by one-hot encoding the definition sample score, so as to obtain the definition sample label used in the training process of the model. The definition sample score is determined by a preset definition relation table and the sample image, wherein the preset definition relation table comprises a mapping relation between the sample score and a constant quality coefficient (Constant Rate Factor, CRF) of the sample image.
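The one-hot encoding of a definition sample score into a definition sample label can be sketched as follows; a 10-level score, matching the 1×10 target definition label mentioned above, is an illustrative assumption.

```python
# Encode an integer definition sample score as a 1 x levels one-hot vector,
# the definition sample label used when training the model.

def one_hot_label(score, levels=10):
    """score: integer in 1..levels -> one-hot label vector of length `levels`."""
    label = [0] * levels
    label[score - 1] = 1
    return label

label = one_hot_label(7)
```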
It should be noted that, in order to enable the image features to better meet the data requirements of the model and improve the effect of the model in image definition recognition, after the target color image set is obtained, the embodiment of the application can also cut the target color image of each color channel in the target color image set to obtain target image sub-blocks that better conform to the image data expected at the model input. Specifically, before step 103 "performing definition recognition on the target color image set through the target model to obtain the target definition label of the image to be identified", the method may include step A: decomposing each target color image to obtain a target image sub-block set corresponding to the image to be identified.
The decomposition process refers to a process of cutting or dicing an image into a plurality of small image blocks. The purpose of this decomposition process may be to scale down the image to meet the data requirements of the model in identifying the sharpness of the image.
The target image sub-block set includes a plurality of target image sub-blocks, specifically may include target image sub-blocks corresponding to each color value, for example, include all target image sub-blocks corresponding to R color values, all target image sub-blocks corresponding to G color values, and all target image sub-blocks corresponding to B color values.
In order to obtain image data of an input model, after obtaining a target color image corresponding to each color channel, the embodiment of the application can decompose each target color image corresponding to the image to be identified to obtain a target image sub-block set corresponding to the image to be identified, so that the target image sub-block set is conveniently used as the input of the target model to identify the definition of the image to be identified.
In some embodiments, the step A "performing decomposition processing on each target color image to obtain a target image sub-block set corresponding to the image to be identified" may include: (A.1) carrying out downsampling processing on each target color image to obtain a processed target color thumbnail; (A.2) cutting each processed target color thumbnail to obtain an image sub-block set corresponding to each processed target color thumbnail; and (A.3) merging the image sub-block sets corresponding to each processed target color thumbnail to obtain a target image sub-block set corresponding to the image to be identified.
The decomposition process may specifically include a downsampling process and a cutting process.
Specifically, the downsampling refers to downsampling each target color image based on color values, and the downsampling specifically includes: acquiring a target color value matrix corresponding to each target color image; and performing downsampling processing on the target color image based on the target color value matrix to obtain a processed target color thumbnail. It should be noted that, in the process of performing the downsampling process on the target color image, the downsampling implementation may specifically perform downsampling according to the aspect ratio of the target color image. Therefore, based on the color value matrix corresponding to each target color image, downsampling is performed to reduce the image specification of each target color image, and the downsampled corresponding target color thumbnail is obtained.
Specifically, the cutting process refers to cutting an image into a plurality of image sub-blocks. In order to obtain image data that better matches the input end of the model, after each target color thumbnail produced by the downsampling process is obtained, the embodiment of the application cuts/crops each target color thumbnail to obtain an image sub-block set corresponding to each target color thumbnail. It should be noted that the cutting/cropping process may be random cropping; for example, each target color thumbnail may be randomly cropped, and each cropped image sub-block may be 400×400. The embodiment of the present application does not particularly limit the specification of the image sub-blocks.
Further, after the image sub-block set corresponding to each target color thumbnail is obtained, the image sub-block sets corresponding to all the target color thumbnails are combined to obtain a target image sub-block set corresponding to the image to be identified, wherein the target image sub-block set comprises image sub-blocks of different color channels of the image to be identified, and the image sub-block set is used for inputting a target model. Through the implementation manner, the target color image of each color channel can be decomposed, so that the image blocks are more in line with the input of the target model, the target model can be used for carrying out definition recognition on the image to be recognized based on each image sub-block, and the accuracy of the model in image definition recognition is improved.
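The downsample, cut, and merge steps above can be sketched as follows; the stride-based downsampling (which preserves the aspect ratio), the toy block size, and the nested-list image representation are illustrative assumptions (the embodiment mentions 400×400 sub-blocks but does not fix a downsampling method).

```python
# Decompose each single-channel target color image into a target image
# sub-block set: downsample, cut into fixed-size blocks, merge per channel.

def downsample(image, stride):
    """Keep every `stride`-th pixel in both dimensions (aspect ratio preserved)."""
    return [row[::stride] for row in image[::stride]]

def cut_into_blocks(image, size):
    """Cut an image (list of rows) into size x size sub-blocks, dropping remainders."""
    blocks = []
    for r in range(0, len(image) - size + 1, size):
        for c in range(0, len(image[0]) - size + 1, size):
            blocks.append([row[c:c + size] for row in image[r:r + size]])
    return blocks

def decompose(channel_images, stride=2, block_size=2):
    """Merge the sub-block sets of all channel images into one target set."""
    target_set = []
    for image in channel_images.values():
        thumbnail = downsample(image, stride)
        target_set.extend(cut_into_blocks(thumbnail, block_size))
    return target_set

# toy 8x8 single-channel image with pixel value i + j
channels = {"R": [[i + j for j in range(8)] for i in range(8)]}
blocks = decompose(channels)
```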
Before image sharpness recognition is performed on the target model, the model needs to be trained by sample data to improve performance of the model in image sharpness recognition.
Through the mode, the definition recognition can be carried out through the trained target model, and the target definition label of the image to be recognized is obtained. Therefore, the time cost and the labor cost for manually evaluating the definition are reduced, and the recognition efficiency in recognizing the image definition is improved.
104. And weighting the target definition label to obtain a target definition score of the image to be identified.
In order to obtain the target definition score corresponding to the image to be identified, in the embodiment of the application, after the target definition label of the image to be identified is obtained through the target model, since the target definition label is a data label in vector form, weighting processing needs to be performed on the target definition label. Specifically, weighting processing or weighted average processing is performed on the target definition label according to preset weighting values, so as to obtain the target definition score of the image to be identified.
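The weighted-average step can be illustrated as follows: each position of the label vector corresponds to an integer score (1 to 10 here), and the vector values act as weights. This is a hedged sketch assuming NumPy; the function name is hypothetical.

```python
import numpy as np

def label_to_score(label_vec, scores=None):
    """Weighted average of a sharpness label vector: position i carries
    score i+1, so [0,...,0.33,0.67,...,0] -> 6*0.33 + 7*0.67 = 6.67."""
    label_vec = np.asarray(label_vec, dtype=float)
    if scores is None:
        scores = np.arange(1, len(label_vec) + 1)
    return float(np.dot(label_vec, scores) / label_vec.sum())

score = label_to_score([0, 0, 0, 0, 0, 0.33, 0.67, 0, 0, 0])  # -> 6.67
```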
It should be noted that, the embodiment of the present application may be applied to a scene with automatically adjusted display screen definition, and in some implementations, after the step of weighting the target definition label to obtain the target definition score of the image to be identified, the method may further include: comparing the target definition score with a preset definition score threshold; if the target definition score is smaller than or equal to a preset definition score threshold, acquiring a pixel value matrix corresponding to the image to be identified; weighting each pixel value in the pixel value matrix according to a preset pixel depth weighting value to obtain a weighted target pixel value matrix; and displaying the picture to be identified based on the weighted target pixel value matrix. In addition, when the target definition score is smaller than or equal to a preset definition score threshold, the display parameters of the display interface can be adjusted, so that the images in the display interface can be displayed according to the adjusted display parameters. Therefore, the display picture is adjusted based on the definition score, and the display effect of the picture is adjusted in real time.
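The display-adjustment branch above can be sketched as follows. This is an illustrative NumPy sketch, not the claimed implementation: the threshold value, the depth weighting value, and the function name are all assumptions, and clipping to the 8-bit range before display is a choice made here for the sketch.

```python
import numpy as np

SCORE_THRESHOLD = 6.0   # assumed preset sharpness score threshold
DEPTH_WEIGHT = 1.2      # assumed preset pixel depth weighting value

def enhance_if_blurry(pixels: np.ndarray, score: float) -> np.ndarray:
    """If the recognized score does not exceed the threshold, weight each
    pixel value in the matrix by the preset depth weight (clipped to the
    valid 8-bit range) before display; otherwise display unchanged."""
    if score > SCORE_THRESHOLD:
        return pixels
    weighted = pixels.astype(np.float32) * DEPTH_WEIGHT
    return np.clip(weighted, 0, 255).astype(np.uint8)

frame = np.full((2, 2), 100, dtype=np.uint8)
out = enhance_if_blurry(frame, score=4.5)   # 100 * 1.2 -> 120
```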
From the above, the embodiment of the application can collect the image to be identified; performing color channel processing on the image to be identified to obtain a target color image corresponding to each color channel; decomposing each target color image to obtain a target image sub-block set corresponding to the image to be identified; performing definition recognition on the target image sub-block set through a target model to obtain a target definition label of an image to be recognized, wherein the target model is obtained by training a sample image sub-block set corresponding to a sample image and a definition sample label, and the definition sample label has a mapping relation with a constant quality coefficient of the sample image; and weighting the target definition label to obtain a target definition score of the image to be identified. Therefore, the method and the device can be used for carrying out color channel decomposition on the image to be identified, realizing that the image to be identified is represented by different color values, obtaining a target color image corresponding to each color value, decomposing the target color image of each color value to obtain a target image sub-block set of the image to be identified, further carrying out image definition identification on the basis of the target image sub-block set through a trained target model, and carrying out weighting treatment on the identified target definition label to obtain a target definition score of the image to be identified, thereby saving labor and time cost and improving efficiency when video or image definition is identified.
According to the method described in the above embodiments, examples are described in further detail below.
In the embodiment of the application, before the image definition recognition is performed through the target model, the model needs to be trained through sample data so as to improve the performance of the model in the image definition recognition. Specifically, the embodiment of the application takes a training method of an image recognition model as an example, and further describes the training method of the image recognition model provided by the embodiment of the application.
In the embodiments of the present application, description will be made from the point of view of an image recognition model training apparatus, which may be integrated in a computer device such as a terminal and/or a server in particular. Fig. 3 is a flowchart of steps of an image recognition model training method provided by an embodiment of the present application, and referring to fig. 3, a model training process provided by the embodiment of the present application is described. For example, when the processor on the computer device executes a program corresponding to the image recognition model training method, the specific flow of the model training method is as follows:
201. a sample image is acquired and a constant quality coefficient of the sample image is identified.
Wherein the sample image may be a computer generated picture, or an image in a computer generated display that is distinguished from a natural image. The sharpness of the sample image is related to the encoding, i.e. the sharpness of the image is changed by changing the encoding parameters.
The lower the constant quality coefficient value, the higher the image quality, that is, the higher the sharpness and the clearer the image.
Specifically, in order to obtain a sample image for model training, the step of "obtaining a sample image" may include:
(201.1) acquiring a target video and determining a plurality of constant quality coefficients of the target video when being encoded;
(201.2) based on each constant quality coefficient, carrying out coding processing on the target video to obtain a sample sub-video corresponding to each constant quality coefficient;
(201.3) acquiring a sample video frame from the sample sub-video and determining the sample video frame as a sample image.
Wherein the target video may be a computer generated video.
Specifically, the embodiment of the application can collect video frames from videos as sample images, and because the sample images used for training are required to be computer-generated images, firstly, target videos generated by a computer are required to be acquired, then, the target videos are encoded into sample sub-videos with different definition, and finally, the sample video frames are respectively collected from the sample sub-videos as sample images.
The target video generated by the computer is acquired, for example, taking a video generated by a computer device (terminal or server) as an example. Step (201.1) "acquire target video" may include: determining the attribute type of the video to be acquired; determining a target application according to the attribute type, and determining a target display window of the target application; and acquiring a target video corresponding to the target duration from the target display window. The attribute type can be a content type, such as a video generated by computer coding, such as a picture type of a game, a picture type of a video call, and the like, and the content type can comprise an application identifier corresponding to the video to be acquired; therefore, the target application of the video to be acquired is determined according to the attribute type, and the current target display window of the target application or the display window of an application picture is determined, for example, a certain game application is determined according to the attribute type of the video to be acquired, so that the picture display window of the game application can be determined; further, a video of a target duration (e.g., 10 seconds, 1 minute, 10 minutes, or the entire application screen display duration) is acquired from the target display window as a target video.
Further, since sample sub-videos of different resolutions are encoded according to different constant quality coefficients, it is necessary to determine the constant quality coefficients of the sample sub-videos after obtaining the target video. Specifically, a plurality of constant quality coefficients of the target video are determined according to the range value of the constant quality coefficients and the range of the definition score level, if the range value of the constant quality coefficients is 0-51 and the range of the definition score level is 1-10, 10 constant quality coefficients are uniformly taken from the range values (0, 51) of the constant quality coefficients as the constant quality coefficients of the target video.
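The uniform sampling of constant quality coefficients described above can be sketched as follows, assuming NumPy; the function name is hypothetical, while the range [0, 51] and the 10 levels come from the example in the text.

```python
import numpy as np

def pick_crf_values(crf_min=0, crf_max=51, n_levels=10):
    """Take n_levels constant quality coefficients spread uniformly over
    the coefficient range, one per sharpness score level."""
    return [int(round(v)) for v in np.linspace(crf_min, crf_max, n_levels)]

crfs = pick_crf_values()   # 10 coefficients from 0 to 51
```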
Further, the target video is encoded according to each constant quality coefficient to obtain a sample sub-video corresponding to each constant quality coefficient, wherein each sample sub-video corresponds to a definition score level.
Finally, after obtaining the sample sub-video corresponding to each constant quality coefficient, sample video frames can be acquired from each sample sub-video. In some embodiments, step (201.3) "collect sample video frames from sample sub-video" may include: carrying out framing treatment on the sample sub-video to obtain a plurality of video subframes corresponding to the sample sub-video; classifying the plurality of video subframes to obtain a video subframe set corresponding to each class; and selecting a sample video frame from the video subframe set corresponding to each category. Specifically, each sample sub-video is framed to obtain a plurality of video subframes corresponding to each sample sub-video; in order to ensure that differences exist between the subsequently sampled video subframes, the video subframes corresponding to each sample sub-video can be independently classified based on the content differences of the video subframes, so that video subframes with smaller content differences are classified into the video subframe set of the same category, and a plurality of video subframe sets corresponding to each sample sub-video are obtained; because the information of the video subframes differs considerably between video subframe sets, video subframes spaced one or more frames apart can be directly selected from each video subframe set as the sample video frames of that set, so that the sample video frames of each video subframe set can be used as sample images. Notably, the same sample sub-video has a plurality of sample images of different contents corresponding to the same sharpness sample score and constant quality coefficient.
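The grouping-by-content-difference idea above can be sketched as follows. This is a simplified NumPy illustration, not the claimed classification method: mean absolute pixel difference against the first frame of the current group stands in for the content-difference measure, and the threshold value is an assumption.

```python
import numpy as np

def sample_frames(frames, diff_threshold=8.0):
    """Group consecutive video subframes whose mean absolute pixel
    difference to the group's first frame stays below the threshold,
    then keep one representative frame per group as a sample video frame."""
    samples, group_start = [], 0
    for i in range(1, len(frames) + 1):
        if i == len(frames) or np.abs(
                frames[i].astype(float) - frames[group_start].astype(float)
        ).mean() >= diff_threshold:
            samples.append(frames[group_start])  # representative of the group
            group_start = i
    return samples

# Two near-identical frames followed by a very different one:
f0 = np.zeros((4, 4), dtype=np.uint8)
f1 = f0.copy(); f1[0, 0] = 3                 # tiny content difference
f2 = np.full((4, 4), 200, dtype=np.uint8)
picked = sample_frames([f0, f1, f2])         # -> 2 sample frames
```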
It should be noted that, in the embodiment of the present application, one target video is encoded into a plurality of sample sub-videos with different constant quality coefficients, so that the sharpness differs between the sample sub-videos. Then, a plurality of video subframes with different information are selected from each sample sub-video as sample images, yielding a plurality of sample images with different information but identical definition sample scores, and combining such sample images for model training can improve the performance and robustness of the model. Furthermore, when a plurality of sample images with different information and identical definition sample scores are taken as a sample combination, each sample sub-video corresponds to one sample combination, and the model is trained through the plurality of sample combinations, so that the model is comprehensively trained based on sample images with different information and different definition, which improves the recognition performance of the model on images with different definition or content and the accuracy of the model in recognizing image definition.
202. And determining a definition sample value of the sample image based on a preset definition relation table and a constant quality coefficient of the sample image.
The preset definition relation table comprises a mapping relation between constant quality coefficients and definition scores, namely, each constant quality coefficient corresponds to one definition score. It will be appreciated that each sharpness score may correspond to a constant quality coefficient sub-range; for example, if the range value of the constant quality coefficient is 0-51 and the range of the sharpness score is 1-10, the sharpness score "1" corresponds to the constant quality coefficient sub-range (46, 51), the sharpness score "2" corresponds to the sub-range (41, 46), and so on; the embodiment of the present application is not particularly limited in this respect.
In order to obtain the score of the sample image, a preset sharpness relation table needs to be established in advance before the sharpness sample score corresponding to the sample image is determined. Specifically, in some embodiments, before the step of determining the sharpness sample score of the sample image based on the preset sharpness relation table and the constant quality coefficient of the sample image, the method further includes: obtaining a constant quality coefficient range value; determining the definition score grade number, and dividing the constant quality coefficient range value according to the definition score grade number to obtain a constant quality coefficient sub-range corresponding to each definition score grade number; and establishing a mapping relation between each definition score grade and a corresponding constant quality coefficient sub-range, and generating a preset definition relation table according to the mapping relation. The definition score order number refers to the number of integer scores in the definition score range. For example, the value range of the sharpness score is 1-10, and the sharpness score rank is 10.
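The table construction described above can be sketched as follows, assuming a uniform division of [0, 51] into 10 sub-ranges (step 5.1); note the text's example uses sub-ranges like (46, 51) with step 5, so the exact boundaries and the boundary handling here are assumptions, and the function names are hypothetical.

```python
def build_sharpness_table(crf_max=51, n_levels=10):
    """Map each sharpness score level to a constant quality coefficient
    sub-range: dividing [0, 51] into 10 equal sub-ranges gives roughly
    score 1 -> (45.9, 51], score 2 -> (40.8, 45.9], ..., score 10 -> [0, 5.1]."""
    step = crf_max / n_levels
    return {s: (crf_max - s * step, crf_max - (s - 1) * step)
            for s in range(1, n_levels + 1)}

def crf_to_score(crf, crf_max=51, n_levels=10):
    """Look up the sharpness score whose sub-range contains this
    coefficient (higher coefficient -> lower score)."""
    step = crf_max / n_levels
    return min(int((crf_max - crf) // step) + 1, n_levels)

table = build_sharpness_table()
```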
203. And carrying out coding treatment on the definition sample score to obtain a definition sample label corresponding to the definition sample score.
Specifically, in order to obtain a definition sample label for participating in model training, after obtaining the definition sample score, the embodiment of the application needs to convert the definition sample score into label data in the form output by the model. Specifically, the sharpness sample score is encoded, for example based on one-hot encoding (One-Hot), to obtain the sharpness sample label used in the training process of the model.
204. And acquiring a sample image sub-block set corresponding to the sample image, and training a preset model according to the sample image sub-block set and the definition sample label to obtain a trained target model.
Wherein the sample image sub-block set comprises preprocessed sample image sub-blocks of different color channels, for example, sample image sub-blocks of the R, G and B color channels. The sample image sub-blocks are obtained by the following steps: performing color channel processing on the sample image so that the sample image is represented by color values, obtaining a sample color image corresponding to each color channel; further, performing downsampling processing on each sample color image, specifically downsampling according to the aspect ratio of the sample color image, to obtain a sample color thumbnail corresponding to each sample color image; cropping the sample color thumbnail corresponding to each sample color image to obtain an image sub-block set corresponding to each sample color thumbnail; and fusing all the image sub-block sets to obtain the sample image sub-block set corresponding to the sample image.
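The aspect-ratio-preserving downsampling step can be sketched as follows. This is a minimal nearest-neighbour illustration in NumPy, assuming the short-side target of 540 px mentioned later in the text; the function name and the 1080×1920 example size are assumptions.

```python
import numpy as np

def downsample_short_side(img: np.ndarray, short: int = 540) -> np.ndarray:
    """Downsample a single-channel image so its short side becomes `short`
    while keeping the aspect ratio (nearest-neighbour sampling sketch)."""
    h, w = img.shape
    scale = short / min(h, w)
    if scale >= 1:                 # already small enough
        return img
    nh, nw = round(h * scale), round(w * scale)
    rows = (np.arange(nh) / scale).astype(int)
    cols = (np.arange(nw) / scale).astype(int)
    return img[rows][:, cols]

frame = np.zeros((1080, 1920), dtype=np.uint8)
thumb = downsample_short_side(frame)   # -> 540 x 960 thumbnail
```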
Specifically, the training process of the model is as follows: inputting the sample image sub-block set into a preset model to obtain a definition prediction label; determining a label difference value between a definition prediction label and a definition sample label, adjusting network parameters of a preset model according to the label difference value, and performing iterative training on the adjusted preset model until the label difference value converges to obtain a trained target model.
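The iterate-until-the-label-difference-converges loop above can be sketched as follows. To keep the sketch self-contained, a tiny softmax classifier on random feature vectors stands in for the convolutional network and the image sub-block inputs; all sizes, the learning rate, and the iteration count are assumptions, not the claimed configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_blocks, dim, n_levels = 64, 32, 10          # toy sizes; real inputs are 400x400 blocks
X = rng.normal(size=(n_blocks, dim))          # stand-in for flattened image sub-blocks
labels = np.eye(n_levels)[rng.integers(0, n_levels, n_blocks)]  # sharpness sample labels
W = np.zeros((dim, n_levels))                 # network parameters of the preset model

def predict(X, W):
    """Softmax over score levels: the sharpness prediction label."""
    z = X @ W
    z -= z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

losses = []
for _ in range(200):                          # iterate until the label difference converges
    p = predict(X, W)
    losses.append(-np.mean(np.sum(labels * np.log(p + 1e-12), axis=1)))
    W -= 0.5 * X.T @ (p - labels) / n_blocks  # adjust parameters by the label difference
```

The cross-entropy loss here plays the role of the "label difference value" between the prediction label and the sample label; it decreases over the iterations.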
Through the mode, the definition recognition can be carried out through the trained target model, and the target definition label of the image to be recognized is obtained. Therefore, the time cost and the labor cost for manually evaluating the definition are reduced, and the recognition efficiency in recognizing the image definition is improved.
The embodiment of the application takes image recognition as an example, and further describes an image recognition method provided by the embodiment of the application.
Fig. 4 is a flow chart of another step of the image recognition method according to the embodiment of the present application, fig. 5 is a block flow chart of the image recognition method according to the embodiment of the present application, and fig. 6 is a graph of constant quality coefficient versus sharpness score according to the embodiment of the present application; fig. 7 is a schematic view of a scenario of an image recognition method according to an embodiment of the present application. For ease of understanding, embodiments of the present application are described in connection with FIGS. 4-7.
In the embodiments of the present application, description will be made from the viewpoint of an image recognition apparatus, which may be integrated in a computer device such as a terminal and/or a server in particular. For example, when the processor on the computer device executes a program corresponding to the image recognition method, the specific flow of the image recognition method is as follows:
301. and acquiring a target video.
In the embodiment of the application, the target video is acquired by the following steps: determining the attribute type of the video to be acquired; determining a target application according to the attribute type, and determining a target display window of the target application; and acquiring a target video corresponding to the target duration from the target display window.
Specifically, the attribute type may be a content type, such as a video generated by computer coding, such as a picture type of a game, a picture type of a video call, and the like, and in addition, the content type may include an application identifier corresponding to a video to be acquired; therefore, the target application of the video to be acquired is determined according to the attribute type, and the current target display window of the target application or the display window of an application picture is determined, for example, a certain game application is determined according to the attribute type of the video to be acquired, so that the picture display window of the game application can be determined; further, a video of a target duration (e.g., 10 seconds, 1 minute, 10 minutes, or a display duration of the whole application screen) is obtained from the target display window as a target video, for example, the target video is obtained by a screen recording mode.
302. A plurality of constant quality coefficients of the target video at the time of encoding are determined.
Specifically, a plurality of constant quality coefficients of the target video are determined according to the range value of the constant quality coefficients and the range of the definition score level, if the range value of the constant quality coefficients is 0-51 and the range of the definition score level is 1-10, 10 constant quality coefficients are uniformly taken from the range values (0, 51) of the constant quality coefficients as the constant quality coefficients of the target video.
303. And based on each constant quality coefficient, carrying out coding processing on the target video to obtain a sample sub-video corresponding to each constant quality coefficient.
Specifically, a coding tool (such as a multimedia coding tool FFmpeg) is used to code the target video based on each constant quality coefficient, so as to obtain a sample sub-video corresponding to each constant quality coefficient. And each sample sub-video corresponds to one definition score grade, and a plurality of sample sub-videos with different definitions are obtained.
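The per-coefficient encoding step can be illustrated by building the corresponding FFmpeg command lines; `-crf` is libx264's constant-quality option with the 0 (best) to 51 (worst) range used in the text. The file names and the sampled coefficient values are hypothetical; only the command construction is shown, not its execution.

```python
def build_encode_cmd(src: str, crf: int, dst: str):
    """FFmpeg command for encoding the target video at one constant
    quality coefficient with libx264's -crf option."""
    return ["ffmpeg", "-y", "-i", src, "-c:v", "libx264",
            "-crf", str(crf), dst]

cmds = [build_encode_cmd("target.mp4", c, f"sample_crf{c}.mp4")
        for c in (0, 17, 29, 51)]
# each command could then be run with subprocess.run(cmd, check=True)
```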
304. Sample video frames are acquired from the sample sub-video and determined as sample images.
Specifically, each sample sub-video is framed to obtain a plurality of video subframes corresponding to each sample sub-video; in order to ensure that differences exist between the subsequently sampled video subframes, the video subframes corresponding to each sample sub-video can be independently classified based on the content differences of the video subframes, so that video subframes with smaller content differences are classified into the video subframe set of the same category, and a plurality of video subframe sets corresponding to each sample sub-video are obtained; because the information of the video subframes differs considerably between video subframe sets, video subframes spaced one or more frames apart can be directly selected from each video subframe set as the sample video frames of that set, so that the sample video frames of each video subframe set can be used as sample images. Notably, the same sample sub-video has a plurality of sample images of different contents corresponding to the same sharpness sample score and constant quality coefficient.
305. And determining a definition sample value of the sample image based on a preset definition relation table and a constant quality coefficient of the sample image.
The preset definition relation table comprises a mapping relation between constant quality coefficients and definition scores, namely, each constant quality coefficient corresponds to one definition score. It will be appreciated that each sharpness score may correspond to a constant quality coefficient sub-range; for example, if the range value of the constant quality coefficient is 0-51 and the range of the sharpness score is 1-10, the sharpness score "1" corresponds to the constant quality coefficient sub-range (46, 51), the sharpness score "2" corresponds to the sub-range (41, 46), and so on; the embodiment of the present application is not particularly limited in this respect.
The preset definition relation table needs to be established in advance, and the specific process is as follows: obtaining a constant quality coefficient range value; determining the definition score grade number, and dividing the constant quality coefficient range value according to the definition score grade number to obtain a constant quality coefficient sub-range corresponding to each definition score grade number; and establishing a mapping relation between each definition score grade and a corresponding constant quality coefficient sub-range, and generating a preset definition relation table according to the mapping relation.
Through the mode, manual labeling/scoring of the sample image is not needed, so that labor cost and time cost are saved, and the efficiency of obtaining the sample image and the definition sample score is improved.
306. Preprocessing the sample image to obtain a processed sample image sub-block set, and encoding the definition sample value of the sample image into a definition sample label.
Wherein the sample image sub-block set comprises preprocessed sample image sub-blocks of different color channels, for example, sample image sub-blocks of the R, G and B color channels. Specifically, the preprocessing process of the sample image is as follows: performing color channel processing on the sample image so that the sample image is represented by color values, obtaining a sample color image corresponding to each color channel; further, performing downsampling processing on each sample color image, specifically downsampling according to the aspect ratio of the sample color image, to obtain a sample color thumbnail corresponding to each sample color image; cropping the sample color thumbnail corresponding to each sample color image to obtain an image sub-block set corresponding to each sample color thumbnail; and fusing all the image sub-block sets to obtain the sample image sub-block set corresponding to the sample image.
The sharpness sample tag may be obtained by encoding the sharpness sample score, for example by encoding the sharpness sample score in one-hot (One-Hot) mode.
307. Training the preset model according to the sample image sub-block set and the definition sample label to obtain a trained target model.
Specifically, the training process of the model is as follows: inputting the sample image sub-block set into a preset model to obtain a definition prediction label; determining a label difference value between a definition prediction label and a definition sample label, adjusting network parameters of a preset model according to the label difference value, and performing iterative training on the adjusted preset model until the label difference value converges to obtain a trained target model.
In the embodiment of the application, one target video is encoded into a plurality of sample sub-videos with different constant quality coefficients, so that the sharpness differs between the sample sub-videos. Then, a plurality of video subframes with different information are selected from each sample sub-video as sample images, yielding a plurality of sample images with different information but identical definition sample scores, and combining such sample images for model training can improve the performance and robustness of the model. Furthermore, when a plurality of sample images with different information and identical definition sample scores are taken as a sample combination, each sample sub-video corresponds to one sample combination, and the model is trained through the plurality of sample combinations, so that the model is comprehensively trained based on sample images with different information and different definition, which improves the recognition performance of the model on images with different definition or content and the accuracy of the model in recognizing image definition.
308. And acquiring an image to be identified, and preprocessing the image to be identified to obtain a target image sub-block set corresponding to the image to be identified.
The image to be identified is an image whose definition needs to be identified. The image to be identified may be a directly received picture, a picture generated by the computer device, or a video frame extracted from a video.
Specifically, the preprocessing of the image to be identified comprises the following steps: and carrying out color channel processing on the image to be identified so as to represent the image to be identified through the color values, and obtaining an image corresponding to each color channel (each color value), namely a target color image corresponding to each color channel. For example, the image to be identified is represented by RGB color values, and a color image corresponding to the R color value, a color image corresponding to the G color value, and a color image corresponding to the B color value are obtained respectively. Further, the pretreatment process further comprises: performing downsampling processing on each target color image to obtain a processed target color thumbnail; cutting each processed target color thumbnail to obtain an image sub-block set corresponding to each processed target color thumbnail; and merging the image sub-block sets corresponding to each processed target color thumbnail to obtain a target image sub-block set corresponding to the image to be identified.
309. And carrying out definition recognition on the target image sub-block set through the target model to obtain a target definition label of the image to be recognized.
In order to improve the recognition efficiency of the definition of the image to be recognized, after the target image sub-block set corresponding to the image to be recognized is obtained, the target image sub-block set is input into the trained target model, so that the definition recognition is performed through the target model, and the target definition label of the image to be recognized is obtained. Therefore, the time cost and the labor cost for manually evaluating the definition are reduced, and the recognition efficiency in recognizing the image definition is improved.
310. And weighting the target definition label to obtain a target definition score of the image to be identified.
In order to obtain the target definition score corresponding to the image to be identified, in the embodiment of the application, after the target definition label of the image to be identified is obtained through the target model, since the target definition label is a data label in vector form, weighting processing needs to be performed on the target definition label. Specifically, weighting processing or weighted average processing is performed on the target definition label according to preset weighting values, so as to obtain the target definition score of the image to be identified.
For the convenience of understanding the embodiments of the present application, the embodiments of the present application will be described with specific application scenario examples. Specifically, the application scenario example is described by performing the above steps 301-309, and with reference to fig. 5, 6 and 7.
The scheme is suitable for application scenes such as definition identification of game picture videos and definition identification during video call. The embodiment of the application takes the definition identification of the video of the game picture as an example, wherein the game can be played on a mobile phone terminal or a computer terminal. Specifically, the application scenario example is described as follows:
(1) And selecting the game type to be marked, and recording the selected game with screen recording software to generate a screen recording file, wherein the screen recording file has the same picture quality as the game picture, namely the clearest picture quality.
(2) Different constant quality factor (Constant Rate Factor, CRF) coding values, called CRF values for short, are set for the video files, with values taken from [0,51]. Specifically, assuming that 10 definition levels are set, for the same recording file 10 to 15 CRF values can be taken randomly and relatively uniformly from the CRF value interval [0,51]. "Randomly" means that different CRF values can be taken for the same screen recording file, so as to increase the diversity of samples; "relatively uniformly" means that the CRF values are not concentrated in the same area (for example, all 10 values falling between [0,10]), but should be distributed relatively evenly over [0,51].
(3) After the CRF values are selected, the original video file may be set to be encoded with an encoding tool (e.g., a multimedia encoding tool FFmpeg) to encode the different CRF values, so that each CRF value generates a video with a corresponding definition.
(4) And sampling the generated videos with different definition. Because the video has a plurality of frames in the same second, the difference in picture content between adjacent frames is small and they can be regarded as the same; if frames were extracted one by one, there would be many repeated pictures, so frames can instead be taken at fixed sampling intervals to obtain game pictures with different contents and different definition, namely sample images.
(5) Label the definition of the game pictures based on the CRF values. Specifically, each generated picture corresponds to a CRF value, and different CRF values correspond to different definitions. Based on this, a correspondence between the CRF value and the definition score (for example, 1 to 10 points) is established; that is, samples with CRF values between 0 and 51 are mapped onto picture quality scores between 1 and 10. First, the picture is clearest when the CRF value is 0, which is set to 10 points; when the CRF value is 51, the picture is most blurred, which is set to 1 point. Since the subjective quality of the picture decreases faster as the CRF value increases, the overall relationship should be a quadratic curve. To determine the quadratic curve, the value of one more point is needed (three points determine a quadratic curve): when the CRF value is greater than 29, the image quality of the game picture shows a relatively obvious blur, so the CRF value 29 is taken as the passing score of 6 points (definition score). The curve relationship between the CRF value and the definition score is then drawn, and the definition of intermediate nodes may be represented by decimals, as shown in fig. 5.
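The quadratic curve can be reconstructed from the three anchor points stated above: (CRF 0, 10 points), (CRF 29, 6 points), and (CRF 51, 1 point). The fitted coefficients below stand in for the curve of fig. 5, which is not reproduced here, so treat them as an assumption:

```python
import numpy as np

# Anchor points from the text: CRF 0 -> 10 points (clearest),
# CRF 29 -> 6 points (passing score), CRF 51 -> 1 point (most blurred).
_crf = np.array([0.0, 29.0, 51.0])
_score = np.array([10.0, 6.0, 1.0])
_coeffs = np.polyfit(_crf, _score, 2)  # three points determine a quadratic

def crf_to_score(crf):
    """Map a CRF value in [0, 51] to a definition score in [1, 10]."""
    return float(np.polyval(_coeffs, crf))
```

The fitted parabola is concave, so the score drops faster at high CRF values, matching the stated subjective behavior.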
(6) Train the model with the generated labeled pictures: a convolutional neural network model (such as MobileNet) is selected; each generated picture is represented by RGB color values, downsampled while keeping the aspect ratio (the short side is downsampled to 540 px), and a 400x400 picture block is then randomly cropped from the downsampled picture as the input of the model; the definition value is encoded in One-Hot form and used as the output label of the model for training. Here, One-Hot encoding means that a definition score with a value of 1 to 10 points is expressed as a 1x10 vector in which each component represents one score; for example, a definition score of 1 point is expressed as the vector [1,0,0,0,0,0,0,0,0,0]. When the definition score is a decimal, the two integer scores adjacent to the decimal value are taken, and the components are weighted so that the final weighted value of the vector equals the decimal value. For example, a definition score of 6.67 points is expressed as the vector [0,0,0,0,0,0.33,0.67,0,0,0].
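The One-Hot encoding of integer and decimal definition scores described above can be sketched as follows (the function name is illustrative):

```python
import math

def encode_score(score):
    """Encode a definition score in [1, 10] as a 1x10 soft one-hot vector.

    Integer scores produce a plain one-hot vector; decimal scores split
    the weight between the two adjacent integer scores so that the
    weighted value of the vector equals the original score
    (e.g. 6.67 -> 0.33 at score 6 and 0.67 at score 7).
    """
    vec = [0.0] * 10
    lo = min(9, int(math.floor(score)))  # lower adjacent integer score
    frac = score - lo
    vec[lo - 1] = 1.0 - frac             # component k-1 holds score k's weight
    if frac > 1e-9:
        vec[lo] = frac
    return vec
```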
(7) Evaluate picture definition with the trained model: when evaluating picture definition, the convolutional neural network of step (6) is used to load the trained model parameters, and the picture to be evaluated is preprocessed with the same steps as in step (6) (RGB representation, downsampling, random cropping) to obtain a 400x400x3 matrix. The matrix is input into the model for calculation, and the model outputs a 1x10 vector, i.e., [p1, p2, p3, p4, p5, p6, p7, p8, p9, p10].
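A minimal sketch of the preprocessing geometry (aspect-ratio-preserving downsampling to a 540 px short side, then a random 400x400 crop) is given below; it computes sizes and crop offsets with NumPy only, without a full image pipeline, and the function names are illustrative:

```python
import numpy as np

def resized_shape(height, width, short_side=540):
    """Target (height, width) after downsampling so the short side
    equals short_side while keeping the aspect ratio."""
    scale = short_side / min(height, width)
    return int(round(height * scale)), int(round(width * scale))

def random_crop(img, size=400, rng=np.random.default_rng()):
    """Cut a size x size x 3 block at a random position from an RGB array."""
    h, w = img.shape[:2]
    top = int(rng.integers(0, h - size + 1))
    left = int(rng.integers(0, w - size + 1))
    return img[top:top + size, left:left + size]

# e.g. a 1080x1920 game frame is downsampled to 540x960, and a
# 400x400x3 block is then cropped as the model input
```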
Finally, the definition score (score) of the game picture is the value obtained by taking a weighted average of the vector, where the weighted average is calculated as: score = 1×p1 + 2×p2 + ... + 10×p10, i.e., the sum over k from 1 to 10 of k·pk, where score represents the definition score and k represents the weighting factor corresponding to the vector component pk.
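Under the formula above, the weighted average can be sketched as:

```python
def definition_score(probs):
    """score = sum over k of k * p_k for the 1x10 model output
    [p1, ..., p10]. With a normalized output, this is the expected
    definition score under the predicted distribution."""
    return sum(k * p for k, p in enumerate(probs, start=1))
```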
The scheme in this application scenario example is an automatic sample labeling scheme. For machine learning tasks that require a large number of samples, automatic labeling greatly improves development efficiency and saves development cost, and the picture definition labeled in this way is more objective than manual labeling. Subsequent deep learning model training can then be performed with the samples generated by automatic labeling, and the trained model can be used for definition evaluation of general pictures.
As can be seen from the foregoing, the embodiment of the present application can perform color channel decomposition on the image to be identified, so that the image to be identified is represented by different color values and a target color image corresponding to each color value is obtained; decompose the target color image of each color value to obtain a target image sub-block set of the image to be identified; perform image definition identification based on the target image sub-block set through a trained target model; and perform weighting processing on the identified target definition label to obtain a target definition score of the image to be identified. This saves labor and time costs and improves efficiency in video or image definition identification.
In order to better implement the method, the embodiment of the present application also provides an image recognition device, which may be integrated in a server.
For example, as shown in fig. 8, the image recognition apparatus may include an acquisition unit 401, a processing unit 402, a decomposition unit 403, a recognition unit 404, and a weighting unit 405.
An acquisition unit 401 for acquiring an image to be identified;
a processing unit 402, configured to perform color channel processing on an image to be identified, so as to obtain a target color image corresponding to each color channel;
a decomposition unit 403, configured to perform decomposition processing on each target color image to obtain a target image sub-block set corresponding to the image to be identified;
the identifying unit 404 is configured to perform sharpness identification on the target image sub-block set through a target model to obtain a target sharpness tag of an image to be identified, where the target model is obtained by training a sample image sub-block set corresponding to a sample image and a sharpness sample tag, and a mapping relationship exists between the sharpness sample tag and a constant quality coefficient of the sample image;
and the weighting unit 405 is configured to perform weighting processing on the target definition label, so as to obtain a target definition score of the image to be identified.
In some embodiments, the set of target color images includes a target color image corresponding to each color channel, and the image recognition apparatus further includes a decomposition unit configured to decompose each target color image to obtain a target image sub-block set corresponding to the image to be identified; the identification unit 404 is further configured to perform definition identification on the target image sub-block set through the target model.
In some embodiments, the decomposition unit is further configured to: performing downsampling processing on the target color image corresponding to each color channel to obtain a processed target color thumbnail; cutting each target color thumbnail to obtain an image sub-block set corresponding to each target color thumbnail; and merging the image sub-block sets corresponding to each processed target color thumbnail to obtain a target image sub-block set corresponding to the image to be identified.
From the above, the embodiment of the present application can acquire the image to be identified through the acquisition unit 401; perform color channel processing on the image to be identified through the processing unit 402 to obtain a target color image corresponding to each color channel; decompose each target color image through the decomposition unit 403 to obtain a target image sub-block set corresponding to the image to be identified; perform definition identification on the target image sub-block set through a target model via the identification unit 404 to obtain a target definition label of the image to be identified, where the target model is obtained by training with a sample image sub-block set corresponding to a sample image and a definition sample label, and the definition sample label has a mapping relation with the constant quality coefficient of the sample image; and weight the target definition label through the weighting unit 405 to obtain a target definition score of the image to be identified. In this way, color channel decomposition is performed on the image to be identified so that it is represented by different color values, a target color image corresponding to each color value is obtained, each target color image is decomposed into a target image sub-block set, image definition identification is then performed on the sub-block set by the trained target model, and the identified target definition label is weighted to obtain the target definition score, thereby saving labor and time costs and improving efficiency in video or image definition identification.
In order to better implement the method, the embodiment of the present application also provides an image recognition model training device, which may be integrated in a server.
For example, as shown in fig. 9, the image recognition model training apparatus may include an acquisition unit 501, a determination unit 502, an encoding unit 503, and a training unit 504.
An acquisition unit 501 for acquiring a sample image and identifying a constant quality coefficient of the sample image;
a determining unit 502, configured to determine a sharpness sample score of the sample image based on a preset sharpness relation table and a constant quality coefficient of the sample image;
the encoding unit 503 is configured to perform encoding processing on the sharpness sample score to obtain a sharpness sample label corresponding to the sharpness sample score;
the training unit 504 is configured to obtain a sample image sub-block set corresponding to the sample image, and train the preset model according to the sample image sub-block set and the sharpness sample label, so as to obtain a trained target model.
In some embodiments, the obtaining unit 501 is further configured to: acquiring a target video and determining a plurality of constant quality coefficients of the target video during encoding; based on each constant quality coefficient, encoding the target video to obtain a sample sub-video corresponding to each constant quality coefficient; sample video frames are acquired from the sample sub-video and determined as sample images.
In some embodiments, the obtaining unit 501 is further configured to: determining the attribute type of the video to be acquired; determining a target application according to the attribute type, and determining a target display window of the target application; and acquiring a target video corresponding to the target duration from the target display window.
In some embodiments, the obtaining unit 501 is further configured to: perform framing processing on the sample sub-video to obtain a plurality of video subframes corresponding to the sample sub-video; classify the plurality of video subframes to obtain a video subframe set corresponding to each category; and select a sampled video frame from the video subframe set corresponding to each category.
In some embodiments, the image recognition model training apparatus further includes a building unit, specifically configured to:
obtaining a constant quality coefficient range value; determining the number of definition score grades, and dividing the constant quality coefficient range value according to the number of definition score grades to obtain a constant quality coefficient sub-range corresponding to each definition score grade; and establishing a mapping relation between each definition score grade and the corresponding constant quality coefficient sub-range, and generating a preset definition relation table according to the mapping relation.
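As a sketch of such a table, the constant quality coefficient range can be divided into one sub-range per score grade. Equal-width sub-ranges are an assumption here; a real table might instead derive non-uniform sub-ranges from the CRF-to-score curve:

```python
def build_definition_table(crf_lo=0, crf_hi=51, levels=10):
    """Map each definition score grade to a constant quality
    coefficient (CRF) sub-range.

    The CRF range is split into `levels` equal-width sub-ranges;
    higher scores correspond to lower CRF values (clearer pictures).
    """
    width = (crf_hi - crf_lo) / levels
    table = {}
    for i in range(levels):
        score = levels - i  # score 10 for the first (lowest-CRF) band
        lo = crf_lo + i * width
        hi = crf_hi if i == levels - 1 else crf_lo + (i + 1) * width
        table[score] = (lo, hi)
    return table

def score_for_crf(crf, table):
    # Look up the grade whose sub-range contains this CRF value.
    for score, (lo, hi) in table.items():
        if lo <= crf <= hi:
            return score
    raise ValueError("CRF outside table range")
```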
It should be noted that, for the detailed implementation process and related technical effects of the image recognition model training device, please refer to the related descriptions in the previous embodiments of the present application, which are not repeated herein.
The embodiment of the application also provides a computer device, as shown in fig. 10, which shows a schematic structural diagram of the computer device according to the embodiment of the application, specifically:
the computer device may include one or more processing cores 'processors 601, one or more computer-readable storage media's memory 602, power supply 603, and input unit 604, among other components. Those skilled in the art will appreciate that the computer device structure shown in FIG. 10 is not limiting of the computer device and may include more or fewer components than shown, or may be combined with certain components, or a different arrangement of components. Wherein:
processor 601 is the control center of the computer device and uses various interfaces and lines to connect the various parts of the overall computer device, perform various functions of the computer device and process data by running or executing software programs and/or modules stored in memory 602, and invoking data stored in memory 602. Optionally, the processor 601 may include one or more processing cores; preferably, the processor 601 may integrate an application processor and a modem processor, wherein the application processor primarily handles operating systems, user interfaces, applications, etc., and the modem processor primarily handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 601.
The memory 602 may be used to store software programs and modules, and the processor 601 performs various functional applications and image recognition by executing the software programs and modules stored in the memory 602. The memory 602 may mainly include a storage program area and a storage data area, where the storage program area may store an operating system, an application program required for at least one function (such as a sound playing function or an image playing function), and the like, and the storage data area may store data created according to the use of the computer device, and the like. In addition, the memory 602 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. Accordingly, the memory 602 may further include a memory controller to provide the processor 601 with access to the memory 602.
The computer device further includes a power supply 603 for supplying power to the various components. Preferably, the power supply 603 may be logically connected to the processor 601 through a power management system, so that functions such as managing charging, discharging, and power consumption are implemented through the power management system. The power supply 603 may further include any one or more components such as a direct current or alternating current power supply, a recharging system, a power failure detection circuit, a power converter or inverter, and a power status indicator.
The computer device may also include an input unit 604, which input unit 604 may be used to receive entered numerical or character information and to generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
Although not shown, the computer device may further include a display unit or the like, which is not described herein. In particular, in the embodiment of the present application, the processor 601 in the computer device loads executable files corresponding to the processes of one or more application programs into the memory 602 according to the following instructions, and the processor 601 executes the application programs stored in the memory 602, so as to implement various functions as follows:
collecting an image to be identified; performing color channel processing on the image to be identified to obtain a target color image corresponding to each color channel; decomposing each target color image to obtain a target image sub-block set corresponding to the image to be identified; performing definition recognition on the target image sub-block set through a target model to obtain a target definition label of an image to be recognized, wherein the target model is obtained by training a sample image sub-block set corresponding to a sample image and a definition sample label, and the definition sample label has a mapping relation with a constant quality coefficient of the sample image; and weighting the target definition label to obtain a target definition score of the image to be identified.
Or, acquiring a sample image and identifying the constant quality coefficient of the sample image; determining a definition sample score of the sample image based on a preset definition relation table and the constant quality coefficient of the sample image; encoding the definition sample score to obtain a definition sample label corresponding to the definition sample score; and acquiring a sample image sub-block set corresponding to the sample image, and training a preset model according to the sample image sub-block set and the definition sample label to obtain a trained target model.
The specific implementation of each operation may be referred to the previous embodiments, and will not be described herein.
In this way, color channel decomposition can be performed on the image to be identified so that it is represented by different color values, and a target color image corresponding to each color value is obtained; each target color image is decomposed to obtain a target image sub-block set of the image to be identified; image definition identification is then performed on the sub-block set by the trained target model, and the identified target definition label is weighted to obtain the target definition score of the image to be identified, thereby saving labor and time costs and improving efficiency in video or image definition identification.
Those of ordinary skill in the art will appreciate that all or a portion of the steps of the various methods of the above embodiments may be performed by instructions, or by instructions controlling associated hardware, which may be stored in a computer-readable storage medium and loaded and executed by a processor.
To this end, embodiments of the present application provide a computer readable storage medium having stored therein a plurality of instructions capable of being loaded by a processor to perform the steps of any of the image recognition methods provided by the embodiments of the present application. For example, the instructions may perform the steps of:
collecting an image to be identified; performing color channel processing on the image to be identified to obtain a target color image corresponding to each color channel; decomposing each target color image to obtain a target image sub-block set corresponding to the image to be identified; performing definition recognition on the target image sub-block set through a target model to obtain a target definition label of an image to be recognized, wherein the target model is obtained by training a sample image sub-block set corresponding to a sample image and a definition sample label, and the definition sample label has a mapping relation with a constant quality coefficient of the sample image; and weighting the target definition label to obtain a target definition score of the image to be identified.
Or, acquiring a sample image and identifying the constant quality coefficient of the sample image; determining a definition sample score of the sample image based on a preset definition relation table and the constant quality coefficient of the sample image; encoding the definition sample score to obtain a definition sample label corresponding to the definition sample score; and acquiring a sample image sub-block set corresponding to the sample image, and training a preset model according to the sample image sub-block set and the definition sample label to obtain a trained target model.
The specific implementation of each operation above may be referred to the previous embodiments, and will not be described herein.
Wherein the computer-readable storage medium may include: a read-only memory (ROM, Read Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, an optical disc, or the like.
Because the instructions stored in the computer readable storage medium can execute the steps in any image recognition method provided by the embodiments of the present application, the beneficial effects that any image recognition method provided by the embodiments of the present application can achieve can be achieved, which are detailed in the previous embodiments and are not described herein.
The foregoing has described in detail the image recognition method, model training method, apparatus, device, and storage medium provided by the embodiments of the present application. Specific examples have been used herein to illustrate the principles and implementations of the present application, and the above description of the embodiments is only intended to help understand the method and core idea of the present application. Meanwhile, those skilled in the art may make changes to the specific implementations and the application scope according to the idea of the present application. In summary, the content of this description should not be construed as limiting the present application.

Claims (14)

1. An image recognition method, comprising:
collecting an image to be identified;
performing color channel processing on the image to be identified to obtain a target color image set;
performing definition recognition on the target color image set through a target model to obtain a target definition label of the image to be recognized, wherein the target model is obtained by training a sample color image set corresponding to a sample image and a definition sample label, the definition sample label is obtained by coding a definition sample score, and the definition sample score has a mapping relation with a constant quality coefficient of the sample image;
and carrying out weighting treatment on the target definition label to obtain a target definition score of the image to be identified.
2. The method of claim 1, wherein the set of target color images includes a target color image corresponding to each color channel, and wherein prior to the sharpness recognition of the set of target color images by the target model, the method comprises:
decomposing each target color image to obtain a target image sub-block set corresponding to the image to be identified;
The sharpness recognition of the set of target color images by the target model includes: and carrying out definition identification on the target image sub-block set through a target model.
3. The method according to claim 1, wherein the decomposing each of the target color images to obtain a target image sub-block set corresponding to the image to be identified includes:
performing downsampling processing on the target color image corresponding to each color channel to obtain a processed target color thumbnail;
cutting each target color thumbnail to obtain an image sub-block set corresponding to each target color thumbnail;
and merging the image sub-block sets corresponding to the thumbnail of each target color to obtain the target image sub-block set corresponding to the image to be identified.
4. An image recognition model training method, comprising the steps of:
acquiring a sample image and identifying a constant quality coefficient of the sample image;
determining a definition sample score of the sample image based on a preset definition relation table and a constant quality coefficient of the sample image;
coding the definition sample score to obtain a definition sample label corresponding to the definition sample score;
and acquiring a sample image sub-block set corresponding to the sample image, and training a preset model according to the sample image sub-block set and the definition sample label to obtain a trained target model.
5. The method of claim 4, wherein the acquiring the sample image comprises:
acquiring a target video and determining a plurality of constant quality coefficients of the target video during encoding;
based on each constant quality coefficient, carrying out coding processing on the target video to obtain a sample sub-video corresponding to each constant quality coefficient;
and acquiring a sample video frame from the sample sub-video, and determining the sample video frame as a sample image.
6. The method of claim 5, wherein the acquiring the target video comprises:
determining the attribute type of the video to be acquired;
determining a target application according to the attribute type, and determining a target display window of the target application;
and acquiring a target video corresponding to the target duration from the target display window.
7. The method of claim 5, wherein the capturing sample video frames from the sample sub-video comprises:
carrying out framing treatment on the sample sub-video to obtain a plurality of video sub-frames corresponding to the sample sub-video;
classifying the plurality of video subframes to obtain a video subframe set corresponding to each type;
and selecting the sampling video frame from the video sub-frame set corresponding to each category.
8. The method of claim 4, wherein prior to determining the sharpness sample score for the sample image based on the predetermined sharpness relationship table and the constant quality coefficient for the sample image, further comprising:
obtaining a constant quality coefficient range value;
determining the number of definition score grades, and dividing the constant quality coefficient range value according to the number of definition score grades to obtain a constant quality coefficient sub-range corresponding to each definition score grade;
and establishing a mapping relation between each definition score grade and a corresponding constant quality coefficient sub-range, and generating the preset definition relation table according to the mapping relation.
9. The method of claim 4, wherein training the preset model according to the sample image sub-block set and the sharpness sample tag to obtain a trained target model comprises:
inputting the sample image sub-block set into a preset model to obtain a definition prediction label;
acquiring a label difference value between the definition prediction label and the definition sample label, and adjusting network parameters of the preset model according to the label difference value;
and performing iterative training on the adjusted preset model until the label difference value converges to obtain a trained target model.
10. An image recognition apparatus, comprising:
the acquisition unit is used for acquiring the image to be identified;
the processing unit is used for performing color channel processing on the image to be identified to obtain a target color image set;
the identification unit is used for carrying out definition identification on the target color image set through a target model to obtain a target definition label of the image to be identified, wherein the target model is obtained by training a sample color image set corresponding to a sample image and a definition sample label, the definition sample label is obtained by coding a definition sample score, and the definition sample score has a mapping relation with a constant quality coefficient of the sample image;
and the weighting unit is used for carrying out weighting processing on the target definition label to obtain the target definition score of the image to be identified.
11. An image recognition model training device, comprising:
an acquisition unit configured to acquire a sample image and identify a constant quality coefficient of the sample image;
the determining unit is used for determining a definition sample score of the sample image based on a preset definition relation table and the constant quality coefficient of the sample image;
the coding unit is used for coding the definition sample score to obtain a definition sample label corresponding to the definition sample score;
the training unit is used for acquiring a sample image sub-block set corresponding to the sample image, and training a preset model according to the sample image sub-block set and the definition sample label to obtain a trained target model.
12. A computer device comprising a processor and a memory, the memory storing a computer program, the processor being configured to execute the computer program in the memory to perform the steps of the image recognition method of any one of claims 1 to 3 or the image recognition model training method of any one of claims 4 to 9.
13. A computer readable storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the steps of the image recognition method of any one of claims 1 to 3 or to implement the steps of the image recognition model training method of any one of claims 4 to 9.
14. A computer program product comprising computer instructions which, when executed, implement the steps of the image recognition method of any one of claims 1 to 3 or the image recognition model training method of any one of claims 4 to 9.
CN202210305631.9A 2022-03-25 2022-03-25 Image recognition method, model training method, device, equipment and storage medium Pending CN116863311A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210305631.9A CN116863311A (en) 2022-03-25 2022-03-25 Image recognition method, model training method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210305631.9A CN116863311A (en) 2022-03-25 2022-03-25 Image recognition method, model training method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116863311A 2023-10-10

Family

ID=88223889

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210305631.9A Pending CN116863311A (en) 2022-03-25 2022-03-25 Image recognition method, model training method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116863311A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination