CN110929806B - Picture processing method and device based on artificial intelligence and electronic equipment


Info

Publication number
CN110929806B
Authority
CN (China)
Prior art keywords
picture
definition
dimension
determining
feature
Legal status
Active
Application number
CN201911239861.4A
Other languages
Chinese (zh)
Other versions
CN110929806A (en)
Inventor
杨天舒
沈招益
高洵
刘军煜
Current Assignee
Shenzhen Yayue Technology Co ltd
Original Assignee
Shenzhen Yayue Technology Co ltd
Application filed by Shenzhen Yayue Technology Co ltd
Priority to CN201911239861.4A
Publication of CN110929806A
Application granted
Publication of CN110929806B
Status: Active
Anticipated expiration


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/24: Classification techniques
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/40: Extraction of image or video features
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention provides an artificial-intelligence-based picture processing method and apparatus, an electronic device, and a storage medium. The method includes: acquiring an original-size picture corresponding to multimedia material; performing feature extraction on the original-size picture to obtain picture features that include picture information and semantic information; classifying the picture features to obtain sharpness labels and corresponding confidence levels; and ranking at least two sharpness labels according to their confidence levels and determining the sharpness of the original-size picture according to the ranking result. The method and apparatus can improve the accuracy of the obtained picture sharpness and its effectiveness in practical application scenarios.

Description

Picture processing method and device based on artificial intelligence and electronic equipment
Technical Field
The present invention relates to artificial intelligence technology, and in particular to an artificial-intelligence-based picture processing method and apparatus, an electronic device, and a storage medium.
Background
Artificial intelligence (AI) is the theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence: to perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. Machine learning (ML) is a branch of artificial intelligence that mainly studies how a computer can simulate or implement human learning behavior to acquire new knowledge or skills, and how it can reorganize existing knowledge structures to continuously improve its own performance.
Picture processing is an important application of machine learning; for example, a sharper picture is selected by machine learning methods to serve as a video cover or to be applied in other scenarios. Specifically, because the pixels in a picture are discrete, schemes in the related art generally replace the differential with differences, that is, with gradient features of the picture that represent the differences between adjacent pixels, and then analyze the sharpness of the picture from those gradient features. However, this approach is unsuitable for some pictures, such as solid-color pictures: it easily produces an incorrect sharpness, so the accuracy of the determined picture sharpness is poor.
Disclosure of Invention
The embodiments of the invention provide an artificial-intelligence-based picture processing method and apparatus, an electronic device, and a storage medium, which can obtain a picture sharpness that is closer to human perception and therefore more accurate.
The technical solution of the embodiments of the invention is implemented as follows:
An embodiment of the invention provides an artificial-intelligence-based picture processing method, including:
acquiring an original-size picture corresponding to multimedia material;
performing feature extraction on the original-size picture to obtain picture features including picture information and semantic information;
classifying the picture features to obtain sharpness labels and corresponding confidence levels; and
ranking at least two sharpness labels according to their confidence levels, and determining the sharpness of the original-size picture according to the ranking result.
An embodiment of the invention provides an artificial-intelligence-based picture processing apparatus, including:
a picture acquisition module configured to acquire an original-size picture corresponding to multimedia material;
a feature extraction module configured to perform feature extraction on the original-size picture to obtain picture features including picture information and semantic information;
a classification module configured to classify the picture features to obtain sharpness labels and corresponding confidence levels; and
a ranking module configured to rank at least two sharpness labels according to their confidence levels and to determine the sharpness of the original-size picture according to the ranking result.
An embodiment of the invention provides an electronic device, including:
a memory for storing executable instructions; and
a processor for implementing the artificial-intelligence-based picture processing method when executing the executable instructions stored in the memory.
An embodiment of the invention provides a storage medium storing executable instructions that, when executed by a processor, implement the artificial-intelligence-based picture processing method.
The embodiments of the invention have the following beneficial effects:
feature extraction is performed on the original-size picture, so the information in the picture is preserved to the greatest extent; the extracted picture features, which include both picture information and semantic information, are then classified to obtain the sharpness of the picture. By combining picture information with semantic information, both the accuracy of the obtained sharpness and its applicability to practical application scenarios are improved.
Drawings
FIG. 1 is a schematic diagram of an alternative architecture of an artificial intelligence based picture processing system provided by an embodiment of the present invention;
FIG. 2 is a schematic diagram of an alternative architecture of an artificial intelligence based picture processing system incorporating a blockchain network provided in accordance with an embodiment of the present invention;
FIG. 3 is a schematic diagram of an alternative architecture of a server provided by an embodiment of the present invention;
FIG. 4 is a schematic diagram of an alternative architecture of an artificial intelligence based picture processing device according to an embodiment of the present invention;
FIG. 5A is a schematic flow chart of an alternative image processing method based on artificial intelligence according to an embodiment of the present invention;
FIG. 5B is a schematic flow chart of an alternative image processing method based on artificial intelligence according to an embodiment of the present invention;
FIG. 5C is a schematic flow chart of an alternative process for training a machine learning model according to an embodiment of the present invention;
FIG. 5D is a schematic flow chart of an alternative image processing method based on artificial intelligence according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of an alternative architecture of a machine learning model provided by an embodiment of the present invention;
FIG. 7 is an alternative schematic illustration of determining sharpness provided by an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions, and advantages of the present invention clearer, the present invention is described below in further detail with reference to the accompanying drawings. The described embodiments should not be construed as limiting the present invention; all other embodiments obtained by those skilled in the art without inventive effort fall within the scope of the present invention.
In the following description, "some embodiments" describes a subset of all possible embodiments; "some embodiments" may refer to the same subset or to different subsets of all possible embodiments, and these can be combined with one another where no conflict arises.
In the following description, the terms "first", "second", and the like merely distinguish similar objects and do not denote a particular ordering of objects. Where permitted, "first", "second", and the like may be interchanged so that the embodiments of the invention described herein can be practiced in orders other than those illustrated or described herein.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein is for the purpose of describing embodiments of the invention only and is not intended to be limiting of the invention.
Before describing embodiments of the present invention in further detail, the terms used in the embodiments of the present invention are explained as follows.
1) Picture sharpness: the clarity of a picture as judged by human perception, affected by factors such as the overall color tone, ghosting in the background, whether the picture as a whole is clear, and whether the look and feel of the picture is flawed.
2) Picture resolution: the amount of information stored in an image, i.e., how many pixels the image contains per inch; the unit of resolution is pixels per inch (PPI).
3) Multimedia material: material that may take a variety of media forms; for example, the multimedia material may be text, audio, video, or the like.
4) Channel: a color channel of a picture; a picture can generally be represented in three dimensions, i.e., (length, width, number of channels).
5) Convolutional neural network (CNN): a feedforward neural network with a deep structure that involves convolution computations; it is generally composed of convolutional layers, pooling layers, and fully connected layers, and is mainly applied in fields such as image classification, image detection, and image segmentation.
6) Blockchain: a storage structure of encrypted, chained transactions formed from blocks.
7) Blockchain network: the set of nodes that incorporate new blocks into a blockchain by way of consensus.
The related art mainly provides two ways to determine picture sharpness. In the first way, because the pixels in a picture are discrete, the differential is replaced with differences: gradient features of the picture represent the differences between adjacent pixels, and the sharpness of the picture is then analyzed from those gradient features, where a gradient feature may be computed with a Canny operator, a Sobel operator, a Laplace operator, or the like. After the gradient features are obtained, they can be weighted and the weighted result compared with a set threshold to obtain the sharpness of the picture; alternatively, the gradient features are fused and used as the input of a machine learning model such as a support vector machine (SVM) or a random forest (RF) model, and the output of the model is taken as the sharpness. However, this way has poor applicability to practical application scenarios: for some special pictures, such as solid-color pictures, the goal of the business scenario is to assign a poor sharpness (e.g., blurred), but the sharpness obtained in this way may be high, resulting in an inaccurate sharpness.
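As an illustration only (this is the related-art baseline, not the method of the embodiments), the following sketch scores sharpness by the variance of the Laplacian response; the threshold is an assumed example value. On a solid-color picture the gradient response is close to zero everywhere, so the score is low and the picture is judged blurred, which is exactly the failure mode described above.

```python
# Sketch of the gradient-based related-art baseline described above (not the
# method of this invention). The threshold is an assumption for illustration.
import cv2

def gradient_sharpness(path: str, threshold: float = 100.0) -> str:
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    # The Laplacian responds to differences between adjacent pixels, i.e.,
    # the "difference instead of differential" gradient feature noted above.
    score = float(cv2.Laplacian(gray, cv2.CV_64F).var())
    # A solid-color picture yields a near-zero score and is judged "blurred"
    # regardless of how a human would perceive it.
    return "clear" if score >= threshold else "blurred"
```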
In the second way, a convolutional neural network model is built, labeled picture data is fed into the model for training, and the trained model is finally used to determine the sharpness of a picture. However, this way requires a fixed-size picture as the input of the model, and to obtain a fixed-size picture, the picture is usually center-cropped to the fixed size or directly scaled to the fixed size. The former loses the boundary information of the picture and affects the model's judgment of the overall sharpness; the latter changes the sharpness through scaling, so the training effect of the model is poor. For example, a low-resolution picture may look of average sharpness after being scaled down, but becomes blurred after being enlarged.
The embodiments of the invention provide an artificial-intelligence-based picture processing method and apparatus, an electronic device, and a storage medium, which can improve the accuracy of the obtained sharpness and enhance the applicability to different application scenarios.
Referring to fig. 1, fig. 1 is a schematic diagram of an alternative architecture of an artificial-intelligence-based picture processing system 100 according to an embodiment of the present invention. To support an artificial-intelligence-based picture processing application, terminal devices 400 (terminal device 400-1 and terminal device 400-2 are shown as examples) are connected to a server 200 through a network 300, where the network 300 may be a wide area network, a local area network, or a combination of the two.
The terminal device 400 is configured to send an original-size picture corresponding to multimedia material to the server 200. The server 200 is configured to perform feature extraction on the original-size picture to obtain picture features including picture information and semantic information; to classify the picture features to obtain sharpness labels and corresponding confidence levels; to rank at least two sharpness labels according to their confidence levels; to determine the sharpness of the original-size picture according to the ranking result; and to send the sharpness to the terminal device 400. The terminal device 400 is further configured to display the sharpness on a graphical interface 410 (graphical interface 410-1 and graphical interface 410-2 are shown as examples), as illustrated in fig. 1.
Of course, the artificial-intelligence-based picture processing system 100 is not limited to the above sharpness query scenario. For example, when the sharpness of a picture corresponding to multimedia material is clear, the server 200 may store the picture in a database and set the picture as the cover of the multimedia material. As another example, the server 200 may store several pictures and their corresponding sharpness in a database; when a recommendation request corresponding to the multimedia material is received from the terminal device 400, the server sends the picture whose sharpness has the highest grade to the terminal device 400, thereby implementing picture recommendation and enabling the user of the terminal device 400 to learn the content of the multimedia material from the recommended picture. The types and grades of sharpness can be set according to the actual application scenario.
The embodiments of the invention can also be implemented in combination with blockchain technology. A blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated by cryptographic methods, where each data block contains a batch of network transaction information used to verify the validity (anti-counterfeiting) of the information and to generate the next block. A blockchain may include a blockchain underlying platform, a platform product services layer, and an application services layer.
The blockchain underlying platform may include processing modules for user management, basic services, smart contracts, operation monitoring, and the like. The user management module is responsible for managing the identity information of all blockchain participants, including maintaining public/private key generation (account management) and key management, and maintaining the correspondence between a user's real identity and blockchain address (permission management); with authorization, it can supervise and audit the transactions of certain real identities and provide rule configuration for risk control (risk-control audit). The basic services module is deployed on all blockchain node devices to verify the validity of service requests and to record valid requests to storage after consensus; for a new service request, the basic services module first parses the interface adaptation and performs authentication, encrypts the service information through an identification algorithm (identification management), and transmits the encrypted service information completely and consistently to a shared ledger (network communication), where it is recorded and stored. The smart contract module is responsible for registering and issuing contracts, triggering contracts, and executing contracts; a developer can define contract logic through a programming language and publish it to the blockchain (contract registration), invoke keys or trigger execution by other events according to the logic of the contract terms to complete the contract logic, and the module also provides functions for upgrading contracts. The operation monitoring module is mainly responsible for deployment during product release, configuration modification, contract setup, and cloud adaptation, as well as visual output of real-time status during product operation, for example: alarms, monitoring network conditions, and monitoring node device health.
Referring to fig. 2, fig. 2 is a schematic diagram of an alternative architecture of an artificial-intelligence-based picture processing system 110 according to an embodiment of the present invention, which includes a blockchain network 500 (nodes 510-1 through 510-3 are shown as examples), an authentication center 600, a service system 700 (an electronic device 710 belonging to the service system 700 is shown as an example; it may be the server 200 or the terminal device 400 in fig. 1), and a database 800, each described below.
The type of the blockchain network 500 is flexible and diverse; it may be, for example, any of a public chain, a private chain, or a consortium chain. Taking a public chain as an example, any electronic device of a service system, such as a terminal device or a server, can access the blockchain network 500 without authorization. Taking a consortium chain as an example, an electronic device (e.g., a terminal device or a server) under the jurisdiction of a service system can access the blockchain network 500 after the service system has been authorized; in this case, the service system becomes a special type of node, namely a client node, in the blockchain network 500.
Note that a client node may provide only functions supporting transactions initiated by the service system (e.g., storing data on the chain or querying data on the chain), while the functions of the native nodes of the blockchain network 500, such as the ordering function, consensus services, and ledger functions described below, may be implemented by the client node by default or selectively (e.g., according to the specific service needs of the service system). In this way, the data and service processing logic of the service system can be migrated to the blockchain network 500 to the greatest extent, and the credibility and traceability of the data and of the service process are achieved through the blockchain network 500.
The blockchain network 500 receives transactions submitted by client nodes of a service system (e.g., the electronic device 710 belonging to the service system 700 shown in fig. 2) and executes the transactions to update or query the ledger.
An exemplary application of the blockchain network is described below, taking as an example a service system that accesses the blockchain network to put sharpness on the chain.
The electronic device 710 of the service system 700 accesses the blockchain network 500 as a client node of the blockchain network 500. The electronic device 710 obtains, from the database 800, the first picture corresponding to a picture identifier input by the user. Meanwhile, the electronic device 710 generates, according to the user's instruction, a transaction that includes the picture identifier, specifies in the transaction the smart contract that needs to be invoked to implement the query operation and the parameters passed to the smart contract, where the transaction also carries a digital signature of the service system 700 (e.g., obtained by encrypting a digest of the transaction using the private key in the digital certificate of the service system 700), and broadcasts the transaction to the blockchain network 500. The digital certificate may be obtained by the service system 700 registering with the authentication center 600.
When a node 510 in the blockchain network 500 receives the transaction, it verifies the digital signature carried by the transaction; after the digital signature is verified successfully, the node confirms, according to the identity of the service system 700 carried in the transaction, whether the service system 700 has transaction permission. Failure of either the signature verification or the permission check causes the transaction to fail. After successful verification, the node appends its own digital signature (the signature of node 510) and continues to broadcast the transaction in the blockchain network 500.
After receiving a successfully verified transaction, a node 510 with the ordering function in the blockchain network 500 fills the transaction into a new block and broadcasts the block to the nodes providing consensus services in the blockchain network 500.
The nodes 510 providing consensus services in the blockchain network 500 perform the consensus process on the new block to reach agreement, and the nodes providing the ledger function append the new block to the tail of the blockchain and execute the transactions in the new block: for a query transaction that includes a picture identifier, the key-value pair corresponding to the picture identifier is queried from the state database, and the second picture in that key-value pair is obtained. Note that the service system 700 or another service system may send the picture identifier and the corresponding picture to the blockchain network 500 in advance, and the blockchain network 500 stores the picture identifier and the corresponding picture in the blockchain and in the state database.
The electronic device 710 hashes the first picture to obtain a first hash value and hashes the second picture to obtain a second hash value. When the first hash value is the same as the second hash value, the query result for the picture identifier in the database 800 is proven consistent with the query result in the blockchain network 500. The electronic device 710 then determines the sharpness of the first picture or the second picture through processing such as feature extraction and classification and generates a transaction for submitting the sharpness of the first picture or the second picture; the transaction includes the picture identifier, and the electronic device 710 specifies in the transaction the smart contract that needs to be invoked to implement the update operation and the parameters passed to the smart contract, where the transaction also carries a digital signature of the service system 700. The electronic device 710 then broadcasts the transaction to the blockchain network 500; after the nodes 510 of the blockchain network verify the transaction, fill it into a block, and reach consensus, the node 510 providing the ledger function appends the newly formed block to the tail of the blockchain and executes the transaction in the new block: for a transaction that updates sharpness, the sharpness and the corresponding picture identifier are stored in the state database in the form of a key-value pair to establish an index relationship.
Exemplary applications of the electronic device provided by the embodiments of the present invention are described below. The electronic device may be implemented as a notebook computer, a tablet computer, a desktop computer, a set-top box, a mobile device (e.g., a mobile phone, a portable music player, a personal digital assistant, a dedicated messaging device, a portable game device), and the like, as well as a server. In the following, an electronic device is taken as an example of a server.
Referring to fig. 3, fig. 3 is a schematic architecture diagram of a server 200 (for example, the server 200 shown in fig. 1) provided by an embodiment of the present invention. The server 200 shown in fig. 3 includes at least one processor 210, a memory 240, and at least one network interface 220. The components in the server 200 are coupled together by a bus system 230. It can be understood that the bus system 230 is used to implement connection and communication between these components. In addition to a data bus, the bus system 230 includes a power bus, a control bus, and a status signal bus; for clarity of illustration, however, the various buses are all labeled as bus system 230 in fig. 3.
The processor 210 may be an integrated circuit chip with signal processing capabilities, such as a general-purpose processor (for example, a microprocessor or any conventional processor), a digital signal processor (DSP), another programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
The memory 240 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid state memory, hard drives, optical drives, and the like. Memory 240 optionally includes one or more storage devices that are physically located remote from processor 210.
The memory 240 includes volatile memory or non-volatile memory, and may include both volatile and non-volatile memory. The non-volatile memory may be a read-only memory (ROM), and the volatile memory may be a random access memory (RAM). The memory 240 described in the embodiments of the present invention is intended to include any suitable type of memory.
In some embodiments, memory 240 is capable of storing data to support various operations, examples of which include programs, modules and data structures, or subsets or supersets thereof, as exemplified below.
An operating system 241 including system programs for handling various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and handling hardware-based tasks;
network communication module 242 for reaching other computing devices via one or more (wired or wireless) network interfaces 220, exemplary network interfaces 220 include: bluetooth, wireless compatibility authentication (WiFi), and universal serial bus (USB, universal Serial Bus), etc.
In some embodiments, the artificial-intelligence-based picture processing apparatus provided by the embodiments of the present invention may be implemented in software. Fig. 3 shows an artificial-intelligence-based picture processing apparatus 243 stored in the memory 240, which may be software in the form of a program, a plug-in, or the like, and which includes the following software modules: a picture acquisition module 2431, a feature extraction module 2432, a classification module 2433, and a ranking module 2434. These modules are logical, and thus may be arbitrarily combined or further split according to the functions implemented. The functions of each module are described below.
In other embodiments, the artificial-intelligence-based picture processing apparatus provided by the embodiments of the present invention may be implemented in hardware. As an example, it may be a processor in the form of a hardware decoding processor programmed to perform the artificial-intelligence-based picture processing method provided by the embodiments of the present invention; for example, the processor in the form of a hardware decoding processor may employ one or more application-specific integrated circuits (ASICs), DSPs, programmable logic devices (PLDs), complex programmable logic devices (CPLDs), field-programmable gate arrays (FPGAs), or other electronic components.
The image processing method based on artificial intelligence provided by the embodiment of the invention can be executed by the server, the terminal equipment (for example, the terminal equipment 400-1 and the terminal equipment 400-2 shown in fig. 1) or the server and the terminal equipment together.
The process of implementing an artificial intelligence based picture processing method by an embedded artificial intelligence based picture processing apparatus in an electronic device will be described below in connection with the exemplary applications and structures of the electronic device described above.
Referring to fig. 4 and 5A, fig. 4 is a schematic diagram of an architecture of an artificial intelligence-based image processing apparatus 243 according to an embodiment of the present invention, which shows a flow of obtaining sharpness through a series of modules, and fig. 5A is a schematic diagram of a flow of an artificial intelligence-based image processing method according to an embodiment of the present invention, and the steps shown in fig. 5A will be described with reference to fig. 4.
In step 101, a picture of an original size corresponding to a multimedia material is acquired.
Here, the multimedia material may be text, audio, video, or the like, which is not limited by the embodiments of the invention; the size of the picture at the time the picture corresponding to the multimedia material is acquired is taken as the original size. When the picture is acquired, it may be normalized by converting pixel values in the range 0-255 into the range 0-1, which facilitates subsequent computation.
In some embodiments, the above acquisition of the original-size picture corresponding to the multimedia material may be achieved by: acquiring an original-size picture uploaded by a user and corresponding to the multimedia material; or performing frame extraction on the multimedia material at an acquisition frequency to obtain an original-size picture corresponding to the multimedia material.
As an example, referring to fig. 4, in the picture acquisition module 2431, the picture corresponding to the multimedia material may be uploaded by a user; when the multimedia material is a video, the picture may also be obtained by frame extraction at an acquisition frequency, for example one frame per second. Of course, other acquisition methods may also be applied, for example searching a picture platform by the name of the multimedia material and determining the top-K ranked pictures as the pictures corresponding to the multimedia material, where K is an integer greater than 0. These options improve the flexibility of picture acquisition; a sketch of the frame-extraction option follows.
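The following is a minimal sketch of the video case, combining frame extraction at an acquisition frequency with the 0-255 to 0-1 normalization mentioned in step 101; the use of OpenCV and the one-frame-per-second rate are assumptions taken from the example above.

```python
# Sketch of step 101 for video material: frame extraction at an acquisition
# frequency, then normalization of pixel values from 0-255 to 0-1.
import cv2
import numpy as np

def extract_frames(video_path: str, frames_per_second: float = 1.0):
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0  # fall back if FPS is unknown
    step = max(1, int(round(fps / frames_per_second)))
    frames, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            # Keep the original size; only normalize the pixel values.
            frames.append(frame.astype(np.float32) / 255.0)
        index += 1
    cap.release()
    return frames
```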
In step 102, feature extraction processing is performed on the original-size picture, so as to obtain a picture feature including picture information and semantic information.
Here, feature extraction is performed on the original-size picture to obtain picture features that include low-level picture information and high-level semantic information. The picture information indicates attributes such as the color, texture, and shape of the picture, while the semantic information indicates the meaning of the picture, for example that the picture depicts a building. The feature extraction may be performed on the original-size picture by a trained machine learning model, the details of which are described below.
In step 103, the picture features are classified to obtain sharpness labels and corresponding confidence levels.
Here, the picture features obtained in step 102 are classified, for example with a multi-class classification through a fully connected layer in a machine learning model, to obtain at least two sharpness labels and the confidence level corresponding to each sharpness label, where a confidence level is the probability that the sharpness label matches the true sharpness.
In step 104, at least two sharpness labels are ranked according to their confidence levels, and the sharpness of the original-size picture is determined according to the ranking result.
For example, all obtained sharpness labels are ranked in descending order of confidence, and the sharpness label whose confidence occupies a set rank is determined as the sharpness of the original-size picture, where the set rank may be the first rank. Of course, the final sharpness may also be determined in ways other than a set rank. The embodiments of the invention do not limit how the obtained sharpness is applied: for example, when the sharpness grade of a picture reaches a set grade, the picture is determined to be the cover of the multimedia material; or several pictures corresponding to the multimedia material and the sharpness of each picture may be stored, and when a picture is recommended, the picture whose sharpness has the highest grade is recommended.
In connection with any of the above steps, the artificial-intelligence-based picture processing method may further include: acquiring a first picture corresponding to a picture identifier from a database; sending a request including the picture identifier to the blockchain network to acquire a second picture stored on the blockchain and corresponding to the picture identifier; hashing the first picture to obtain a first hash value and hashing the second picture to obtain a second hash value; and, when the first hash value is the same as the second hash value, sending the sharpness of the first picture or the second picture to the blockchain network, so that a node of the blockchain network stores the sharpness on the blockchain and establishes an index relationship between the sharpness and the picture identifier.
In the embodiments of the invention, in combination with blockchain technology, the picture corresponding to the multimedia material and the picture identifier of the picture can be sent to the blockchain network, achieving persistent storage of the picture and the corresponding picture identifier on the blockchain. However, during transmission the picture sent to the blockchain network may change for various reasons (such as malicious tampering by an illegal user, or automatic compression of the picture in the background). Therefore, to avoid sending an erroneous sharpness to the blockchain network, the corresponding picture is first obtained from the database according to the picture identifier; for ease of distinction, it is called the first picture. Meanwhile, a request including the picture identifier is sent to the blockchain network to acquire the second picture stored on the blockchain and corresponding to the picture identifier. Note that when a state database exists, a node of the blockchain network responds to the request including the picture identifier by preferentially acquiring the second picture corresponding to the picture identifier from the state database.
Then, the first picture is hashed to obtain a first hash value, and the second picture is hashed to obtain a second hash value. When the first hash value differs from the second hash value, one of the first picture and the second picture is proven to have been tampered with, so the first picture and the second picture are sent to a manual review party. If the feedback of the manual review party indicates that the first picture is wrong, the first picture is deleted from the database, the sharpness of the second picture is determined according to steps 102-104 and sent to the blockchain network, and a node of the blockchain network establishes an index relationship of picture identifier - second picture - sharpness. If the feedback of the manual review party indicates that the second picture is wrong, the sharpness of the first picture is determined according to steps 102-104, the sharpness and the first picture are sent together to the blockchain network, and a node of the blockchain network updates the original index relationship of the picture identifier to picture identifier - first picture - sharpness.
Otherwise, when the first hash value is the same as the second hash value, the sharpness of either the first picture or the second picture is determined according to steps 102-104 and sent to the blockchain network; the nodes of the blockchain network store the sharpness in the blockchain and the state database and establish an index relationship of picture identifier - second picture - sharpness there. In this way, when a node of the blockchain network later receives a query request including the picture identifier, it can return the accurate original-size picture and the corresponding sharpness, improving the validity and accuracy of data storage. A sketch of the hash comparison follows.
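A minimal sketch of the hash comparison above; SHA-256 over the raw picture bytes is an assumption, since the embodiments do not name a specific hash function.

```python
# Sketch of the hash check in the blockchain flow above. SHA-256 is an
# assumed choice; the text does not specify the hash function.
import hashlib

def picture_hash(picture_bytes: bytes) -> str:
    return hashlib.sha256(picture_bytes).hexdigest()

def pictures_consistent(first_picture: bytes, second_picture: bytes) -> bool:
    # Equal digests indicate the database copy and the on-chain copy match,
    # so the computed sharpness can safely be submitted to the chain.
    return picture_hash(first_picture) == picture_hash(second_picture)
```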
As can be seen from the above exemplary implementation of fig. 5A, the embodiments of the invention perform feature extraction on the original-size picture, thereby preserving the information in the original-size picture to the greatest extent; and, because semantic information is incorporated, a more accurate sharpness can be obtained even for special pictures such as solid-color pictures, improving the applicability to different application scenarios.
In some embodiments, referring to fig. 5B, fig. 5B is a schematic flow chart of an alternative image processing method based on artificial intelligence according to an embodiment of the present invention, and step 102 shown in fig. 5A may be implemented by steps 201 to 204, which will be described in connection with the steps.
In step 201, a dimension reduction process is performed on the original-size picture through a dimension reduction layer of the machine learning model, so as to obtain a dimension-reduced picture.
In the embodiments of the invention, to extract the semantic information in pictures, a machine learning model may be used to perform feature extraction on the original-size picture; the machine learning model may be a deep learning model built on a convolutional neural network. As an example, referring to fig. 4, in the feature extraction module 2432, the dimension reduction layer of the machine learning model performs dimension reduction on the original-size picture to obtain a dimension-reduced picture, where the dimension reduction layer may include a convolution layer and a pooling layer; the purpose of the dimension reduction is to reduce the amount of computation in subsequent network layers.
In step 202, the dimension-reduced picture is sampled by an intermediate sampling layer of the machine learning model, so as to obtain a feature map.
As an example, referring to fig. 4, in the feature extraction module 2432, the intermediate sampling layer includes a block sampling layer and a downsampling layer. The block sampling layer may be a res_block layer, comprising several convolution layers, batch normalization (BN) layers, and activation layers, and it includes an identity mapping operation, i.e., a shortcut. The dimension-reduced picture is sampled through the intermediate sampling layer to obtain a feature map, where the number of channels of the feature map is the same as the number of channels set in the intermediate sampling layer.
To increase the complexity of the machine learning model and extract more information from the original-size picture, at least two intermediate sampling layers may be provided, where the input of each intermediate sampling layer is the output of the previous one; for example, if the layer after intermediate sampling layer 1 is intermediate sampling layer 2, then the input of intermediate sampling layer 2 is the feature map output by intermediate sampling layer 1. Note that as the depth increases, the number of channels in the intermediate sampling layers is generally set larger.
In step 203, the feature map is converted by the adaptive pooling layer of the machine learning model, so as to obtain a feature vector corresponding to the feature map.
As an example, referring to fig. 4, in the feature extraction module 2432, the feature map is converted by an adaptive pooling (Adaptive Pool) layer of the machine learning model to obtain the feature vector corresponding to the feature map. Note that when there are at least two intermediate sampling layers, a separate corresponding adaptive pooling layer is provided for each intermediate sampling layer.
In step 204, at least two feature vectors are combined to obtain a picture feature including picture information and semantic information.
When at least two adaptive pooling layers are included, the feature vectors corresponding to all the adaptive pooling layers are combined (concatenated), and the result is the picture feature. Owing to the characteristics of convolutional neural networks, the feature map obtained at each intermediate sampling layer has local invariance, so the picture feature finally obtained by the combination carries both stable image information and meaningful semantic information related to picture sharpness.
In some embodiments, the above combination of at least two feature vectors into a picture feature including picture information and semantic information may be achieved as follows: the parameters of at least two feature vectors in a first dimension are concatenated to obtain the parameter of the picture feature in the first dimension, and the parameter of any one feature vector in a second dimension is determined as the parameter of the picture feature in the second dimension, where the parameter of a feature vector in the first dimension is the same as the number of channels of the corresponding feature map, and the second dimension is a dimension other than the first dimension.
Here, for each adaptive pooling layer, the parameter of the output feature vector in one dimension is the same as the number of channels of the input feature map; for ease of distinction, this dimension is called the first dimension, and a dimension other than the first dimension is called the second dimension. During the combination, the parameters of all the feature vectors in the first dimension are concatenated to obtain the parameter of the picture feature in the first dimension, and at the same time the parameter of any one feature vector in the second dimension (usually 1) is determined as the parameter of the picture feature in the second dimension. In this way, the information extracted at different depths of the machine learning model is effectively fused, as the sketch below illustrates.
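A small sketch of this channel-dimension concatenation, using PyTorch as an assumed framework; the channel counts 64 and 128 are examples only.

```python
# Sketch of the first-dimension (channel) concatenation described above.
import torch

v1 = torch.randn(64, 1)   # feature vector from a 64-channel feature map
v2 = torch.randn(128, 1)  # feature vector from a 128-channel feature map

# Concatenate along the first dimension; the second dimension stays 1,
# matching either input, as described above.
picture_feature = torch.cat([v1, v2], dim=0)
print(picture_feature.shape)  # torch.Size([192, 1])
```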
In fig. 5B, step 103 shown in fig. 5A may be implemented by step 205. Specifically, in step 205, the picture feature is mapped through the fully connected layer of the machine learning model to obtain sharpness labels and corresponding confidence levels.
As an example, referring to fig. 4, in the classification module 2433, the obtained picture feature is mapped through the fully connected layer of the machine learning model. Note that, to enhance the nonlinear learning capability of the machine learning model, a first fully connected layer and a second fully connected layer may be provided, and, to prevent overfitting, a dropout layer may be placed between the first fully connected layer and the second fully connected layer. During processing, the picture feature is input sequentially to the first fully connected layer and the dropout layer, and the output of the dropout layer is input to the second fully connected layer, completing the mapping that takes the picture feature to the confidence level corresponding to each sharpness label. A model sketch covering steps 201-205 is given below.
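The following is a hedged PyTorch sketch of the architecture described in steps 201-205: a dimension reduction layer (convolution plus pooling), two residual-style intermediate sampling layers with growing channel counts, one adaptive pooling layer per intermediate sampling layer, channel-wise concatenation, and two fully connected layers with dropout. All channel counts, kernel sizes, the dropout rate, and the three-label output are illustrative assumptions, not values fixed by the embodiments.

```python
# Hedged sketch of the model of steps 201-205; all layer sizes are assumed.
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Block sampling layer: convolution, BN, activation, identity shortcut."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.body(x) + x)  # identity mapping (shortcut)

class SharpnessNet(nn.Module):
    def __init__(self, num_labels: int = 3):  # e.g., blurred/general/clear
        super().__init__()
        # Step 201: dimension reduction layer (convolution + pooling).
        self.reduce = nn.Sequential(nn.Conv2d(3, 64, 7, stride=2, padding=3),
                                    nn.MaxPool2d(2))
        # Step 202: intermediate sampling layers; deeper layers get more channels.
        self.stage1 = nn.Sequential(ResBlock(64),
                                    nn.Conv2d(64, 128, 3, stride=2, padding=1))
        self.stage2 = nn.Sequential(ResBlock(128),
                                    nn.Conv2d(128, 256, 3, stride=2, padding=1))
        # Step 203: one adaptive pooling layer per intermediate sampling layer,
        # so a picture of any original size yields fixed-length vectors.
        self.pool1 = nn.AdaptiveAvgPool2d(1)
        self.pool2 = nn.AdaptiveAvgPool2d(1)
        # Step 205: two fully connected layers with dropout in between.
        self.head = nn.Sequential(nn.Linear(128 + 256, 128),
                                  nn.ReLU(inplace=True),
                                  nn.Dropout(0.5),
                                  nn.Linear(128, num_labels))

    def forward(self, x):
        x = self.reduce(x)
        f1 = self.stage1(x)
        f2 = self.stage2(f1)
        v1 = self.pool1(f1).flatten(1)  # (N, 128)
        v2 = self.pool2(f2).flatten(1)  # (N, 256)
        # Step 204: concatenate the feature vectors along the channel dimension.
        feature = torch.cat([v1, v2], dim=1)
        # Raw scores; apply torch.softmax at inference to obtain the per-label
        # confidence levels that step 104 ranks.
        return self.head(feature)
```

Because of the adaptive pooling layers, the same forward pass accepts any input height and width, which is what allows the method to process pictures at their original sizes instead of cropping or scaling them.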
As can be seen from the above exemplary implementation of fig. 5B, performing the feature extraction and classification with a machine learning model built on a convolutional neural network improves the accuracy of the obtained sharpness labels and confidence levels.
In some embodiments, referring to fig. 5C, fig. 5C is an alternative flow diagram of training a machine learning model provided by an embodiment of the present invention, and will be described in connection with the steps shown in fig. 5C.
In step 301, a sample picture and a corresponding sample sharpness are obtained.
Here, a sample picture and its labeled sample sharpness are obtained, where the sample picture may be a picture corresponding to multimedia material or any other picture, as determined by the actual application scenario.
In step 302, the machine learning model performs prediction on the sample picture to obtain sharpness labels and corresponding prediction confidence levels.
The sample picture is processed sequentially through the dimension reduction layer, the intermediate sampling layer, the adaptive pooling layer, and the fully connected layer of the machine learning model to obtain at least two sharpness labels and corresponding confidence levels; for ease of distinction, these confidence levels are called prediction confidence levels.
In step 303, at least two sharpness labels are ranked according to their prediction confidence levels, and the sharpness label whose prediction confidence occupies the set rank is determined as the predicted sharpness of the sample picture.
For example, all the sharpness labels are ranked in descending order of prediction confidence, and the sharpness label whose prediction confidence occupies the set rank, such as the first rank, is determined as the predicted sharpness of the sample picture. Of course, this does not limit the embodiments of the invention; other ways of determining the predicted sharpness of the sample picture from the ranking result may also be applied.
In step 304, a difference between the sample sharpness and the predicted sharpness is determined.
Here, different sharpness types may be represented by different numerical values; for example, if the sharpness types include blurred, general, and clear, then blurred may be represented by -1, general by 0, and clear by 1. After the predicted sharpness corresponding to the sample picture is determined, the difference between the sample sharpness and the predicted sharpness is determined.
In step 305, the gradient of the machine learning model is determined according to the difference, and the weight parameters of the fully connected layer, the intermediate sampling layer, and the dimension reduction layer are updated along the direction of gradient descent.
The gradient of the loss function of the machine learning model is determined according to the obtained difference, and the weight parameters of each layer in the machine learning model are updated along the direction in which the gradient descends. Note that the adaptive pooling layer only performs adaptive pooling, so it has no weight parameters that need to be updated. The embodiments of the invention do not limit the type of the loss function; for example, it may be a cross-entropy loss function. A sketch of one training step follows.
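A hedged sketch of one update step under these assumptions, reusing the SharpnessNet sketch above: cross-entropy loss (the example named in the text), plain SGD with an assumed learning rate, and class indices 0/1/2 for blurred/general/clear (PyTorch's cross-entropy expects non-negative class indices rather than the -1/0/1 encoding mentioned under step 304).

```python
# Sketch of one mini-batch update (steps 304-305). The optimizer and learning
# rate are assumptions; labels are class indices 0/1/2.
import torch
import torch.nn as nn

model = SharpnessNet(num_labels=3)   # from the sketch above
criterion = nn.CrossEntropyLoss()    # the example loss named in the text
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

def train_step(pictures: torch.Tensor, sample_sharpness: torch.Tensor) -> float:
    model.train()
    optimizer.zero_grad()
    logits = model(pictures)                    # predicted sharpness scores
    loss = criterion(logits, sample_sharpness)  # difference (step 304)
    loss.backward()                             # gradient of the loss (step 305)
    optimizer.step()                            # update along gradient descent
    return loss.item()
```

Note that all pictures in one batch share a size here, which is what the size-based clustering of steps 401-404 below arranges.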
In fig. 5C, step 305 may be implemented by steps 401 to 404, and will be described in connection with each step.
In step 401, a center size corresponding to the multimedia material is obtained, and the size difference value between the original size of the sample picture and the center size is determined.
Gradient descent can be performed in various ways. If stochastic gradient descent is used, i.e., one sample picture and its sample sharpness are used per iteration, the training speed and training accuracy of the machine learning model are adversely affected; therefore, the embodiments of the invention apply mini-batch gradient descent. Specifically, center sizes corresponding to the multimedia material are obtained, where a center size is a picture size and one or more center sizes may be preset. Then, the size difference value between the original size of the sample picture and each set center size is determined.
In some embodiments, the above determination of the size difference value between the original size of the sample picture and the center size may be achieved by: determining the horizontal difference between the original size of the sample picture and the center size, and taking its absolute value to obtain the horizontal difference value; determining the vertical difference between the original size of the sample picture and the center size, and taking its absolute value to obtain the vertical difference value; and summing the horizontal difference value and the vertical difference value to obtain the size difference value.
Here, separate calculations may be performed for the horizontal dimension (the length of the picture) and the vertical dimension (the width of the picture). Specifically, the difference between the original size and the center size of the sample picture in the horizontal direction is determined and its absolute value is taken, yielding the horizontal difference value; likewise, the difference in the vertical direction is determined and its absolute value is taken, yielding the vertical difference value. The horizontal difference value and the vertical difference value are then summed to obtain the size difference value. A size difference value obtained in this way reflects well the gap between the original size of the sample picture and the center size, as the sketch below shows.
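In formula form, for an original size (w, h) and a center size (w_c, h_c), the size difference value is |w - w_c| + |h - h_c|, an L1 distance over the two dimensions. A minimal sketch:

```python
# Sketch of the size difference of step 401: an L1 distance between an
# original size and a center size, each given as (length, width).
def size_difference(original: tuple[int, int], center: tuple[int, int]) -> int:
    horizontal = abs(original[0] - center[0])  # horizontal difference value
    vertical = abs(original[1] - center[1])    # vertical difference value
    return horizontal + vertical
```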
In step 402, when the size difference value is smaller than a clustering threshold, the sample picture is added to the center class corresponding to the center size.
When the size difference value is smaller than the set clustering threshold, the center size corresponding to that size difference value is determined, and the sample picture is added to the center class corresponding to that center size; this is the clustering, where each center size corresponds to one center class.
In step 403, all sample pictures in the center class are scaled in size according to the center size.
In order to facilitate model processing, after clustering of each sample picture is completed, the sizes of all sample pictures in the center class are scaled (resize) to be consistent with the center size corresponding to the center class. It should be noted that, if the sample pictures are not classified into any center class after clustering, the sample pictures and the corresponding sample definition may be added to the test set to test the training effect of the machine learning model.
In step 404, the gradient of the machine learning model is determined according to the differences between the sample definitions and the predicted definitions corresponding to all the sample pictures in the center class, and the weight parameters of the fully connected layer, the adaptive pooling layer, the intermediate sampling layers and the dimension reduction layer are updated along the direction of gradient descent.
A center class obtained by clustering forms one batch, and the number of sample pictures it contains is the batch_size. During training, the gradient of the loss function of the machine learning model is determined according to the differences between the sample definitions and the predicted definitions of all sample pictures in the center class, and the weight parameters of each layer of the machine learning model are updated along the direction of gradient descent; this is mini-batch gradient descent. The training process is repeated until a convergence condition is met, such as reaching a set number of iterations or a set accuracy.
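A PyTorch-style sketch of this mini-batch loop, under the assumptions that sample definitions are integer class indices and the pictures are PIL images (to_tensor also performs the 0-255 to 0-1 normalization described later):

```python
import torch
import torch.nn as nn
from torchvision.transforms.functional import to_tensor  # PIL image -> [0, 1] tensor

def train_on_center_classes(model, center_classes, epochs=10, lr=1e-3):
    """Mini-batch gradient descent where each center class is one batch."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()  # named in the text as one possible loss
    for _ in range(epochs):
        for batch in center_classes.values():
            # All pictures in a center class share one size, so they stack.
            pictures = torch.stack([to_tensor(p) for p, _ in batch])
            labels = torch.tensor([d for _, d in batch])
            optimizer.zero_grad()
            loss = criterion(model(pictures), labels)
            loss.backward()   # gradient of the loss w.r.t. the weights
            optimizer.step()  # step along the direction of gradient descent
    return model
```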
As can be seen from the above exemplary implementation of fig. 5C, the embodiment of the present invention trains the machine learning model on the samples, optimizing the weight parameters of each layer of the model and thereby improving the accuracy of the processing performed by the trained machine learning model.
In some embodiments, referring to fig. 5D, fig. 5D is a schematic flow chart of an alternative image processing method based on artificial intelligence according to an embodiment of the present invention, and step 104 shown in fig. 5A may be implemented through steps 501 to 504, which will be described in connection with the steps.
In step 501, at least two definition labels are sorted according to their corresponding confidences.
In step 502, when the definition label corresponding to the largest confidence is clear or blurred, that definition label is determined as the definition of the original-size picture; here, the types of the definition labels include: blurred, general and clear.
For ease of understanding, the description assumes that the types of the definition label (definition) are blurred, general and clear; it should be understood that more or fewer types may be set depending on the actual application scenario. In an actual labeling scenario, pictures whose definition is general are usually the most numerous, i.e., the distribution is uneven. Therefore, after the sorting process, if the definition label corresponding to the largest confidence is clear or blurred, that definition label is determined as the definition of the original-size picture.
In step 503, when the definition label corresponding to the largest confidence is general and that confidence is greater than or equal to a confidence threshold, that definition label is determined as the definition of the original-size picture.
Here, a confidence threshold is set for the general definition label; its value may be set within the range [0.35, 0.5]. When the definition label corresponding to the largest confidence is general and that confidence is greater than or equal to the set confidence threshold, the general label is considered reliable, and the definition of the original-size picture is determined to be general.
In step 504, when the definition label corresponding to the largest confidence is general and that confidence is smaller than the confidence threshold, the definition label corresponding to the second-largest confidence is determined as the definition of the original-size picture.
When the definition label corresponding to the largest confidence is general but that confidence is smaller than the set confidence threshold, the general label is considered unreliable, and the definition label corresponding to the second-largest confidence is determined as the definition of the original-size picture. It should be noted that steps 501 to 504 apply not only to the inference stage for the definition of pictures corresponding to multimedia material, but also to the prediction stage for the definition of sample pictures.
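The decision rule of steps 501 to 504 fits in a short Python sketch; the English label strings and the dict input format are assumptions, and the default threshold of 0.5 follows the fig. 7 example discussed later:

```python
def decide_definition(confidences, threshold=0.5):
    """confidences: mapping from definition label to confidence,
    e.g. {"blurred": 0.25, "general": 0.45, "clear": 0.30}."""
    ranked = sorted(confidences.items(), key=lambda kv: kv[1], reverse=True)
    top_label, top_conf = ranked[0]                # step 501: sort by confidence
    if top_label in ("clear", "blurred"):
        return top_label                           # step 502
    if top_conf >= threshold:
        return top_label                           # step 503: "general" trusted
    return ranked[1][0]                            # step 504: take the runner-up
```

With the example mapping above, the top label is general with confidence 0.45 < 0.5, so the runner-up label clear is returned.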
In some embodiments, after step 104, the method further comprises: determining the definitions of at least two pictures corresponding to the multimedia material, where the types of definition include the following sequentially increasing levels: blurred, general and clear; when only one picture corresponds to the definition with the highest level, determining that picture as the cover of the multimedia material; and when at least two pictures correspond to the definition with the highest level, determining those pictures as target pictures, determining the confidence with which each target picture corresponds to the highest-level definition, and determining the target picture with the largest confidence as the cover of the multimedia material.
Here, the application of picture definition is explained taking cover setting as an example. First, the definitions of at least two pictures corresponding to the multimedia material are determined; the types of definition are blurred, general and clear, in increasing order of level. When only one picture corresponds to the definition with the highest level, that picture is directly determined as the cover of the multimedia material. When at least two pictures correspond to the definition with the highest level, each such picture is, for ease of distinction, named a target picture; the confidence with which each target picture corresponds to the highest-level definition is determined, and the target picture with the largest confidence is determined as the cover of the multimedia material. For example, if picture 1 and picture 2 corresponding to the multimedia material are both clear while picture 3 is general, then picture 1 and picture 2 are determined as target pictures; if the confidence of the clear label is 0.8 for picture 1 and 0.7 for picture 2, then picture 1, having the higher and therefore more reliable confidence, is set as the cover of the multimedia material. The cover of the multimedia material may be presented on the search page when a user searches for the multimedia material, although other presentation modes are possible.
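A minimal sketch of this cover-selection rule (the triple format and the numeric level encoding are illustrative assumptions):

```python
LEVELS = {"blurred": 0, "general": 1, "clear": 2}  # sequentially increasing levels

def pick_cover(pictures):
    """pictures: list of (picture_id, definition_label, confidence) triples
    for one multimedia material; returns the id chosen as its cover."""
    best = max(LEVELS[label] for _, label, _ in pictures)
    targets = [p for p in pictures if LEVELS[p[1]] == best]
    if len(targets) == 1:
        return targets[0][0]
    # Several target pictures share the highest level: take the most confident.
    return max(targets, key=lambda p: p[2])[0]
```

On the example above, pick_cover([(1, "clear", 0.8), (2, "clear", 0.7), (3, "general", 0.9)]) returns 1.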
As can be seen from the above exemplary implementation of fig. 5D, the embodiment of the present invention takes the distribution characteristics of the definitions into account, sets a confidence threshold for the general definition label and performs a confidence re-judgment, thereby improving the accuracy of the finally determined definition.
In the following, an exemplary application of the embodiment of the present invention in a practical application scenario will be described.
An alternative architecture diagram of the machine learning model is shown in fig. 6 and provided in an embodiment of the present invention; in fig. 6, the machine learning model is a deep learning model built on a convolutional neural network, and it will be described layer by layer. The backbone of the machine learning model mainly consists of block sampling (Res_block) layers, Down Sampling layers, an adaptive pooling (Adaptive Pool) layer and fully connected (FC) layers. A Res_block layer may be composed of a convolution layer, a BN layer and an activation layer; the convolution layers in a Res_block layer may use 5×5, 3×3 and 1×1 convolution kernels, and the Res_block layer also includes a shortcut operation. A Down Sampling layer may consist of a convolution layer or a pooling layer with stride 2. The Adaptive Pool layer can convert a feature map of any size into a corresponding feature vector, so the machine learning model accepts pictures of any size as input; the parameter of a feature vector in the first dimension equals the number of channels of the corresponding feature map.
The role of the various network layers in the machine learning model is described below in connection with the processing of a picture by the model. First, the picture is normalized, converting the range of its pixel values from 0-255 to 0-1; the normalized picture can be represented as (w, h, c), where w is the length of the picture, h is its width and c is its number of channels. The normalized picture is fed into the first convolution layer and pooling layer of the machine learning model for dimension reduction, which reduces the computation of the subsequent network layers. Then, the feature map obtained after the first Res_block layer and Down Sampling layer can be expressed as (w_1, h_1, c_1), where c_1 is the number of channels set for the convolution layers in the Res_block layer; the main purpose of the Down Sampling layer is likewise to reduce dimensionality and improve the robustness of the machine learning model. The resulting feature map is, on the one hand, used as the input of the next Res_block layer and, on the other hand, processed by an Adaptive Pool layer to obtain a feature vector expressed as (c_1, 1). A Res_block layer plus a Down Sampling layer corresponds to the intermediate sampling layer mentioned above; fig. 6 shows 4 intermediate sampling layers, and fewer or more can be set depending on the practical application scenario.
In fig. 6, the numbers of channels set for the convolution layers in the 2nd, 3rd and 4th Res_block layers are c_2, c_3 and c_4 respectively; after the corresponding Down Sampling layers, feature maps denoted (w_2, h_2, c_2), (w_3, h_3, c_3) and (w_4, h_4, c_4) are obtained. It is worth noting that, since the size of the input picture is not fixed, w and h in a feature map may change with the size of the input picture, but the numbers of channels of the convolution layers in the Res_block layers are fixed, so c_1, c_2, c_3 and c_4 are fixed. Passing the (w_2, h_2, c_2), (w_3, h_3, c_3) and (w_4, h_4, c_4) feature maps through their corresponding Adaptive Pool layers yields feature vectors (c_2, 1), (c_3, 1) and (c_4, 1); taking (c_2, 1) as an example, c_2 is its parameter in the first dimension described above and 1 is its parameter in the second dimension. The 4 feature vectors are then combined along the first dimension into the feature vector (c_1+c_2+c_3+c_4, 1), and this combined feature vector constitutes the picture feature. The (c_1+c_2+c_3+c_4, 1) feature vector passes through a first fully connected layer and a dropout layer, and then through a second fully connected layer for definition classification, yielding the definition labels and corresponding confidences; the purpose of the dropout layer is to prevent overfitting.
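A minimal PyTorch sketch of this backbone is given below. The concrete channel counts (32, 64, 128, 256), kernel sizes, the hidden width of the first fully connected layer and the residual-block internals are illustrative assumptions; fig. 6 fixes only the overall structure (a stem for dimension reduction, four Res_block plus Down Sampling stages, per-stage adaptive pooling, concatenation along the channel dimension, and two fully connected layers separated by dropout):

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Convolution + BN + activation with a shortcut, as described above."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
        )
        self.shortcut = nn.Conv2d(in_ch, out_ch, kernel_size=1)  # shortcut operation
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.body(x) + self.shortcut(x))

class SharpnessNet(nn.Module):
    def __init__(self, channels=(32, 64, 128, 256), num_labels=3):
        super().__init__()
        # First convolution + pooling: the dimension reduction layer.
        self.stem = nn.Sequential(
            nn.Conv2d(3, channels[0], kernel_size=7, stride=2, padding=3),
            nn.MaxPool2d(kernel_size=2),
        )
        stages, in_ch = [], channels[0]
        for ch in channels:
            stages.append(nn.Sequential(
                ResBlock(in_ch, ch),
                nn.Conv2d(ch, ch, kernel_size=3, stride=2, padding=1),  # Down Sampling
            ))
            in_ch = ch
        self.stages = nn.ModuleList(stages)
        # Adaptive pooling turns a feature map of any (w, h) into one value
        # per channel, i.e. a (c_i, 1) feature vector per stage.
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.head = nn.Sequential(
            nn.Linear(sum(channels), 256),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),                 # dropout layer against overfitting
            nn.Linear(256, num_labels),      # blurred / general / clear
        )

    def forward(self, x):                    # x: (batch, 3, h, w), pixels in [0, 1]
        x = self.stem(x)
        feats = []
        for stage in self.stages:
            x = stage(x)                     # intermediate sampling layer
            feats.append(self.pool(x).flatten(1))        # (batch, c_i)
        # Combine the per-stage vectors along the channel-number dimension.
        return self.head(torch.cat(feats, dim=1))        # (batch, num_labels)
```

Because every stage ends in adaptive pooling, SharpnessNet()(torch.rand(2, 3, 224, 224)) and SharpnessNet()(torch.rand(2, 3, 180, 320)) both return (2, 3) logits; a softmax over them yields the confidences of the definition labels.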
Since the input size of the machine learning model (deep learning model) shown in fig. 6 is not restricted, the batch size conventionally chosen when training such a model is usually 1, which harms both training speed and training accuracy. Therefore, in the embodiment of the present invention, the sample pictures in the model training set are clustered by picture size, grouping sample pictures of similar sizes into one class. Specifically, at least one center size is set, and the size difference value between the size of a sample picture and the center size is calculated; for example, if the size of the sample picture is w_a × h_a and the center size is w_b × h_b, the size difference value is |w_a - w_b| + |h_a - h_b|. When the size difference value is smaller than the set clustering threshold, the sample picture is assigned to the class corresponding to that center size. After clustering, all sample pictures in each class are scaled to match the center size corresponding to the class; one class forms one training batch, and the number of sample pictures it contains is the batch size, so training in mini-batches improves both training speed and training accuracy. It should be noted that the loss function adopted by the machine learning model may be a cross-entropy loss function or another loss function; in addition, sample pictures not assigned to any class may be added to the test set used to evaluate the training effect of the machine learning model.
The machine learning model shown in fig. 6 may be applied in various application scenarios involving picture definition. For example, when pictures correspond to a multimedia material (such as a video), each picture and its obtained definition may be stored together, and when a user searches for the multimedia material, the picture whose definition has the highest level (e.g., clear) is recommended to the user. As another example, a cover condition may be set: when the definition of a picture is clear, the picture is determined as the cover of the multimedia material; when the definition of the picture is blurred or general, the picture is rejected, and the user is prompted to re-upload a picture or to re-extract a picture from the multimedia material.
An embodiment of the present invention provides the schematic diagram for determining definition shown in fig. 7, where the types of definition include blurred, general and clear. Because pictures of the general type are the most numerous in actual application scenarios, and the sample pictures used for training are kept in the same distribution as actual pictures, most sample definitions are of the general type, which leads to an uneven sample distribution during training. To improve the accuracy of the determined definition, fig. 7 applies a different confidence judgment to each definition label output by the machine learning model. Specifically, when the definition label corresponding to the largest confidence output by the machine learning model is clear or blurred, that definition label is determined as the definition of the picture; when the definition label corresponding to the largest confidence is general and that confidence is greater than or equal to the confidence threshold, the definition of the picture is set to general; and when the definition label corresponding to the largest confidence is general but that confidence is smaller than the confidence threshold, the definition label corresponding to the second-largest confidence is determined as the definition of the picture. The confidence threshold may be set according to the actual application scenario, specifically within the range [0.35, 0.5]; fig. 7 illustrates the case where the confidence threshold is 0.5. This confidence re-judgment improves the recall of the clear and blurred types and the overall accuracy of the machine learning model.
Continuing with the exemplary architecture in which the artificial-intelligence-based picture processing device 243 provided by embodiments of the present invention is implemented as software modules: in some embodiments, as shown in fig. 3, the software modules of the artificial-intelligence-based picture processing device 243 stored in the memory 240 may include: a picture obtaining module 2431, configured to obtain a picture of the original size corresponding to the multimedia material; a feature extraction module 2432, configured to perform feature extraction processing on the original-size picture to obtain a picture feature comprising picture information and semantic information; a classifying module 2433, configured to classify the picture features to obtain definition labels and corresponding confidences; and a sorting module 2434, configured to sort at least two of the definition labels according to the corresponding confidences and determine the definition of the original-size picture according to the result of the sorting process.
In some embodiments, the feature extraction module 2432 is further configured to: perform dimension reduction processing on the original-size picture through a dimension reduction layer of the machine learning model to obtain a dimension-reduced picture; sample the dimension-reduced picture through an intermediate sampling layer of the machine learning model to obtain a feature map; convert the feature map through an adaptive pooling layer of the machine learning model to obtain a feature vector corresponding to the feature map; and combine at least two feature vectors along the channel-number dimension to obtain the picture feature comprising picture information and semantic information.
In some embodiments, the feature extraction module 2432 is further configured to: splice the parameters of at least two feature vectors in a first dimension to obtain the parameter of the picture feature in the first dimension; and determine the parameter of any feature vector in a second dimension as the parameter of the picture feature in the second dimension; where the parameter of a feature vector in the first dimension equals the number of channels of the corresponding feature map, and the second dimension is a dimension other than the first dimension;
the classification module 2433 is further configured to: and mapping the picture features through a full connection layer of the machine learning model to obtain definition labels and corresponding confidence degrees.
In some embodiments, the artificial intelligence based picture processing device 243 further includes: the sample acquisition module is used for acquiring a sample picture and corresponding sample definition; the prediction module is used for performing prediction processing on the sample picture through the machine learning model to obtain a definition label and a corresponding prediction confidence; the sample ordering module is used for ordering at least two definition labels according to the corresponding prediction confidence degrees, and determining the definition label corresponding to the prediction confidence degrees with the set ordering order as the prediction definition of the sample picture; a difference determination module for determining a difference between the sample sharpness and the predicted sharpness; and the gradient descent module is used for determining the gradient of the machine learning model according to the difference and updating the weight parameters of the full-connection layer, the middle sampling layer and the dimension reduction layer along the gradient descent direction.
In some embodiments, the gradient descent module is further configured to: acquire a center size corresponding to the multimedia material, and determine a size difference value between the original size of the sample picture and the center size; when the size difference value is smaller than a clustering threshold, add the sample picture to the center class corresponding to the center size; scale all sample pictures in the center class according to the center size; and determine the gradient of the machine learning model according to the differences between the sample definitions and the predicted definitions corresponding to all the sample pictures in the center class.
In some embodiments, the sorting module 2434 is further configured to: when the definition label corresponding to the largest confidence is clear or blurred, determine that definition label as the definition of the original-size picture; when the definition label corresponding to the largest confidence is general and that confidence is greater than or equal to a confidence threshold, determine that definition label as the definition of the original-size picture; and when the definition label corresponding to the largest confidence is general but that confidence is smaller than the confidence threshold, determine the definition label corresponding to the second-largest confidence as the definition of the original-size picture; where the types of the definition labels include: blurred, general and clear.
In some embodiments, the artificial intelligence based picture processing device 243 further includes: the definition determining module is used for determining the definition of at least two pictures corresponding to the multimedia material; the first cover determining module is used for determining the picture corresponding to the definition with the highest grade as the cover of the multimedia material when the picture corresponding to the definition with the highest grade is only one picture; the second cover determining module is used for determining the picture corresponding to the definition with the highest grade as a target picture when the pictures corresponding to the definition with the highest grade are at least two, determining the confidence coefficient of the target picture corresponding to the definition with the highest grade, and determining the target picture corresponding to the confidence coefficient with the highest value as the cover of the multimedia material; wherein the type of sharpness comprises the following sequentially increasing levels: blurred, general and clear.
In some embodiments, the artificial intelligence based picture processing device 243 further includes: the first acquisition module is used for acquiring a first picture corresponding to the picture identification from the database; the second acquisition module is used for sending a request to the blockchain network according to the picture identification so as to acquire a second picture stored in the blockchain and corresponding to the picture identification; the hash module is used for carrying out hash processing on the first picture to obtain a first hash value and carrying out hash processing on the second picture to obtain a second hash value; and the uplink module is used for sending the definition of the first picture or the second picture to the blockchain network when the first hash value is the same as the second hash value, so that nodes of the blockchain network store the definition to the blockchain, and an index relation between the definition and the picture identification is established.
Embodiments of the present invention provide a storage medium storing executable instructions which, when executed by a processor, cause the processor to perform the artificial-intelligence-based picture processing method provided by the embodiments of the present invention, for example the methods shown in fig. 5A, 5B or 5D.
In some embodiments, the storage medium may be an FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface memory, optical disc, or CD-ROM; it may also be any device including one of, or any combination of, the above memories.
In some embodiments, the executable instructions may be in the form of programs, software modules, scripts, or code, written in any form of programming language (including compiled or interpreted languages, or declarative or procedural languages), and they may be deployed in any form, including as stand-alone programs or as modules, components, subroutines, or other units suitable for use in a computing environment.
As an example, the executable instructions may, but need not, correspond to files in a file system, and may be stored as part of a file that holds other programs or data, for example in one or more scripts within a hypertext markup language (HTML, HyperText Markup Language) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files storing one or more modules, subprograms, or portions of code).
As an example, executable instructions may be deployed to be executed on one computing device or on multiple computing devices located at one site or, alternatively, distributed across multiple sites and interconnected by a communication network.
In summary, the embodiment of the present invention processes the picture at its original size, thereby preserving the information in the original-size picture to the greatest extent; it extracts picture information and semantic information from the picture and combines them to obtain the definition label, improving the accuracy of the obtained definition and the applicability to actual application scenarios. Feature extraction and classification of pictures are performed by a trained machine learning model, saving labor cost; once the definition of a picture is obtained, it can be applied in scenarios such as cover selection and picture recommendation.
The foregoing is merely exemplary embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and scope of the present invention are included in the protection scope of the present invention.

Claims (11)

1. An artificial intelligence based picture processing method, comprising:
acquiring an original size picture corresponding to the multimedia material;
performing dimension reduction processing on the original-size picture to obtain a dimension-reduced picture;
sampling the dimension-reduced picture to obtain a feature map, wherein the numbers of channels of the feature maps obtained from pictures of different sizes are the same;
performing adaptive pooling processing on the feature maps of different sizes to obtain feature vectors corresponding to the feature maps;
combining at least two feature vectors to obtain picture features comprising picture information and semantic information;
classifying the picture features to obtain definition labels and corresponding confidence levels;
and sequencing at least two definition labels according to the corresponding confidence degrees, and determining the definition of the picture with the original size according to the sequencing result.
2. The picture processing method as claimed in claim 1, wherein,
the dimension reduction processing is completed through a dimension reduction layer of a machine learning model;
the sampling is accomplished through an intermediate sampling layer of the machine learning model;
The adaptive pooling process is accomplished by an adaptive pooling layer of the machine learning model.
3. The picture processing method as claimed in claim 1, wherein,
combining at least two feature vectors to obtain a picture feature comprising picture information and semantic information, wherein the combining comprises the following steps:
splicing at least two parameters of the feature vectors in a first dimension to obtain parameters of the picture features in the first dimension;
determining the parameter of any feature vector in a second dimension as the parameter of the picture feature in the second dimension;
the parameters of the feature vector in the first dimension are the same as the channel number of the corresponding feature map, and the second dimension is a dimension other than the first dimension;
the classifying the picture features to obtain definition labels and corresponding confidence levels comprises the following steps:
and mapping the picture features through a full connection layer of the machine learning model to obtain definition labels and corresponding confidence degrees.
4. A picture processing method as claimed in claim 3, further comprising:
acquiring a sample picture and corresponding sample definition;
Carrying out prediction processing on the sample picture through the machine learning model to obtain a definition label and a corresponding prediction confidence;
sequencing at least two definition labels according to the corresponding prediction confidence, and determining the definition label corresponding to the prediction confidence of the set sequencing order as the prediction definition of the sample picture;
determining a difference between the sample sharpness and the predicted sharpness;
and determining the gradient of the machine learning model according to the difference, and updating the weight parameters of the full-connection layer, the middle sampling layer and the dimension reduction layer along the gradient descending direction.
5. The picture processing method as claimed in claim 4, wherein the determining the gradient of the machine learning model from the difference comprises:
acquiring a center size corresponding to the multimedia material, and determining a size difference value between the original size of the sample picture and the center size;
when the size difference value is smaller than a clustering threshold, adding the sample picture to a center class corresponding to the center size;
performing size scaling on all sample pictures in the center class according to the center size;
And determining the gradient of the machine learning model according to the difference between the sample definition and the prediction definition corresponding to all the sample pictures in the center class.
6. The picture processing method according to any one of claims 1 to 5, wherein,
wherein, the types of the definition labels include: blur, general and clear;
the determining the definition of the original size picture according to the sequencing result includes:
when the definition label corresponding to the confidence coefficient with the largest numerical value is clear or fuzzy, determining the definition label corresponding to the confidence coefficient with the largest numerical value as the definition of the picture with the original size;
when the definition label corresponding to the confidence coefficient with the largest value is general and the confidence coefficient with the largest value is larger than or equal to a confidence coefficient threshold value, determining the definition label corresponding to the confidence coefficient with the largest value as the definition of the picture with the original size;
and when the definition label corresponding to the confidence coefficient with the largest value is general and the confidence coefficient with the largest value is smaller than the confidence coefficient threshold value, determining the definition label corresponding to the confidence coefficient with the second largest value as the definition of the picture with the original size.
7. The picture processing method according to any one of claims 1 to 5, wherein,
the type of sharpness includes the following sequentially increasing levels: blur, general and clear;
the method further comprises the steps of:
determining the definition of at least two pictures corresponding to the multimedia material;
when the picture corresponding to the definition with the highest grade is only one, determining the picture corresponding to the definition with the highest grade as the cover of the multimedia material;
when at least two pictures corresponding to the definition with the highest level are provided, determining the picture corresponding to the definition with the highest level as a target picture, and
and determining the confidence coefficient of the target picture corresponding to the definition with the highest level, and determining the target picture corresponding to the confidence coefficient with the highest value as the cover of the multimedia material.
8. The picture processing method according to any one of claims 1 to 5, further comprising:
acquiring a first picture corresponding to a picture identifier from a database;
sending a request to a blockchain network according to the picture identification to acquire a second picture stored by the blockchain and corresponding to the picture identification;
carrying out hash processing on the first picture to obtain a first hash value, and carrying out hash processing on the second picture to obtain a second hash value;
When the first hash value is the same as the second hash value, sending the sharpness of the first picture or the second picture to the blockchain network so that
And the node of the blockchain network stores the definition into the blockchain, and establishes an index relation between the definition and the picture identification.
9. An artificial intelligence based picture processing apparatus, comprising:
the picture acquisition module is used for acquiring pictures with original sizes corresponding to the multimedia materials;
the feature extraction module is used for performing dimension reduction processing on the original-size picture to obtain a dimension-reduced picture; sampling the dimension-reduced picture to obtain a feature map, wherein the numbers of channels of the feature maps obtained from pictures of different sizes are the same; performing adaptive pooling processing on the feature maps of different sizes to obtain feature vectors corresponding to the feature maps; and combining at least two feature vectors to obtain a picture feature comprising picture information and semantic information;
the classification module is used for classifying the picture features to obtain definition labels and corresponding confidence levels;
And the sorting module is used for sorting at least two definition labels according to the corresponding confidence degrees and determining the definition of the picture with the original size according to the sorting result.
10. An electronic device, comprising:
a memory for storing executable instructions;
a processor for implementing the artificial intelligence based picture processing method of any one of claims 1 to 8 when executing executable instructions stored in the memory.
11. A computer readable storage medium storing executable instructions for implementing the artificial intelligence based picture processing method according to any one of claims 1 to 8 when executed by a processor.
CN201911239861.4A 2019-12-06 2019-12-06 Picture processing method and device based on artificial intelligence and electronic equipment Active CN110929806B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911239861.4A CN110929806B (en) 2019-12-06 2019-12-06 Picture processing method and device based on artificial intelligence and electronic equipment

Publications (2)

Publication Number Publication Date
CN110929806A CN110929806A (en) 2020-03-27
CN110929806B true CN110929806B (en) 2023-07-21

Family

ID=69858156

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911239861.4A Active CN110929806B (en) 2019-12-06 2019-12-06 Picture processing method and device based on artificial intelligence and electronic equipment

Country Status (1)

Country Link
CN (1) CN110929806B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111553431A (en) * 2020-04-30 2020-08-18 上海眼控科技股份有限公司 Picture definition detection method and device, computer equipment and storage medium
CN111597361B (en) * 2020-05-19 2021-09-14 腾讯科技(深圳)有限公司 Multimedia data processing method, device, storage medium and equipment
CN111798414A (en) * 2020-06-12 2020-10-20 北京阅视智能技术有限责任公司 Method, device and equipment for determining definition of microscopic image and storage medium
CN112614110B (en) * 2020-12-24 2022-11-04 Oppo(重庆)智能科技有限公司 Method and device for evaluating image quality and terminal equipment
CN112926689A (en) * 2021-03-31 2021-06-08 珠海格力电器股份有限公司 Target positioning method and device, electronic equipment and storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108205661A (en) * 2017-12-27 2018-06-26 浩云科技股份有限公司 A kind of ATM abnormal human face detection based on deep learning
CN109102491A (en) * 2018-06-28 2018-12-28 武汉大学人民医院(湖北省人民医院) A kind of gastroscope image automated collection systems and method
CN110188627A (en) * 2019-05-13 2019-08-30 睿视智觉(厦门)科技有限公司 A kind of facial image filter method and device
CN110472681A (en) * 2019-08-09 2019-11-19 北京市商汤科技开发有限公司 The neural metwork training scheme and image procossing scheme of knowledge based distillation
CN110457878A (en) * 2019-08-14 2019-11-15 北京中电普华信息技术有限公司 A kind of identity identifying method based on block chain, apparatus and system
CN110533097A (en) * 2019-08-27 2019-12-03 腾讯科技(深圳)有限公司 A kind of image definition recognition methods, device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN110929806A (en) 2020-03-27

Similar Documents

Publication Publication Date Title
CN110929806B (en) Picture processing method and device based on artificial intelligence and electronic equipment
CN110084377B (en) Method and device for constructing decision tree
CN107766940B (en) Method and apparatus for generating a model
CN112749749B (en) Classification decision tree model-based classification method and device and electronic equipment
KR102504498B1 (en) Method and apparatus for verifying medical fact
CN113761261A (en) Image retrieval method, image retrieval device, computer-readable medium and electronic equipment
CN105956469A (en) Method and device for identifying file security
WO2015036531A2 (en) Knowledge management system
CN111274482A (en) Intelligent education system and method based on virtual reality and big data
CN110046297A (en) Recognition methods, device and the storage medium of O&M violation operation
CN114580794B (en) Data processing method, apparatus, program product, computer device and medium
CN114219971A (en) Data processing method, data processing equipment and computer readable storage medium
CN114239863B (en) Training method of machine learning model, prediction method and device thereof, and electronic equipment
CN113569111B (en) Object attribute identification method and device, storage medium and computer equipment
CN115114329A (en) Method and device for detecting data stream abnormity, electronic equipment and storage medium
CN112861009A (en) Artificial intelligence based media account recommendation method and device and electronic equipment
CN110837657B (en) Data processing method, client, server and storage medium
CN117435999A (en) Risk assessment method, apparatus, device and medium
CN113362852A (en) User attribute identification method and device
CN112749686B (en) Image detection method, image detection device, computer equipment and storage medium
CN111461091B (en) Universal fingerprint generation method and device, storage medium and electronic device
CN111615178B (en) Method and device for identifying wireless network type and model training and electronic equipment
CN112163635B (en) Image classification method, device, server and medium based on deep learning
CN113572913B (en) Image encryption method, device, medium and electronic equipment
CN117591770B (en) Policy pushing method and device and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code: Ref country code: HK; Ref legal event code: DE; Ref document number: 40022650; Country of ref document: HK
TA01 Transfer of patent application right: Effective date of registration: 20221125; Address after: 1402, Floor 14, Block A, Haina Baichuan Headquarters Building, No. 6, Baoxing Road, Haibin Community, Xin'an Street, Bao'an District, Shenzhen, Guangdong 518133; Applicant after: Shenzhen Yayue Technology Co.,Ltd.; Address before: Room 1601-1608, Floor 16, Yinke Building, 38 Haidian Street, Haidian District, Beijing; Applicant before: Tencent Technology (Beijing) Co.,Ltd.
GR01 Patent grant