WO2023207823A1 - Method for obtaining feature information of category descriptions, image processing method, and device - Google Patents

Method for obtaining feature information of category descriptions, image processing method, and device Download PDF

Info

Publication number
WO2023207823A1
WO2023207823A1 (PCT/CN2023/089990, CN2023089990W)
Authority
WO
WIPO (PCT)
Prior art keywords
category
feature information
categories
information
image
Prior art date
Application number
PCT/CN2023/089990
Other languages
English (en)
French (fr)
Inventor
卢禹宁 (Lu Yuning)
刘健庄 (Liu Jianzhuang)
田新梅 (Tian Xinmei)
Original Assignee
Huawei Technologies Co., Ltd.
University of Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. and University of Science and Technology of China
Publication of WO2023207823A1 publication Critical patent/WO2023207823A1/zh

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V 10/761 Proximity, similarity or dissimilarity measures

Definitions

  • the present application relates to the field of artificial intelligence, and in particular, to a method for obtaining characteristic information of a category description, an image processing method and a device.
  • recent progress in vision-language models shows that the features of images of a given category are similar to the features of language descriptions of that category. Therefore, to determine the category of an object in an image from among C categories, one can obtain the features of the category description corresponding to each of the C categories.
  • since the features of a category's text-form category description are similar to the features of images of that category, the features of each category's description can be used to assist in identifying objects in the image.
  • the current method is to manually design a category description template for each category and combine the manually designed template with the category name to obtain the category description.
  • for example, if the manually designed category description template is "This is a XXX" and the category name is "cat",
  • the category description is "This is a cat".
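The template-plus-name construction described above can be sketched as follows; the placeholder syntax, function name, and category names here are illustrative assumptions, not details taken from the patent:

```python
# Illustrative sketch: combining a handcrafted category description
# template with category names (template and names are examples only).
def build_descriptions(template: str, category_names: list[str]) -> list[str]:
    """Fill the placeholder in a handcrafted template with each category name."""
    return [template.format(name) for name in category_names]

template = "This is a {}"            # manually designed template
categories = ["cat", "dog", "bird"]  # C = 3 example categories
descriptions = build_descriptions(template, categories)
# descriptions == ["This is a cat", "This is a dog", "This is a bird"]
```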
  • however, manually designed category description templates introduce human bias
  • and are not necessarily optimal for the image recognition task; obtaining a suitable template requires manually and repeatedly trying out multiple category description templates, which is time-consuming.
  • Embodiments of the present application provide a method for obtaining feature information of category descriptions, an image processing method, and related equipment, which automatically learn the feature information of at least two category descriptions corresponding to each category. Since the goal of the iterative update includes improving the
  • accuracy of the image recognition task, this is conducive to obtaining category descriptions that better match the recognition task; it also improves the adaptability to different images within the same category, further improving the accuracy of the image recognition process.
  • embodiments of the present application provide a method for obtaining characteristic information of category descriptions, which can be used in the field of image processing in the field of artificial intelligence.
  • the method includes: a first network device obtains the feature information of K category descriptions corresponding to each first category among C categories, where C and K are both integers greater than or equal to 2.
  • the category description of the first category includes a category description template and the first category. As an example, if the C categories are C cats of different breeds and the value of K is 3, the three different category description templates can be "This is a XX", "This is a cat, the specific breed is XX" and "This is a cat of the breed XX". If the category name of the first category is "American Shorthair", the three category descriptions corresponding to the first category are "This is an American Shorthair", "This is a cat, the specific breed is American Shorthair" and "This is a cat of the breed American Shorthair".
  • the first network device generates predicted category information of the image based on the feature information of the K category descriptions corresponding to each first category and the feature information of the image, and the predicted category of the image pointed to by the predicted category information is included in the C categories. Specifically, the first network device obtains the similarity between the features of the training image and the high-level features corresponding to each of the C categories in a target feature information set, and, according to a first loss function, updates the feature information of the K category descriptions corresponding to each first category until the convergence condition is met.
  • the goal of iteratively updating with the first loss function includes improving the similarity between the predicted category information and the correct category information of the image.
  • in this way, the feature information of the K category descriptions corresponding to each first category among the C categories is obtained, and predicted category information of the image is generated based on that feature information and the feature information of the image. Based on the correct category information of the image, the predicted category information and the first loss function, the feature information of the K category descriptions is automatically updated until the convergence conditions are met.
  • the goal of iterative update using the first loss function includes improving the similarity between the predicted category information and the correct category information of the image. Through the aforementioned scheme, the feature information of the K category descriptions corresponding to each category can be automatically learned; since the goal of the iterative update includes improving the accuracy of the image recognition task, this is beneficial for obtaining category descriptions that better match the recognition task. Moreover, because objects of the same category vary across different images, the most suitable category description may differ between images of the same category; the K category descriptions per category obtained in this solution improve the adaptability to different images within the same category and thereby further improve the accuracy of the image recognition process.
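The prediction step above can be sketched as follows. This is a hypothetical illustration: cosine-similarity scoring, mean aggregation over the K descriptions, and a softmax over categories are plausible choices in the vision-language-model setting, not details fixed by this text:

```python
import numpy as np

def predict_category(image_feat, class_feats):
    """
    image_feat:  (d,) feature vector of the image.
    class_feats: (C, K, d) feature vectors of the K category descriptions
                 for each of C categories.
    Scores each category by the mean cosine similarity over its K
    descriptions, then returns a (C,) array of predicted probabilities.
    """
    img = image_feat / np.linalg.norm(image_feat)
    txt = class_feats / np.linalg.norm(class_feats, axis=-1, keepdims=True)
    sims = txt @ img                     # (C, K) cosine similarities
    logits = sims.mean(axis=1)           # (C,) one score per category
    exp = np.exp(logits - logits.max())  # softmax over the C categories
    return exp / exp.sum()

rng = np.random.default_rng(0)
probs = predict_category(rng.normal(size=8), rng.normal(size=(3, 4, 8)))
assert probs.shape == (3,) and abs(float(probs.sum()) - 1.0) < 1e-9
```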
  • in one possible implementation, the feature information of the K category descriptions corresponding to the first category includes first feature information and second feature information, and the position of the first category's feature information within the first feature information differs from the position of the first category's feature information within the second feature information.
  • the feature information of the category descriptions adapted to different images in the same category may be different.
  • since the position of the first category's feature information in the first feature information differs from its position in the second feature information, the diversity of the finally obtained feature information of the at least two category descriptions is improved, which improves the adaptability to different images within the same category and in turn helps improve the accuracy of image recognition results.
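One way to realize "the same category feature information at different positions" is to insert the category-name embedding at different offsets within a sequence of learnable context embeddings. The sketch below is an assumption about the mechanism, not the patent's exact construction:

```python
import numpy as np

def compose_prompt(ctx, class_emb, position):
    """
    ctx:       (n, d) learnable context token embeddings.
    class_emb: (1, d) embedding of the category name.
    position:  index at which to insert the class embedding.
    Two prompts built with different `position` values place the
    category's feature information at different positions.
    """
    return np.concatenate([ctx[:position], class_emb, ctx[position:]], axis=0)

ctx = np.zeros((4, 8))   # toy context embeddings
cls = np.ones((1, 8))    # toy category-name embedding
front  = compose_prompt(ctx, cls, 0)  # class token first
middle = compose_prompt(ctx, cls, 2)  # class token in the middle
end    = compose_prompt(ctx, cls, 4)  # class token last
assert front.shape == middle.shape == end.shape == (5, 8)
assert not np.array_equal(front, middle)
```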
  • the feature information of the K category descriptions includes high-level features of the K category descriptions, and the high-level features of a category description are generated by a hidden layer of a first neural network.
  • the first neural network is used to perform feature updates on category descriptions.
  • the first network device generates predicted category information of the image based on the feature information of the K category descriptions corresponding to each first category and the feature information of the image, including: the first network device models the high-level features of the K category descriptions corresponding to the first category using a target model, to determine the distribution information of the high-level features of the category descriptions corresponding to the first category.
  • the target model can be a Gaussian distribution model, a Gaussian mixture distribution model, a von Mises distribution model, or another type of model.
  • the first network device performs a sampling operation according to the distribution information of the high-level features of the category descriptions corresponding to each first category, and obtains a feature information set.
  • the feature information set includes high-level features corresponding to each first category. For example, if a Gaussian distribution model is used to model the high-level features of the category descriptions corresponding to the first category, the first network device can perform the sampling operation based on the mean and variance of the high-level features of the K category descriptions corresponding to the first category; at least one high-level feature obtained by sampling obeys the distribution of the high-level features of the category descriptions corresponding to the first category.
  • the first network device generates predicted category information of the image based on the feature information of the image and the feature information set.
  • in this way, based on the distribution information of the high-level features of the category descriptions corresponding to each first category, a sampling operation can be performed to obtain the high-level features corresponding to each first category, and predicted category information of the image is generated based on the sampled high-level features and the feature information of the image. Since the function value of the first loss function is obtained based on the sampled high-level features, and the purpose of the iterative update includes reducing that function value, the iterative update must also produce accurate predicted category information from the sampled high-level features (that is, from high-level features around the high-level features of the K category descriptions). This establishes a more demanding update standard, which is conducive to obtaining better feature information of the category descriptions and, in turn, to improving the accuracy of image recognition.
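A minimal sketch of the modeling-and-sampling step, assuming a diagonal Gaussian as the target model (the patent also allows Gaussian mixture or von Mises models); the function names and dimensions are illustrative:

```python
import numpy as np

def fit_gaussian(high_level_feats):
    """high_level_feats: (K, d) high-level features of one category's K
    descriptions. Model them as a diagonal Gaussian (one possible target model)."""
    mean = high_level_feats.mean(axis=0)
    std = high_level_feats.std(axis=0) + 1e-6  # avoid zero variance
    return mean, std

def sample_features(mean, std, n, rng):
    """Draw n high-level features that follow the fitted distribution,
    written in reparameterized form (mean + eps * std) so gradients
    could flow through mean and std during training."""
    eps = rng.normal(size=(n, mean.shape[0]))
    return mean + eps * std

rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 16))          # K = 4 descriptions, d = 16
mean, std = fit_gaussian(feats)
samples = sample_features(mean, std, n=8, rng=rng)
assert samples.shape == (8, 16)
```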
  • the first network device obtains the feature information of the K category descriptions corresponding to each first category among the C categories, including: the first network device obtains the underlying features of the K category descriptions corresponding to each first category, where
  • the underlying features of a category description are the vectorized category description; it then inputs the underlying features of the category descriptions into the first neural network, and updates them through the first neural network to obtain the high-level features of the category descriptions.
  • the first network device updates the feature information of the K category descriptions corresponding to each first category, including: the first network device, keeping the parameters of the first neural network unchanged, performs gradient updates on the underlying features of the K category description templates corresponding to the C categories according to the function value of the first loss function, to obtain updated underlying features of the K category descriptions corresponding to each first category.
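The frozen-encoder gradient update can be illustrated with a toy linear stand-in for the first neural network: only the underlying (prompt) features receive gradient updates, while the network parameters W stay fixed. The loss, learning rate, and dimensions here are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 16)) / 4.0  # frozen "first neural network" (toy linear map)
target = rng.normal(size=16)         # stand-in training signal
prompt = rng.normal(size=16)         # underlying features of one category description

def loss(p):
    # mean squared error between the network output and the target
    return float(np.mean((W @ p - target) ** 2))

before = loss(prompt)
lr = 0.5
for _ in range(100):
    grad_out = 2.0 * (W @ prompt - target) / 16.0  # d(loss)/d(output)
    prompt -= lr * (W.T @ grad_out)                # gradient step on the prompt only
    # W is never touched: the network parameters stay unchanged

assert loss(prompt) < before  # the underlying features improved
```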
  • the first network device updates the feature information of the K category descriptions corresponding to each first category according to the first loss function, including: the first network device updates the feature information of the K category descriptions corresponding to each first category according to the first loss function and a second loss function.
  • the goal of using the second loss function for iterative update includes reducing the similarity between the feature information of the K category description templates.
  • the function value of the second loss function is also used to update the feature information of the K category descriptions corresponding to each first category.
  • the goal of using the second loss function for iterative update includes reducing the similarity between the feature information of the at least two category description templates,
  • that is, enlarging the distance between the feature information of the K category descriptions corresponding to each first category. This further improves the diversity of the feature information of the K category descriptions corresponding to each first category, improving the adaptability to different images within the same category, which in turn helps improve the accuracy of image recognition results.
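One plausible form of such a second loss penalizes the mean pairwise cosine similarity among the K description features; the exact form is not fixed by this text, so this is a sketch under that assumption:

```python
import numpy as np

def diversity_loss(prompt_feats):
    """
    prompt_feats: (K, d) feature information of one category's K
    description templates. Returns the mean pairwise cosine similarity;
    minimizing it pushes the K descriptions apart (one plausible form
    of the second loss function).
    """
    normed = prompt_feats / np.linalg.norm(prompt_feats, axis=1, keepdims=True)
    sims = normed @ normed.T                  # (K, K) cosine similarities
    K = sims.shape[0]
    off_diag = sims[~np.eye(K, dtype=bool)]   # exclude self-similarity
    return float(off_diag.mean())

identical = np.ones((3, 8))   # three identical descriptions: maximal loss
orthogonal = np.eye(3, 8)     # three orthogonal descriptions: zero loss
assert diversity_loss(identical) > diversity_loss(orthogonal)
```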
  • the function value of the first loss function is greater than or equal to the function value of an objective function.
  • the objective function is the distance between the predicted category information and the correct category information of the image.
  • the goals of the iterative update include reducing the function value of the first loss function.
  • since the goal of updating includes reducing the function value of the first loss function, and that value is greater than or equal to the function value of the objective function, the goal of iterative updating also includes reducing the function value of the objective function, i.e., the distance between the predicted category information and the correct category information of the image. In other words, while the correct training target is maintained, a simpler loss function can be used instead, which is conducive to expanding the implementation flexibility of this solution.
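As an illustrative instance of a loss that upper-bounds a distance-style objective (the patent does not name these specific functions): cross-entropy is always at least 1 - p_correct, which equals half the L1 distance between the predicted distribution and the one-hot correct label, so minimizing cross-entropy also drives that distance down:

```python
import numpy as np

# Illustrative only: cross-entropy (a candidate "first loss function")
# upper-bounds 1 - p_correct, which is half the L1 distance between the
# predicted distribution and the one-hot correct label (a candidate
# "objective function"), because -log(p) >= 1 - p for p in (0, 1].
rng = np.random.default_rng(0)
for _ in range(100):
    logits = rng.normal(size=5)
    probs = np.exp(logits) / np.exp(logits).sum()  # predicted category info
    correct = int(rng.integers(5))                 # correct category info
    cross_entropy = -np.log(probs[correct])        # first loss function
    objective = 0.5 * np.abs(probs - np.eye(5)[correct]).sum()  # = 1 - p_correct
    assert cross_entropy >= objective - 1e-12      # loss >= objective holds
```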
  • embodiments of the present application provide an image processing method, which can be used in the field of image processing in the field of artificial intelligence.
  • the method includes: the second network device performs feature extraction on the image to obtain feature information of the image, and
  • generates predicted category information of the image according to the feature information of the category description corresponding to each first category among the C categories and the feature information of the image.
  • the predicted category of the image pointed to by the predicted category information is included in the C categories, and C is an integer greater than or equal to 2.
  • the category description of the first category includes a category description template and a first category
  • the feature information corresponding to each first category in the C categories is obtained based on the feature information of at least two category descriptions corresponding to each first category
  • the feature information of the at least two category descriptions corresponding to each first category is obtained by iteratively updating with the first loss function.
  • the goal of using the first loss function to iteratively update includes improving the similarity between the predicted category information and the correct category information of the image.
  • the second network device can also be used to perform the steps performed by the first network device in the first aspect and each possible implementation of the first aspect. The feature information of the at least two category descriptions corresponding to each first category above
  • can be obtained through the methods in the first aspect and its possible implementations. For the specific implementation of the steps in each possible implementation of the second aspect, the meanings of the terms and the beneficial effects, refer to the first aspect; details are not repeated here.
  • embodiments of the present application provide a device for acquiring characteristic information of category descriptions, which can be used in the field of image processing in the field of artificial intelligence.
  • the device includes: an acquisition module, configured to acquire the feature information of at least two category descriptions corresponding to each first category among the C categories, where C is an integer greater than or equal to 2
  • the category description of the first category includes a category description template and the first category
  • the generation module is configured to generate predicted category information of the image based on the feature information of the at least two category descriptions corresponding to each first category and the feature information of the image,
  • where the predicted category of the image pointed to by the predicted category information is included in the C categories;
  • the update module is configured to update, according to the first loss function, the feature information of the at least two category descriptions corresponding to each first category until the convergence condition is met,
  • where the goal of using the first loss function to iteratively update includes improving the similarity between the predicted category information and the correct category information of the image.
  • the device for obtaining the characteristic information of the category description can also be used to perform the steps performed by the first network device in the first aspect and each possible implementation manner of the first aspect.
  • for the meanings of the terms and the beneficial effects, refer to the first aspect; details are not repeated here.
  • embodiments of the present application provide an image processing device, which can be used in the field of image processing in the field of artificial intelligence.
  • the device includes: an acquisition module, configured to perform feature extraction on an image to obtain feature information of the image; and a generation module, configured to generate predicted category information of the image based on the feature information of the category description corresponding to each first category among the C categories and the feature information of the image.
  • the predicted category of the image pointed to by the predicted category information is included in the C categories, and C is an integer greater than or equal to 2; wherein the category description of the first category includes a category description template and the first category, and the feature information corresponding to each first category among the C categories is obtained based on the feature information of at least two category descriptions corresponding to each first category.
  • the feature information of the at least two category descriptions corresponding to each first category is obtained by iteratively updating with the first loss function, and the goal of the iterative updating includes improving the similarity between the predicted category information and the correct category information of the image.
  • the image processing device can also be used to perform the steps performed by the second network device in the second aspect and each of its possible implementations.
  • for the specific implementation of the steps in each possible implementation of the fourth aspect, the meanings of the terms and the beneficial effects, refer to the second aspect; details are not repeated here.
  • embodiments of the present application provide a computer program product.
  • the computer program product includes a program. When the program is run on a computer, it causes the computer to execute the method described in the first aspect or the second aspect.
  • embodiments of the present application provide a computer-readable storage medium.
  • the computer-readable storage medium stores a computer program that, when run on a computer, causes the computer to execute the method described in the first aspect or the second aspect above.
  • embodiments of the present application provide a network device, including a processor and a memory.
  • the processor is coupled to the memory.
  • the memory is used to store a program; the processor is used to execute the program in the memory, so that the network device executes the method described in the first aspect above.
  • embodiments of the present application provide a network device, including a processor and a memory.
  • the processor is coupled to the memory.
  • the memory is used to store a program; the processor is used to execute the program in the memory, so that the network device executes the method described in the second aspect above.
  • the present application provides a chip system, which includes a processor and is used to support a network device in implementing the functions involved in the above aspects, for example, sending or processing the data and/or information involved in the above methods.
  • the chip system further includes a memory, and the memory is used to store necessary program instructions and data of the second network device or communication device.
  • the chip system may be composed of chips, or may include chips and other discrete devices.
  • Figure 1 is a schematic structural diagram of the artificial intelligence main framework provided by the embodiment of the present application.
  • Figure 2a is a system architecture diagram of a system for obtaining feature information of category descriptions provided by an embodiment of the present application
  • Figure 2b is a schematic flowchart of a method for obtaining feature information of category descriptions provided by an embodiment of the present application
  • Figure 3 is a schematic flowchart of a method for obtaining feature information of category descriptions provided by an embodiment of the present application
  • Figure 4 is a schematic diagram of the characteristic information of the category description in the method for obtaining the characteristic information of the category description provided by the embodiment of the present application;
  • Figure 5 is a schematic diagram of the feature distribution of high-level features described by categories corresponding to multiple categories provided by the embodiment of the present application;
  • Figure 6 is a schematic flowchart of determining distribution information of high-level features described in categories corresponding to multiple categories provided by an embodiment of the present application;
  • Figure 7 is a schematic flowchart of a method for obtaining feature information of category descriptions provided by an embodiment of the present application.
  • Figure 8 is a schematic flow chart of an image processing method provided by an embodiment of the present application.
  • Figure 9a is a schematic diagram illustrating the beneficial effects of the method for obtaining feature information of category descriptions provided by the embodiment of the present application.
  • Figure 9b is a schematic diagram illustrating the beneficial effects of the method for obtaining feature information of category descriptions provided by the embodiment of the present application.
  • Figure 10 is a schematic structural diagram of a device for obtaining feature information of category descriptions provided by an embodiment of the present application.
  • Figure 11 is a schematic structural diagram of an image processing device provided by an embodiment of the present application.
  • Figure 12 is a schematic structural diagram of a network device provided by an embodiment of the present application.
  • Figure 13 is a schematic structural diagram of a network device provided by an embodiment of the present application.
  • Figure 14 is a schematic structural diagram of a chip provided by an embodiment of the present application.
  • Figure 1 shows a structural schematic diagram of the artificial intelligence main framework.
  • the artificial intelligence main framework is elaborated below along two dimensions: the “intelligent information chain” (horizontal axis) and the “IT value chain” (vertical axis).
  • the "intelligent information chain” reflects a series of processes from data acquisition to processing. For example, it can be the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, intelligent execution and output. In this process, the data has gone through the condensation process of "data-information-knowledge-wisdom".
  • the “IT value chain” reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure and information of artificial intelligence (the provision and processing of technical implementations) to the industrial ecology of the system.
  • Infrastructure provides computing power support for artificial intelligence systems, enables communication with the external world, and supports it through basic platforms.
  • computing power is provided by smart chips, which may specifically be hardware acceleration chips such as a central processing unit (CPU), a neural-network processing unit (NPU), a graphics processing unit (GPU), an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA);
  • the basic platform includes distributed computing frameworks, networks and other related platform guarantees and support, and can include cloud storage and computing, interconnection networks, etc.
  • sensors communicate with the outside world to obtain data, which are provided to smart chips in the distributed computing system provided by the basic platform for calculation.
  • the data layer above the infrastructure represents the data sources in the field of artificial intelligence.
  • the data involves graphics, images, voice, and text, as well as IoT data of traditional devices, including business data of existing systems and sensory data such as force, displacement, liquid level, temperature, and humidity.
  • Data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making and other methods.
  • machine learning and deep learning can perform symbolic and formalized intelligent-information modeling, extraction, preprocessing, training, etc.
  • Reasoning refers to the process of simulating human intelligent reasoning in computers or intelligent systems, using formal information to perform machine thinking and problem solving based on reasoning control strategies. Typical functions are search and matching.
  • Decision-making refers to the process of decision-making after intelligent information is reasoned, and usually provides functions such as classification, sorting, and prediction.
  • based on the results of further data processing, some general capabilities can be formed, such as algorithms or a general system, for example translation, text analysis, computer vision processing, speech recognition, image recognition, etc.
  • Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields. They encapsulate the overall artificial intelligence solution, productizing intelligent information decision-making and realizing practical applications. The application fields mainly include intelligent terminals, intelligent manufacturing, smart transportation, smart home, smart healthcare, smart security, autonomous driving, smart city, etc.
  • the embodiments of the present application can be applied to various application fields in the field of artificial intelligence, and are specifically used to identify items in images in those fields. If the second network device needs to determine the predicted category of an image from among C categories, it can obtain the feature information of the category description corresponding to each of the C categories in advance, and then determine the predicted category of the image based on the similarity between that feature information and the feature information of the image.
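The deployment pattern described above (category-description features computed in advance, then reused for every image) can be sketched as follows; the class structure, aggregation to one feature per category, and argmax decision are illustrative assumptions:

```python
import numpy as np

class ZeroShotClassifier:
    """Sketch of the deployment flow: category-description features are
    computed once in advance (e.g. by the first network device) and then
    reused for every image (e.g. by the second network device)."""

    def __init__(self, class_feats):
        # class_feats: (C, d), one pre-aggregated feature per category
        self.class_feats = class_feats / np.linalg.norm(
            class_feats, axis=1, keepdims=True)

    def predict(self, image_feat):
        # predicted category = most similar precomputed category feature
        img = image_feat / np.linalg.norm(image_feat)
        return int(np.argmax(self.class_feats @ img))

clf = ZeroShotClassifier(np.eye(3, 8))  # C = 3 toy category features
assert clf.predict(np.eye(3, 8)[1]) == 1
```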
  • Figure 2a is a system architecture diagram of a system for obtaining feature information of category descriptions provided by an embodiment of the present application.
  • the system 200 for obtaining feature information of category descriptions includes a first network device 210, a database 220, a second network device 230 and a data storage system 240; the second network device 230 includes a computing module 231.
  • the database 220 stores a training data set.
  • the first network device 210 can obtain the feature information of at least two category descriptions corresponding to each first category among the C categories, and use the training data set to iteratively update
  • the feature information of the at least two category descriptions corresponding to each first category until the convergence conditions are met.
  • Figure 2b is a schematic flowchart of a method for obtaining feature information of category descriptions provided by an embodiment of the present application.
  • A1. The first network device obtains feature information of at least two category descriptions corresponding to each first category among C categories, where C is an integer greater than or equal to 2 and the category description of the first category includes a category description template and the first category.
  • A2. The first network device generates predicted category information of the image based on the feature information of the at least two category descriptions corresponding to each first category and the feature information of the image,
  • where the predicted category of the image pointed to by the predicted category information is included in the C categories.
  • A3. The first network device updates the feature information of the at least two category descriptions corresponding to each first category according to the first loss function until the convergence condition is met;
  • the goals of the iterative update with the first loss function include improving the similarity between the predicted category information and the correct category information of the image.
• the embodiment of the present application can automatically learn the feature information of at least two category descriptions corresponding to each category, and the goals of the iterative update include improving the accuracy of the image recognition task, which is conducive to obtaining category descriptions that better match the recognition task. Moreover, because objects of the same category vary across different images, the most suitable category description for different images of the same category may differ; obtaining at least two category descriptions corresponding to each category improves the fit with different images of the same category and further improves the accuracy of the image recognition process.
  • the feature information of the category description corresponding to each first category in the C categories obtained by the first network device 210 can be deployed in the second network device 230 in various forms.
  • the second network device 230 can call data, codes, etc. in the data storage system 240, and can also store data, instructions, etc. in the data storage system 240.
  • the data storage system 240 may be placed in the second network device 230, or the data storage system 240 may be an external memory relative to the second network device 230.
  • the second network device 230 can be configured in the client device, and the "user" can directly interact with the second network device 230 for data.
• for example, the client device is a mobile phone or a tablet, and the second network device 230 can be a module used for image recognition in the main processor (Host CPU) of the mobile phone or tablet; the second network device 230 can also be a graphics processing unit (GPU) or a neural-network processing unit (NPU) in the mobile phone or tablet, where the GPU or NPU is mounted on the main processor as a co-processor and the main processor assigns tasks to it.
• Figure 2a is only a schematic architecture diagram of an image processing system provided by an embodiment of the present application, and the positional relationship between the devices, components, modules, etc. shown in the figure does not constitute any limitation.
  • the second network device 230 and the client device 250 may be independent devices, and the second network device 230 performs data interaction with the client device through a configured I/O interface.
• the "user" can input an image through the I/O interface 212 on the client device, and the second network device 230 returns the predicted category of the image to the client device through the I/O interface and provides it to the user.
• Figure 3 is a schematic flowchart of a method for obtaining feature information of category descriptions provided by an embodiment of the present application. The method for obtaining feature information of category descriptions provided by the embodiment of the present application can include:
• 301. the first network device obtains feature information of K category descriptions corresponding to each first category among the C categories.
• the first network device obtains feature information of K category descriptions corresponding to each first category among the C categories, where C is an integer greater than or equal to 2 and K is an integer greater than or equal to 2.
• the feature information of the K category descriptions corresponding to each first category includes high-level features of the K category descriptions, and may also include low-level features of the K category descriptions.
• the first network device can first obtain the low-level features of the K category descriptions corresponding to the first category, where the low-level features of a category description are obtained by vectorizing the category description.
  • a category description for each first category includes a category description template and the name of the first category.
• the first network device inputs the low-level features of each category description corresponding to the first category into the first neural network, and the first neural network updates the low-level features of the category description to obtain the high-level features of each category description corresponding to the first category.
• the first neural network can be a text encoder, and the high-level features of a category description are features generated by a hidden layer of the first neural network or by the first neural network as a whole.
• a hidden layer of the first neural network refers to the output of any intermediate layer of the first neural network.
• the first network device executes the above method for each first category among the C categories, and can thereby obtain the feature information of the K category descriptions corresponding to each first category among the C categories.
• in one implementation, the first network device can initialize the low-level features of the K category description templates, and vectorize the name of the first category to obtain the low-level features of the name of the first category; the first network device combines the feature information of each of the K category description templates with the feature information of the name of the first category to obtain the low-level features of the K category descriptions corresponding to the first category.
• in another implementation, the first network device can vectorize the K category description templates to obtain the low-level features of the K category description templates, and vectorize the name of the first category to obtain the low-level features of the name of the first category; the first network device combines the feature information of each of the K category description templates with the feature information of the name of the first category to obtain the low-level features of the K category descriptions corresponding to the first category.
  • the first network device may initialize K category description templates, and the category description templates may also be called prompt templates.
• each of the K category description templates can be combined with the class name of the first category to obtain K category descriptions corresponding to the first category; the first network device can then vectorize the K category descriptions corresponding to the first category to obtain the low-level features of the K category descriptions corresponding to the first category.
• the first network device may perform the above operation on each first category among the C categories to obtain the K category descriptions corresponding to each first category among the C categories. It should be noted that the K category descriptions corresponding to different categories among the C categories may be the same or different.
• the low-level features of the K category descriptions corresponding to the same first category include a first low-level feature and a second low-level feature, and the position of the low-level feature of the name of the first category in the first low-level feature may differ from its position in the second low-level feature; the high-level features of the K category descriptions corresponding to the same first category include a first high-level feature and a second high-level feature, and the position of the high-level feature of the name of the first category in the first high-level feature may differ from its position in the second high-level feature.
• correspondingly, the K category descriptions may include a first category description and a second category description, and the position of the name of the first category in the first category description may differ from its position in the second category description.
• Figure 4 is a schematic diagram of the feature information of category descriptions in the method for obtaining feature information of category descriptions provided by an embodiment of the present application. It should be noted that Figure 4 displays the feature information of the category descriptions in a visual manner; as shown in the figure, in the feature information of different category descriptions, the position of the feature information of the category name can differ. It should be understood that the example in Figure 4 is only for ease of understanding this solution and is not used to limit this solution.
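To make the template-plus-name construction concrete, here is a small illustrative sketch. The template strings are invented for illustration (in this scheme the templates are learnable feature vectors rather than fixed text); the point shown is that the class name can occupy a different position in each of the K descriptions:

```python
# K = 3 hypothetical category description templates; the "{}" placeholder
# sits at a different position in each, so the category name occupies a
# different position in each resulting category description.
templates = [
    "a photo of a {}.",
    "{} in a natural scene.",
    "a cropped photo of the {}.",
]
class_names = ["dog", "bird", "cat"]

def build_descriptions(templates, name):
    # Combine each template with one class name -> K category descriptions.
    return [t.format(name) for t in templates]

descs = {name: build_descriptions(templates, name) for name in class_names}
```

Each class thus ends up with K descriptions that share templates across the C classes, matching the construction described above.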
• 302. the first network device determines, based on the high-level features of at least two category descriptions corresponding to the first category, the distribution information of the high-level features of the category descriptions corresponding to the first category.
• since the high-level features of multiple category descriptions corresponding to the same category are distributed in a relatively concentrated manner, that is, they are adjacent in the feature space, the first network device can model, based on the high-level features of the K category descriptions corresponding to each first category among the C categories, the distribution of the high-level features of the category descriptions corresponding to each first category among the C categories, so as to determine the distribution information of the high-level features of the category descriptions corresponding to each first category among the C categories.
• Figure 5 is a schematic diagram of the feature distribution of high-level features of multiple category descriptions corresponding to multiple categories provided in an embodiment of the present application; Figure 5 is obtained by visualizing the high-level features of the category descriptions corresponding to multiple categories.
• each point in Figure 5 represents the high-level feature of one category description; the high-level features of multiple category descriptions corresponding to the same category are close to each other, while the high-level features of category descriptions corresponding to different categories are far apart. It should be understood that the example in Figure 5 is only for ease of understanding this solution and is not used to limit this solution.
• the first network device can use a target model to model the high-level features of the category descriptions corresponding to the first category, so as to determine the distribution information of the high-level features of the category descriptions corresponding to the first category; the target model can adopt a Gaussian distribution model, a Gaussian mixture model, a von Mises distribution model or another type of model, which is not exhaustively listed here.
• taking the Gaussian distribution model as an example, the first network device needs to obtain the mean and variance of the high-level features of the K category descriptions corresponding to the first category (i.e., an example of the distribution information of the high-level features of the category descriptions corresponding to the first category). It should be noted that if another model is used for modeling, other types of distribution information need to be obtained, which is not limited here.
• the mean and covariance can take, for example, the form of formula (1) and formula (2): formula (1): μ(P_K) = (1/K) Σ_{k=1}^{K} w_{1:C}(P_k); formula (2): Σ(P_K) = (1/K) Σ_{k=1}^{K} (w_{1:C}(P_k) − μ)(w_{1:C}(P_k) − μ)^T
• where μ(P_K) represents the mean vector of the high-level features of the K category descriptions corresponding to the first category; P_k represents the k-th category description template among the K category description templates corresponding to the first category; K is the number of category description templates corresponding to the first category; w_{1:C}(P_k) includes the C high-level features corresponding to the k-th category description template, and each high-level feature in w_{1:C}(P_k) is the high-level feature of the text description obtained by combining the k-th category description template with the name of one category; Σ(P_K) represents the covariance matrix of the high-level features of the K category descriptions corresponding to the first category; and (w_{1:C}(P_k) − μ)^T represents the transpose of w_{1:C}(P_k) − μ.
• it should be noted that formula (1) and formula (2) both take, as an example, the case where the same K category description templates are used for all C categories, and are not used to limit this solution.
• the first network device performs the above operation for each first category among the C categories, so as to obtain the distribution information of the high-level features of the category descriptions corresponding to each first category among the C categories.
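Under the Gaussian-model example above, the distribution information for one class reduces to a sample mean and covariance over the K prompt features, in the spirit of formulas (1) and (2). A sketch with assumed shapes and toy values:

```python
import numpy as np

def gaussian_stats(prompt_feats):
    # prompt_feats: (K, D) high-level features of the K category
    # descriptions for one class.  The sample mean and covariance play
    # the role of formulas (1) and (2).
    mu = prompt_feats.mean(axis=0)                         # (D,)
    centered = prompt_feats - mu                           # (K, D)
    cov = centered.T @ centered / prompt_feats.shape[0]    # (D, D)
    return mu, cov

K, D = 4, 3
feats = np.arange(K * D, dtype=float).reshape(K, D)   # toy prompt features
mu, cov = gaussian_stats(feats)
```

Running this over every one of the C classes yields C per-class (mean, covariance) pairs, which is the distribution information the sampling step below consumes.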
  • Figure 6 is a schematic flowchart of determining the distribution information of high-level features described in categories corresponding to multiple categories provided by an embodiment of the present application.
• as shown in Figure 6, the first network device can obtain the low-level features of multiple category description templates and combine the low-level features of each category description template with the low-level features of a category name to obtain the low-level features of one category description.
• Figure 6 shows three categories: dog, bird and cat.
• the first network device inputs the low-level features of the multiple category descriptions corresponding to each of the three categories into the first neural network, and obtains the high-level features of the multiple category descriptions corresponding to each of the three categories output by the first neural network; the first network device can then obtain, based on the high-level features of the multiple category descriptions corresponding to each category, the distribution information of the high-level features of the category descriptions corresponding to each category. It should be understood that the example in Figure 6 is only for ease of understanding this solution and is not used to limit this solution.
• 303. the first network device performs a sampling operation according to the distribution information of the high-level features of the category descriptions corresponding to each first category, and obtains a target feature information set.
• the target feature information set includes one high-level feature corresponding to each first category.
• the first network device can sample at least one high-level feature based on the distribution information of the high-level features of the K category descriptions corresponding to the first category. For example, if a Gaussian distribution model is used to model the high-level features of the category descriptions corresponding to the first category, the first network device can perform the sampling operation based on the mean and variance of the high-level features of the K category descriptions corresponding to the first category, and the at least one high-level feature obtained by sampling obeys the distribution of the high-level features of the category descriptions corresponding to the first category.
• the first network device performs the foregoing operation on each first category among the C categories to obtain at least one target feature information set; each target feature information set includes one high-level feature corresponding to each first category among the C categories, that is, each target feature information set includes C high-level features in one-to-one correspondence with the C categories.
  • the high-level features corresponding to each first category in the target feature information set and the K high-level features corresponding to each first category obtained in step 301 may be different.
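A sketch of the sampling step: draw one feature per class from that class's Gaussian to form one target feature information set. The per-class means and covariances are toy values, not from the original text:

```python
import numpy as np

def sample_target_set(stats, rng):
    # stats: list of (mean, covariance) pairs, one per class.  Returns
    # one target feature information set: one sampled high-level
    # feature per class, shape (C, D).
    return np.stack([rng.multivariate_normal(mu, cov) for mu, cov in stats])

rng = np.random.default_rng(0)
D = 3
stats = [(np.zeros(D), 0.01 * np.eye(D)),        # class 0 centered at 0
         (np.full(D, 10.0), 0.01 * np.eye(D))]   # class 1 centered at 10
target_set = sample_target_set(stats, rng)
```

Each call produces a fresh set, so sampled features usually differ from the K original prompt features while still obeying each class's distribution, as the text notes.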
• 304. the first network device generates predicted category information of the training image based on the high-level features of the category descriptions corresponding to each first category and the feature information of the training image.
  • the predicted category of the training image pointed to by the predicted category information is included in C categories.
• steps 302 and 303 are both optional steps. If steps 302 and 303 are executed, the first network device can obtain the feature information of the training image, and generate the predicted category information of the training image based on the target feature information set and the feature information of the training image, where the predicted category of the training image pointed to by the predicted category information is included in the C categories.
  • the first network device may be configured with multiple training data.
• each training data item may include a training image and the correct category information of the training image; the correct category of the object in the training image is included among the C categories, that is, the correct category of the training image pointed to by the correct category information is included in the C categories.
• the first network device can input the training image into the second neural network to obtain the feature information of the training image generated by the second neural network.
  • each training data may directly include feature information of a training image and correct category information of the training image.
• the first network device obtains the similarity between the feature information of the training image and the high-level feature corresponding to each of the C categories in the target feature information set, and generates the predicted category information of the training image based on the aforementioned similarities; the higher the similarity between the feature information of the training image and the high-level feature corresponding to one of the C categories (for convenience of description, hereinafter referred to as the "second category"), the higher the probability that the predicted category of the training image is the second category.
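The similarity-to-probability step can be sketched as a softmax over scaled cosine similarities; the temperature value and all shapes here are assumptions:

```python
import numpy as np

def predicted_category_info(z, W, tau=0.1):
    # z: (D,) training-image feature; W: (C, D) one high-level feature
    # per class from a target feature information set.  Returns one
    # probability per class; the most similar class gets the highest one.
    z = z / np.linalg.norm(z)
    W = W / np.linalg.norm(W, axis=1, keepdims=True)
    logits = W @ z / tau
    e = np.exp(logits - logits.max())    # numerically stable softmax
    return e / e.sum()

W = np.eye(3, 4)                          # toy per-class features
z = np.array([0.0, 1.0, 0.0, 0.0])        # image feature close to class 1
p = predicted_category_info(z, W)
```

Here the image feature is most similar to class 1's feature, so class 1 receives nearly all the probability mass, mirroring the "higher similarity, higher probability" relation in the text.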
• since technical personnel found during research that the high-level features of multiple category descriptions corresponding to the same category are distributed in a relatively concentrated manner, a sampling operation can be performed based on the distribution information of the high-level features of the category descriptions corresponding to each first category to obtain the high-level feature corresponding to each first category, and the predicted category information of the image is generated based on the sampled high-level features and the feature information of the image; since the predicted category information of the image is used to generate the function value of the first loss function, the function value of the first loss function is obtained based on the high-level features obtained by sampling.
• the goals of the iterative update include reducing the function value of the first loss function; that is, the goals of the iterative update include also obtaining accurate predicted category information from the sampled high-level features (i.e., high-level features around the high-level features of the K category descriptions). A higher update standard is thus set, which is conducive to obtaining better feature information of category descriptions, which in turn is conducive to improving the accuracy of image recognition.
• in another implementation, the first network device can also obtain a first feature information set; the first feature information set includes C high-level features in one-to-one correspondence with the C categories, and each high-level feature in the first feature information set is obtained by averaging the high-level features of the K category descriptions corresponding to one first category. Then, the predicted category information of the training image is generated based on the first feature information set and the feature information of the training image.
• in yet another implementation, the first network device can also obtain a second feature information set; the second feature information set includes C high-level features in one-to-one correspondence with the C categories, and each high-level feature in the second feature information set is one high-level feature selected from the high-level features of the K category descriptions corresponding to one first category. Then, the predicted category information of the training image is generated based on the second feature information set and the feature information of the training image.
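The two alternatives above (averaging the K features per class versus selecting one of them) can be sketched in two lines; the array values are toy data:

```python
import numpy as np

# prompt_feats: (C, K, D) high-level features of the K category
# descriptions for each of the C classes (toy values: C=2, K=2, D=2).
prompt_feats = np.array([[[0., 0.], [2., 2.]],
                         [[4., 4.], [6., 6.]]])

first_set = prompt_feats.mean(axis=1)    # average over the K descriptions
second_set = prompt_feats[:, 0, :]       # pick one description per class
```

Either set has shape (C, D) and can be compared against an image feature exactly like a sampled target feature information set.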
• 305. the first network device generates a function value of the first loss function, where the goals of the iterative update with the first loss function include improving the similarity between the predicted category information of the training image and the correct category information of the training image.
• the first network device can generate the function value of the first loss function based on the predicted category information of the training image and the correct category information of the training image, where the goals of the iterative update with the first loss function include improving the similarity between the predicted category information of the training image and the correct category information of the training image.
• in one implementation, the first loss function can directly use the distance between the predicted category information of the training image and the correct category information of the training image; the aforementioned distance can be a cosine distance, a Euclidean distance, an L1 distance, an L2 distance or another type of distance, which is not exhaustively listed here.
• in another implementation, the first loss function can directly use the similarity between the predicted category information of the training image and the correct category information of the training image; the aforementioned similarity can be a cosine similarity, a similarity based on Euclidean distance or another type of similarity, which is not exhaustively listed here.
• the objective function is the distance between the predicted category information and the correct category information of the image, and the goal of the iterative update with the objective function may include reducing the function value of the objective function.
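One common way to realize such an objective is the negative log-likelihood of the correct class, which shrinks exactly when the predicted category information moves toward the correct category information. This particular instantiation is an assumption consistent with the distance/similarity framing above, not a formula quoted from the text:

```python
import numpy as np

def first_loss(pred_probs, correct_idx):
    # Negative log-likelihood of the correct category: small when the
    # predicted category information is close to the correct one.
    return -np.log(pred_probs[correct_idx])

good = np.array([0.05, 0.9, 0.05])   # confident and correct for class 1
bad = np.array([0.6, 0.2, 0.2])      # mostly wrong for class 1
```

For the correct class 1, `first_loss(good, 1)` is smaller than `first_loss(bad, 1)`, so minimizing this loss improves the similarity between predicted and correct category information.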
• in theory, step 304 can be repeated an infinite number of times to obtain an infinite number of pieces of predicted category information of the training image, and an infinite number of function values of the objective function can then be generated, with the sum of those function values used to perform the subsequent update operation.
• alternatively, the function value of the first loss function can be calculated instead; the function value of the first loss function is greater than or equal to the function value of the objective function, where the objective function is the distance between the predicted category information and the correct category information of the image. The goal of the iterative update with the first loss function includes reducing the function value of the objective function, that is, reducing the distance between the predicted category information and the correct category information of the image, which can also be described as improving the similarity between the predicted category information of the training image and the correct category information of the training image.
• the following uses modeling with the Gaussian distribution model as an example, and discloses an example of the sum of the function values of the objective function and of the calculation formula of the first loss function; the sum of the function values can be expressed, for example, as L(P_K) = E_{w_{1:C} ~ N(μ(P_K), Σ(P_K))} [ −Σ_i log( e^{<w_{y_i}, z_i>/τ} / Σ_{c=1}^{C} e^{<w_c, z_i>/τ} ) ]
• where L(P_K) represents the sum of the function values of the objective function; x_i represents the i-th training image; y_i represents the correct category information of the i-th training image; log(·) is a logarithmic function; w_{1:C} is a random vector obeying the Gaussian distribution N(μ(P_K), Σ(P_K)), where μ(P_K) and Σ(P_K) can refer to the descriptions of formulas (1) and (2) above; e is the natural constant; z_i represents the feature information of the i-th training image; τ is a predefined hyperparameter; and w_c represents the high-level feature corresponding to the c-th category in w_{1:C}.
• the function value of the first loss function can be calculated instead. Since the function value of the first loss function is greater than or equal to the function value of the objective function, the goal of the iterative update includes reducing the function value of the first loss function, which in turn includes reducing the function value of the objective function; the objective function is the distance between the predicted category information and the correct category information of the image, so this amounts to reducing that distance. In other words, while maintaining the correct training target, a simpler loss function can be used instead, which is conducive to expanding the implementation flexibility of this solution.
• 306. the first network device generates a function value of the second loss function, where the goals of the iterative update with the second loss function include reducing the similarity between the feature information of the K category description templates.
• the first network device can also generate a function value of the second loss function, where the function value of the second loss function can be obtained based on the similarity between the feature information of any two category description templates among the K category description templates, or can be obtained based on the distance between the feature information of any two category description templates among the K category description templates.
• the second loss function can be expressed, for example, as L_so = (1/(K(K−1))) Σ_{i=1}^{K} Σ_{j≠i} <g(P_i), g(P_j)>
• where L_so represents the function value of the second loss function; K represents the number of category description templates; g(P_i) represents the high-level feature of the i-th category description template among the K category description templates; g(P_j) represents the high-level feature of the j-th category description template among the K category description templates; and <g(P_i), g(P_j)> represents the cosine distance between g(P_i) and g(P_j).
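A sketch of such a diversity loss: the mean pairwise cosine similarity among the K template features. The exact normalization is an assumption; the point is that minimizing it pushes the K templates apart in feature space:

```python
import numpy as np

def second_loss(G):
    # G: (K, D) high-level features of the K category description
    # templates.  Mean pairwise cosine similarity; minimizing it pushes
    # the K templates apart in feature space.
    Gn = G / np.linalg.norm(G, axis=1, keepdims=True)
    S = Gn @ Gn.T                         # (K, K) cosine similarities
    K = G.shape[0]
    return (S.sum() - np.trace(S)) / (K * (K - 1))

diverse = np.eye(3)            # mutually orthogonal templates
collapsed = np.ones((3, 3))    # identical templates
```

Orthogonal templates give a loss of 0 while identical templates give 1, so the gradient of this loss increases template diversity, as the text intends.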
• 307. the first network device updates the feature information of the K category descriptions corresponding to each first category according to the function value of the first loss function.
• step 306 is an optional step. If step 306 is executed, the first network device can perform a weighted summation of the function value of the first loss function and the function value of the second loss function to obtain the function value of the total loss function, and update the feature information of the K category descriptions corresponding to each first category according to the function value of the total loss function.
• in this case, the function value of the second loss function is also used to update the feature information of the K category descriptions corresponding to each first category; the goal of the iterative update with the second loss function includes reducing the similarity between the feature information of the at least two category description templates, that is, enlarging the distance between the feature information of the K category descriptions corresponding to each first category, so as to further increase the diversity of the feature information of the K category descriptions corresponding to each first category, improve the adaptability to different images of the same category, and in turn improve the accuracy of image recognition results.
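The weighted summation of the two losses can be sketched as follows; the weights are assumed hyperparameters, since the text does not specify values:

```python
def total_loss(first_loss_value, second_loss_value, alpha=1.0, beta=0.1):
    # Weighted sum of the first loss (classification accuracy goal) and
    # the second loss (template diversity goal); alpha and beta balance
    # the two goals during the iterative update.
    return alpha * first_loss_value + beta * second_loss_value
```

Setting beta to 0 recovers the case where step 306 is not executed and only the first loss function drives the update.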
• if step 306 is not executed, the first network device may update the feature information of the K category descriptions according to the function value of the first loss function alone.
  • Figure 7 is a schematic flow chart of a method for obtaining feature information of category descriptions provided by an embodiment of the present application. Figure 7 can be understood in conjunction with the above description of Figure 6.
• as shown in Figure 7, a sampling operation can be performed based on the distribution information of the high-level features of the category descriptions corresponding to each category to obtain one high-level feature corresponding to each category.
• the first network device obtains the feature information of the training image and calculates the similarity between the feature information of the training image and the high-level feature corresponding to each category to generate the predicted category information of the training image; in the example of Figure 7, the feature information of the training image has the greatest similarity with the high-level feature corresponding to the cat category. It should be understood that the example in Figure 7 is only for ease of understanding this solution and is not used to limit this solution.
• the first network device can perform gradient derivation on the function value of the first loss function (or the function value of the total loss function), and backpropagate the derivation result to update the feature information of the category descriptions corresponding to the C categories.
  • a second neural network may also be configured.
  • the parameters of the first neural network and/or the parameters of the second neural network may be updated.
  • the first network device may keep the parameters of the first neural network and/or the parameters of the second neural network unchanged.
• in one implementation, the first network device can backpropagate the derivation result to directly update the low-level features of the K category description templates. Since the low-level features of each category description corresponding to a first category among the C categories include the low-level features of a category description template, this realizes the update of the low-level features of the K category descriptions corresponding to each first category among the C categories, and in turn the update of the high-level features of the K category descriptions corresponding to each first category among the C categories.
• in another implementation, the first network device can backpropagate the derivation result to directly update the low-level features of the K category descriptions corresponding to each first category among the C categories, thereby also realizing the update of the high-level features of the K category descriptions corresponding to each first category among the C categories.
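The update itself reduces to a gradient step applied only to the learnable template features, while the parameters of the first and second neural networks may stay frozen, as the text notes. A minimal sketch, with an assumed learning rate and a toy gradient:

```python
import numpy as np

def update_templates(template_feats, grad, lr=0.01):
    # One backpropagation step applied only to the low-level features
    # of the category description templates; the first and second
    # neural networks can be kept unchanged (frozen).
    return template_feats - lr * grad

feats = np.ones((2, 4))            # K=2 templates, D=4 low-level features
grad = np.full((2, 4), 10.0)       # toy gradient from the total loss
updated = update_templates(feats, grad)
```

Repeating this step until the loss converges (or a preset iteration count is reached) yields the final feature information of the K category descriptions.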
• the first network device repeatedly performs the above steps to iteratively update the feature information of the K category descriptions corresponding to each first category until the convergence condition is met; the convergence condition may include the convergence of the first loss function (and optionally also the convergence of the second loss function), or the number of iterative updates reaching a preset number.
  • the feature information of at least two category descriptions corresponding to each first category in the C categories is obtained; predicted category information of the image is generated based on the feature information of the at least two category descriptions corresponding to each first category and the feature information of the image; and the feature information of the at least two category descriptions is automatically updated based on the correct category information of the image, the predicted category information, and the first loss function until the convergence conditions are met, where the goals of iteratively updating with the first loss function include improving the similarity between the predicted category information and the correct category information of the image. Through the aforementioned scheme, the feature information of the at least two category descriptions corresponding to each category can be learned automatically, and the goals of the iterative update include improving the accuracy of the image recognition task, which is conducive to obtaining category descriptions that better match the recognition task. Because objects of the same category vary across different images, the best-matching category descriptions for different images of the same category may differ; obtaining at least two category descriptions for each category helps improve the fit with different images of the same category.
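The iterative update loop described above can be sketched as follows. This is a minimal NumPy illustration under simplifying assumptions that are not part of the embodiment: the image encoder is assumed frozen and already applied, the score is a plain dot product, the first loss is cross-entropy, and `learn_description_features` and all shapes are hypothetical names chosen for the sketch.

```python
import numpy as np

def learn_description_features(images, labels, desc_feats, lr=0.1, max_iters=200, tol=1e-3):
    """Iteratively update per-category description features (first loss only).

    images: (N, D) image features (the image encoder is assumed frozen).
    labels: (N,) correct category index of each image.
    desc_feats: (C, D) learnable category-description features; only these
    are updated, mirroring the embodiment in which the network parameters
    stay fixed while the description features receive gradient updates.
    Stops when the cross-entropy loss changes by less than `tol`
    (convergence of the first loss) or after `max_iters` updates
    (a preset number of iterations).
    """
    t = desc_feats.copy()
    prev = np.inf
    for _ in range(max_iters):
        logits = images @ t.T                       # (N, C) similarity scores
        logits -= logits.max(axis=1, keepdims=True)
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)           # predicted category information
        loss = -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))
        if abs(prev - loss) < tol:                  # convergence condition met
            break
        prev = loss
        # Gradient of the first loss w.r.t. the description features only.
        p[np.arange(len(labels)), labels] -= 1.0
        t -= lr * (p.T @ images) / len(labels)
    return t, loss
```

Reducing the loss here is exactly the stated goal of the iterative update: raising the similarity between the predicted category information and the correct category of each image.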
  • Figure 8 is a schematic flowchart of an image processing method provided by an embodiment of the present application.
  • the image processing method provided by an embodiment of the present application may include:
  • the second network device performs feature extraction on the image to obtain feature information of the image.
  • the second network device can input the image to be processed into the second neural network to obtain the feature information of the image generated by the second neural network.
  • the second network device generates predicted category information of the image based on the feature information of the category description corresponding to each first category in the C categories and the feature information of the image.
  • the predicted category of the image pointed to by the predicted category information is included in the C categories; the category description of a first category includes a category description template and the first category.
  • the feature information corresponding to each first category in the C categories is obtained based on the feature information of at least two category descriptions corresponding to each first category
  • the characteristic information described by at least two categories corresponding to each first category is obtained by iteratively updating using the first loss function.
  • the goals of iteratively updating with the first loss function include improving the similarity between the predicted category information and the correct category information of the image.
  • for the specific implementation of step 802, refer to the description of step 304 in the embodiment corresponding to Figure 3; details are not repeated here.
  • the feature information of the category description corresponding to each first category in the C categories can be the mean, the median, or another statistic of the feature information of the at least two category descriptions corresponding to that first category; this is not limited here.
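As an illustration of how the aggregated description features can drive step 802, the following sketch (Python with NumPy; the function name, the shapes, and the use of cosine similarity plus softmax are assumptions for illustration, not mandated by the embodiment) takes the mean or median of the K description features of each category and scores them against the image feature:

```python
import numpy as np

def predict_category(image_feat, desc_feats, reduce="mean"):
    """Score C categories for one image.

    image_feat: (D,) image feature vector.
    desc_feats: (C, K, D) features of the K category descriptions per category.
    The K description features of each category are first aggregated (mean or
    median, as the embodiment allows), then compared to the image feature by
    cosine similarity, and the similarities are normalized with softmax.
    """
    agg = np.mean(desc_feats, axis=1) if reduce == "mean" else np.median(desc_feats, axis=1)
    # Cosine similarity between the image feature and each aggregated class feature.
    a = agg / np.linalg.norm(agg, axis=1, keepdims=True)
    v = image_feat / np.linalg.norm(image_feat)
    sims = a @ v                        # (C,) similarity scores
    e = np.exp(sims - sims.max())       # numerically stable softmax
    return e / e.sum()
```

The predicted category of the image is then the index with the highest probability.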
  • the "feature information described by at least two categories corresponding to each first category” in the corresponding embodiment of Figure 8 is obtained based on the method corresponding to Figure 3.
  • the meaning of each noun in Figure 8 can be found in the description of the embodiment corresponding to Figure 3 and is not repeated here.
  • FIG. 9a is a schematic diagram illustrating the beneficial effects of the method for obtaining feature information of category descriptions provided by an embodiment of the present application.
  • the asterisk group combines the manually designed category description template with the C categories to obtain the category descriptions; reference group 1 autonomously learns the feature information of a single category description; reference group 2 is trained directly for image classification. As can be seen from the example in Figure 9a, the highest accuracy score is achieved by using the feature information of the at least two category descriptions obtained with the solution provided by the embodiment of the present application.
  • Figure 9b is another schematic diagram illustrating the beneficial effects of the method for obtaining feature information of category descriptions provided by an embodiment of the present application.
  • Figure 9b shows experimental results on multiple data sets; each ordinate in Figure 9b corresponds to a different data set. As shown in the figure, compared with the manually designed category description template and the reference groups, the feature information of the at least two category descriptions obtained with the solution provided by the embodiment of this application achieves the highest accuracy score.
  • Figure 10 is a schematic structural diagram of a device for obtaining feature information of category descriptions provided by an embodiment of the present application.
  • the device 1000 for obtaining feature information of category descriptions includes: an acquisition module 1001, configured to obtain feature information of at least two category descriptions corresponding to each first category among C categories; a generation module 1002, configured to generate predicted category information of an image according to the feature information of the at least two category descriptions corresponding to each first category and the feature information of the image, where the predicted category of the image pointed to by the predicted category information is included in the C categories; and an update module 1003, configured to update, according to the first loss function, the feature information of the at least two category descriptions corresponding to each first category until a convergence condition is met.
  • the goals of iteratively updating with the first loss function include improving the similarity between the predicted category information and the correct category information of the image.
  • the feature information of the at least two category descriptions corresponding to the first category includes first feature information and second feature information, and the position of the feature information of the first category in the first feature information is different from the position of the feature information of the first category in the second feature information.
  • the feature information of the at least two category descriptions includes high-level features of the at least two category descriptions; a high-level feature of a category description is a feature generated by a hidden layer of the neural network or by the neural network, and the neural network is used to update the features of the category descriptions.
  • the generation module 1002 is specifically configured to: determine distribution information of the high-level features of the category descriptions corresponding to a first category based on the high-level features of the at least two category descriptions corresponding to that first category; perform a sampling operation based on the distribution information of the high-level features corresponding to each first category to obtain a feature information set, where the feature information set includes high-level features corresponding to each first category; and generate predicted category information of the image based on the feature information of the image and the feature information set.
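A minimal sketch of this modeling-and-sampling step, assuming a per-dimension Gaussian as the target model (the embodiment equally allows a Gaussian mixture or a von Mises model); the function name and array shapes are illustrative assumptions:

```python
import numpy as np

def sample_class_features(high_feats, n_samples=4, rng=None):
    """Model each category's K high-level description features with a
    per-dimension Gaussian and draw samples from it.

    high_feats: (C, K, D) high-level features of the K descriptions of each
    of the C categories. Returns an array of shape (C, n_samples, D) whose
    rows follow N(mean_c, var_c) -- the sampled feature set used together
    with the image features to produce predicted category information.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    mu = high_feats.mean(axis=1, keepdims=True)       # (C, 1, D) per-class mean
    sigma = high_feats.std(axis=1, keepdims=True)     # (C, 1, D) per-class std
    noise = rng.standard_normal((high_feats.shape[0], n_samples, high_feats.shape[2]))
    return mu + sigma * noise
```

Because the loss is then computed on features sampled around the K description features, the update must produce description features whose whole neighborhood classifies well, which is the higher update standard discussed above.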
  • the acquisition module 1001 is specifically configured to: acquire the underlying features of the at least two category descriptions corresponding to each first category, where the underlying features of a category description are the category description in vectorized form; and input the underlying features of the category descriptions into the neural network, which updates them to obtain the high-level features of the category descriptions.
  • the update module 1003 is specifically configured to perform, while keeping the parameters of the neural network unchanged, gradient updates on the underlying features of the at least two category description templates corresponding to the C categories according to the function value of the first loss function, so as to obtain updated underlying features of the at least two category descriptions corresponding to each first category.
  • the update module 1003 is specifically configured to update the feature information of the at least two category descriptions corresponding to each first category according to the first loss function and a second loss function, where the goals of iteratively updating with the second loss function include reducing the similarity between the feature information of the at least two category description templates.
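One plausible form of such a second loss is sketched below, assuming cosine similarity as the similarity measure between template features; the embodiment does not fix this exact formula, so the function is illustrative only. Minimizing it pushes the template features apart, increasing the diversity of the learned descriptions:

```python
import numpy as np

def template_diversity_loss(template_feats):
    """Second-loss sketch: mean pairwise cosine similarity between the
    feature vectors of the K category-description templates.

    template_feats: (K, D). Minimizing this value reduces the similarity
    between template features, i.e. enlarges the distance between the K
    description features of each category.
    """
    t = template_feats / np.linalg.norm(template_feats, axis=1, keepdims=True)
    sims = t @ t.T                       # (K, K) cosine similarities
    k = len(t)
    off_diag = sims.sum() - np.trace(sims)
    return off_diag / (k * (k - 1))      # mean over ordered pairs i != j
```

Orthogonal templates give a loss of 0, identical templates a loss of 1, so gradient descent on this term drives the templates toward diversity.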
  • the function value of the first loss function is greater than or equal to the function value of an objective function, where the objective function is the distance between the predicted category information and the correct category information of the image; the goals of the iterative update include reducing the function value of the first loss function.
  • the processing device 1100 includes: an acquisition module 1101, configured to perform feature extraction on the image to obtain feature information of the image; and a generation module 1102, configured to generate predicted category information of the image according to the feature information of the category description corresponding to each first category among the C categories and the feature information of the image.
  • the predicted category of the image pointed to by the predicted category information is included in C categories, and C is an integer greater than or equal to 2.
  • the category description of the first category includes a category description template and the first category.
  • the feature information corresponding to each first category in the C categories is obtained based on the feature information of at least two category descriptions corresponding to each first category
  • the characteristic information described by at least two categories corresponding to each first category is obtained by iteratively updating using the first loss function.
  • the goals of iteratively updating with the first loss function include improving the similarity between the predicted category information and the correct category information of the image.
  • the feature information of the at least two category descriptions corresponding to the first category includes first feature information and second feature information, and the position of the feature information of the first category in the first feature information is different from the position of the feature information of the first category in the second feature information.
  • the feature information of the category descriptions corresponding to the C categories is obtained through iterative updating with a first loss function and a second loss function, where the second loss function indicates the distance between the feature information of the at least two category description templates.
  • Figure 12 is a schematic structural diagram of a network device provided by an embodiment of the present application.
  • the network device 1200 includes: a receiver 1201, a transmitter 1202, a processor 1203, and a memory 1204 (the number of processors 1203 in the second network device 1200 may be one or more; one processor is taken as an example in Figure 12), where the processor 1203 may include an application processor 12031 and a communication processor 12032.
  • the receiver 1201, the transmitter 1202, the processor 1203, and the memory 1204 may be connected through a bus or other means.
  • Memory 1204 may include read-only memory and random access memory and provides instructions and data to processor 1203 .
  • a portion of memory 1204 may also include non-volatile random access memory (NVRAM).
  • the memory 1204 stores instructions executable by the processor and operating instructions, executable modules, or data structures, or a subset or an extended set thereof, where the operating instructions may include various operating instructions for implementing various operations.
  • Processor 1203 controls the operation of network equipment.
  • various components of network equipment are coupled together through a bus system.
  • the bus system may also include a power bus, a control bus, a status signal bus, etc.
  • various buses are called bus systems in the figure.
  • the methods disclosed in the above embodiments of the present application can be applied to the processor 1203 or implemented by the processor 1203.
  • the processor 1203 may be an integrated circuit chip with signal processing capabilities. During the implementation process, each step of the above method can be completed by instructions in the form of hardware integrated logic circuits or software in the processor 1203 .
  • the above-mentioned processor 1203 may be a general-purpose processor, a digital signal processor (DSP), a microprocessor, or a microcontroller, and may further include an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
  • the processor 1203 can implement or execute the various methods, steps and logical block diagrams disclosed in the embodiments of this application.
  • a general-purpose processor may be a microprocessor or the processor may be any conventional processor, etc.
  • the steps of the method disclosed in conjunction with the embodiments of the present application can be directly implemented by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other mature storage media in this field.
  • the storage medium is located in the memory 1204.
  • the processor 1203 reads the information in the memory 1204 and completes the steps of the above method in combination with its hardware.
  • the receiver 1201 may be configured to receive input numeric or character information and to generate signal inputs related to the settings and function control of the second network device.
  • the transmitter 1202 can be used to output numeric or character information through the first interface; the transmitter 1202 can also be used to send instructions to the disk group through the first interface to modify the data in the disk group; the transmitter 1202 can also include a display device such as a display screen .
  • the processor 1203 is configured to execute the image processing method executed by the second network device in the embodiment corresponding to FIG. 8.
  • the specific manner in which the application processor 12031 in the processor 1203 performs the above steps is based on the same concept as the method embodiment corresponding to Figure 8 in this application, and the technical effects it brings are the same as those of that method embodiment; for details, refer to the descriptions in the method embodiments shown above, which are not repeated here.
  • FIG. 13 is a schematic structural diagram of the network device provided by the embodiment of the present application.
  • the first network device 1300 is implemented by one or more servers and may vary greatly due to different configurations or performance; it may include one or more central processing units (CPU) 1322 (for example, one or more processors), memory 1332, and one or more storage media 1330 (for example, one or more mass storage devices) storing applications 1342 or data 1344.
  • the memory 1332 and the storage medium 1330 may be short-term storage or persistent storage.
  • the program stored in the storage medium 1330 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the first network device. Furthermore, the central processor 1322 may be configured to communicate with the storage medium 1330 and execute a series of instruction operations in the storage medium 1330 on the first network device 1300 .
  • the first network device 1300 may also include one or more power supplies 1326, one or more wired or wireless network interfaces 1350, one or more input/output interfaces 1358, and/or one or more operating systems 1341, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.
  • the central processor 1322 is configured to execute the method for obtaining feature information of category descriptions performed by the first network device in the embodiments corresponding to Figures 3 to 7.
  • the specific manner in which the central processor 1322 performs the above steps is based on the same concept as the method embodiments corresponding to Figures 3 to 7 in this application, and the technical effects it brings are the same as those of those method embodiments.
  • An embodiment of the present application also provides a computer program product which, when run on a computer, causes the computer to perform the steps performed by the first network device in the method described in the embodiments shown in Figures 3 to 7, or causes the computer to perform the steps performed by the second network device in the method described in the embodiment shown in Figure 8.
  • Embodiments of the present application also provide a computer-readable storage medium.
  • the computer-readable storage medium stores a program for signal processing; when the program is run on a computer, it causes the computer to perform the steps performed by the first network device in the method described in the embodiments shown in Figures 3 to 7, or causes the computer to perform the steps performed by the second network device in the method described in the embodiment shown in Figure 8.
  • the first network device, the second network device, or the communication device provided by the embodiments of the present application may specifically be a chip.
  • the chip includes: a processing unit and a communication unit.
  • the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin, or a circuit.
  • the processing unit can execute the computer-executable instructions stored in the storage unit, so that the chip performs the method for obtaining feature information of category descriptions described in the embodiments shown in Figures 3 to 7, or performs the image processing method described in the embodiment shown in Figure 8.
  • the storage unit may be a storage unit within the chip, such as a register or a cache; it may also be a storage unit located outside the chip in the wireless access device, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM).
  • Figure 14 is a structural schematic diagram of a chip provided by an embodiment of the present application.
  • the chip can be represented as a neural network processor NPU 140.
  • the NPU 140 serves as a co-processor and is mounted on the host CPU (Host CPU), which allocates tasks to it.
  • the core part of the NPU is the arithmetic circuit 1403.
  • the arithmetic circuit 1403 is controlled by the controller 1404 to extract the matrix data in the memory and perform multiplication operations.
  • the computing circuit 1403 internally includes multiple processing units (Process Engine, PE).
  • arithmetic circuit 1403 is a two-dimensional systolic array.
  • the arithmetic circuit 1403 may also be a one-dimensional systolic array or other electronic circuit capable of performing mathematical operations such as multiplication and addition.
  • arithmetic circuit 1403 is a general-purpose matrix processor.
  • the arithmetic circuit obtains the corresponding data of matrix B from the weight memory 1402 and caches it on each PE in the arithmetic circuit.
  • the operation circuit takes the data of matrix A from the input memory 1401, performs a matrix operation with matrix B, and stores the partial or final result of the matrix in the accumulator 1408.
  • the unified memory 1406 is used to store input data and output data.
  • the weight data is transferred directly to the weight memory 1402 through the direct memory access controller (DMAC) 1405.
  • Input data is also transferred to unified memory 1406 via DMAC.
  • the BIU (bus interface unit) 1410 is used for the interaction among the AXI bus, the DMAC, and the instruction fetch buffer (IFB) 1409.
  • the bus interface unit 1410 (BIU) is used by the instruction fetch buffer 1409 to obtain instructions from the external memory, and is also used by the storage unit access controller 1405 to obtain the original data of the input matrix A or the weight matrix B from the external memory.
  • DMAC is mainly used to transfer the input data in the external memory DDR to the unified memory 1406 or the weight data to the weight memory 1402 or the input data to the input memory 1401 .
  • the vector calculation unit 1407 includes multiple arithmetic processing units and, when necessary, performs further processing on the output of the arithmetic circuit, such as vector multiplication, vector addition, exponential operations, logarithmic operations, and size comparison. It is mainly used for non-convolutional/fully-connected layer computations in neural networks, such as batch normalization, pixel-level summation, and upsampling of feature planes.
  • vector calculation unit 1407 can store the processed output vectors to unified memory 1406 .
  • the vector calculation unit 1407 can apply a linear function and/or a nonlinear function to the output of the operation circuit 1403, such as linear interpolation on the feature plane extracted by the convolution layer, or a vector of accumulated values, to generate an activation value.
  • vector calculation unit 1407 generates normalized values, pixel-wise summed values, or both.
  • the processed output vector can be used as an activation input to the arithmetic circuit 1403, such as for use in a subsequent layer in a neural network.
  • the instruction fetch buffer 1409 connected to the controller 1404 is used to store instructions used by the controller 1404;
  • the unified memory 1406, input memory 1401, weight memory 1402 and instruction fetch memory 1409 are all On-Chip memories. External memory is private to the NPU hardware architecture.
  • each layer in the first neural network and the second neural network shown in the above method embodiments can be performed by the operation circuit 1403 or the vector calculation unit 1407.
  • the processor mentioned in any of the above places may be a general central processing unit, a microprocessor, an ASIC, or one or more integrated circuits used to control program execution of the method of the first aspect.
  • the device embodiments described above are only illustrative.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; they can be located in one place or distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • the connection relationship between modules indicates that there are communication connections between them, which can be specifically implemented as one or more communication buses or signal lines.
  • the present application can be implemented by software plus the necessary general-purpose hardware; of course, it can also be implemented by dedicated hardware, including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and so on. In general, all functions performed by a computer program can easily be implemented with corresponding hardware, and the specific hardware structures used to implement the same function can be diverse, such as analog circuits, digital circuits, or dedicated circuits. However, for this application, a software implementation is the better choice in most cases. Based on this understanding, the essence of the technical solution of the present application, or the part that contributes to the prior art, can be embodied in the form of a software product.
  • the computer software product is stored in a readable storage medium, such as a computer floppy disk, USB flash drive, mobile hard disk, ROM, RAM, magnetic disk, or optical disk, and includes several instructions to cause a computer device (which can be a personal computer, a first network device, a network device, etc.) to execute the methods described in the embodiments of the present application.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • the computer instructions may be transmitted from a website, computer, first network device, or data center to another website, computer, first network device, or data center by wired (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (such as infrared, radio, or microwave) means.
  • the computer-readable storage medium may be any available medium that a computer can access, or a data storage device, such as a first network device or a data center, integrated with one or more available media.
  • the available media may be magnetic media (eg, floppy disk, hard disk, magnetic tape), optical media (eg, DVD), or semiconductor media (eg, solid state disk (Solid State Disk, SSD)), etc.


Abstract

A method for obtaining feature information of category descriptions, an image processing method, and a device. The method can be applied to the field of image processing within artificial intelligence and includes: obtaining feature information of at least two category descriptions corresponding to each first category among C categories; generating predicted category information of an image according to the feature information of the at least two category descriptions corresponding to each first category and the feature information of the image, where the predicted category of the image is included in the C categories; and updating, according to a first loss function, the feature information of the at least two category descriptions corresponding to each first category. The feature information of the at least two category descriptions corresponding to each category is learned automatically, and the goals of the iterative update include improving the similarity between the predicted category information and the correct category information, which is conducive to obtaining category descriptions that better match the recognition task and to improving the fit with different images of the same category.

Description

Method for obtaining feature information of category descriptions, image processing method, and device
This application claims priority to Chinese Patent Application No. 202210491778.1, filed with the China National Intellectual Property Administration on April 29, 2022 and entitled "Method for obtaining feature information of category descriptions, image processing method, and device", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of artificial intelligence, and in particular to a method for obtaining feature information of category descriptions, an image processing method, and a device.
Background
Recent advances in vision-language models indicate that the features of images of a category are similar to the features of that category's language description. Therefore, if the category of an object in an image needs to be determined from among C categories, a category description corresponding to each of the C categories can be obtained; since the features of a textual category description are similar to the features of images of the same category, the features of each category's description can be used to assist the recognition of objects in images.
The approach currently adopted is to manually design a category description template for each category and combine the manually designed template with each category name to obtain the category descriptions. As an example, if the manually designed category description template is "This is a XXX" and the category name is "cat", the category description is "This is a cat".
However, manually designed category description templates introduce human bias, so a manually designed template is not necessarily optimal for the image recognition task; moreover, obtaining a suitable template requires repeated and time-consuming manual trials of multiple candidate templates.
Summary
Embodiments of the present application provide a method for obtaining feature information of category descriptions, an image processing method, and related devices, which automatically learn the feature information of at least two category descriptions corresponding to each category; the goals of the iterative update include improving the accuracy of the image recognition task, which is conducive to obtaining category descriptions that better match the recognition task, and also to improving the fit with different images of the same category, thereby further improving the accuracy of the image recognition process.
To solve the above technical problem, embodiments of the present application provide the following technical solutions:
In a first aspect, an embodiment of the present application provides a method for obtaining feature information of category descriptions, applicable to the field of image processing within artificial intelligence. The method includes: a first network device obtains feature information of K category descriptions corresponding to each first category among C categories, where C and K are both integers greater than or equal to 2, and the category description of a first category includes a category description template and the first category. As an example, suppose the C categories are C different breeds of cat and K is 3; the three different category description templates may be "This is a XX", "This is a cat, and its specific breed is XX", and "This is a cat of breed XX". If the name of a first category is "American Shorthair", the three different category descriptions corresponding to that first category are "This is an American Shorthair", "This is a cat, and its specific breed is American Shorthair", and "This is a cat of breed American Shorthair".
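The combination of templates and category names in the example above can be sketched as follows; the template strings and class names are illustrative placeholders taken from the example, not fixed by the method:

```python
# A minimal sketch of assembling K category descriptions per class by
# filling each of K templates with every class name. The templates and
# class names below are illustrative placeholders from the cat-breed
# example, not the ones used by any particular embodiment.
templates = [
    "This is a {}",
    "This is a cat, and its specific breed is {}",
    "This is a cat of breed {}",
]
classes = ["American Shorthair", "British Shorthair"]

# descriptions[c][k] is the k-th category description of class c.
descriptions = [[t.format(name) for t in templates] for name in classes]
```

Each of these C x K strings is then vectorized into underlying features and fed to the first neural network, as described below.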
The first network device generates predicted category information of an image according to the feature information of the K category descriptions corresponding to each first category and the feature information of the image, where the predicted category of the image pointed to by the predicted category information is included in the C categories. Specifically, the first network device obtains the similarity between the features of a training image and the high-level features corresponding to each of the C categories in a target feature information set, and updates, according to a first loss function, the feature information of the K category descriptions corresponding to each first category until a convergence condition is met; the goals of iteratively updating with the first loss function include improving the similarity between the predicted category information and the correct category information of the image.
In this implementation, the feature information of the K category descriptions corresponding to each first category among the C categories is obtained; predicted category information of the image is generated according to the feature information of the K category descriptions corresponding to each first category and the feature information of the image; and the feature information of the K category descriptions is automatically updated according to the correct category information of the image, the predicted category information, and the first loss function until a convergence condition is met, where the goals of iteratively updating with the first loss function include improving the similarity between the predicted category information and the correct category information of the image. With the foregoing solution, the feature information of the K category descriptions corresponding to each category can be learned automatically, and the goals of the iterative update include improving the accuracy of the image recognition task, which is conducive to obtaining category descriptions that better match the recognition task. Because objects of the same category vary across different images, the best-matching category descriptions for different images of the same category may differ; obtaining K category descriptions for each category in this solution helps improve the fit with different images of the same category, further improving the accuracy of the image recognition process.
In a possible implementation of the first aspect, the feature information of the K category descriptions corresponding to a first category includes first feature information and second feature information, and the position of the feature information of the first category in the first feature information is different from the position of the feature information of the first category in the second feature information.
In this implementation, due to factors such as pose, deformation, and illumination conditions, different images of the same category are diverse, so the feature information of the category descriptions best suited to different images of the same category can differ. In this solution, the position of the feature information of the first category in the first feature information is different from its position in the second feature information, which helps increase the diversity of the resulting feature information of the at least two category descriptions, improve the fit with different images of the same category, and thereby improve the accuracy of the image recognition results.
In a possible implementation of the first aspect, the feature information of the K category descriptions includes high-level features of the K category descriptions; a high-level feature of a category description is a feature generated by a hidden layer of a first neural network or by the first neural network, and the first neural network is used to update the features of the category descriptions.
The first network device generating predicted category information of the image according to the feature information of the K category descriptions corresponding to each first category and the feature information of the image includes: the first network device may model the high-level features of the category descriptions corresponding to a first category based on the high-level features of the K category descriptions corresponding to that first category, using a target model, to determine distribution information of the high-level features of the category descriptions corresponding to the first category; the target model may be a Gaussian distribution model, a Gaussian mixture model, a von Mises distribution model, or another type of model.
The first network device performs a sampling operation according to the distribution information of the high-level features of the category descriptions corresponding to each first category, to obtain a feature information set that includes high-level features corresponding to each first category. For example, if a Gaussian distribution model is used to model the high-level features of the category descriptions corresponding to a first category, the first network device may perform the sampling operation according to the mean and variance of the high-level features of the K category descriptions corresponding to that first category, and the at least one sampled high-level feature follows the distribution of the high-level features of the category descriptions corresponding to the first category. The first network device then generates predicted category information of the image according to the feature information of the image and the feature information set.
In this implementation, since it was found in research that the high-level features of the multiple category descriptions corresponding to the same category are distributed fairly compactly, a sampling operation can be performed according to the distribution information of the high-level features of the category descriptions corresponding to each first category to obtain high-level features corresponding to each first category, and predicted category information of the image is generated according to the sampled high-level features and the feature information of the image. Because the predicted features of the image are used to generate the function value of the first loss function, that function value is obtained from the sampled high-level features, and the purposes of the iterative update include reducing it; that is, the update requires that accurate predicted category information also be obtained from the sampled high-level features (i.e., the high-level features around the high-level features of the K category descriptions). This sets a higher update standard, which is conducive to obtaining better feature information of the category descriptions and thereby improving the accuracy of image recognition.
In a possible implementation of the first aspect, the first network device obtaining the feature information of the K category descriptions corresponding to each first category among the C categories includes: the first network device acquires the underlying features of the K category descriptions corresponding to each first category, where the underlying features of a category description are the category description in vectorized form; the underlying features of the category descriptions are input into the first neural network, which updates them to obtain the high-level features of the category descriptions.
The first network device updating the feature information of the K category descriptions corresponding to each first category includes: while keeping the parameters of the first neural network unchanged, the first network device performs gradient updates on the underlying features of the K category description templates corresponding to the C categories according to the function value of the first loss function, to obtain updated underlying features of the K category descriptions corresponding to each first category.
In this implementation, keeping the parameters of the first neural network unchanged while performing gradient updates on the underlying features of the category description templates helps reduce the number of parameters that need to be updated, which makes it possible to obtain suitable feature information faster and improves the efficiency of obtaining the feature information of the category descriptions.
在第一方面的一种可能实现方式中,第一网络设备根据第一损失函数,对与每个第一类别对应的K个类别描述的特征信息进行更新,包括:第一网络设备根据第一损失函数和第二损失函数,对与每个第一类别对应的K个类别描述的特征信息进行更新,采用第二损失函数进行迭代更新的目标包括缩小K个类别描述模板的特征信息之间的相似度。
本实现方式中,还采用第二损失函数的函数值对每个第一类别对应的K个类别描述的特征信息进行更新,采用第二损失函数进行迭代更新的目标包括缩小至少两个类别描述模板的特征信息之间的相似度,也即放大每个第一类别对应的K个类别描述的特征信息之间的距离,也即进一步提高每个第一类别对应的K个类别描述的特征信息的多样性,以提高与同一类别中不同的图像的适配度,进而有利于提高图像识别结果的准确度。
在第一方面的一种可能实现方式中,第一损失函数的函数值大于或等于目标函数的函数值,目标函数为预测类别信息和图像的正确类别信息之间的距离,迭代更新的目标包括降低第一损失函数的函数值。
本实现方式中,若直接计算无数个目标函数的函数值的和比较困难,则可以替换计算第一损失函数的函数值,由于第一损失函数的函数值大于或等于目标函数的函数值,迭代更新的目标包括降低所述第一损失函数的函数值,也即迭代更新的目标包括降低目标函数的函数值,而目标函数为所述预测类别信息和所述图像的正确类别信息之间的距离,也即缩小预测类别信息和所述图像的正确类别信息之间的距离,也即在保持了正确的训练目标的前提下,能够替换采用更为简单的损失函数,有利于扩展本方案实现的灵活性。
第二方面,本申请实施例提供一种图像的处理方法,可用于人工智能领域中图像处理领域中,方法包括:第二网络设备对图像进行特征提取,得到图像的特征信息;根据与C个类别中每个第一类别对应的类别描述的特征信息和图像的特征信息,生成图像的预测类别信息,预测类别信息指向的图像的预测类别包括于C个类别,C为大于或等于2的整数。其中,第一类别的类别描述包括类别描述模板和第一类别,与C个类别中每个第一类别对应的特征信息基于每个第一类别所对应的至少两个类别描述的特征信息得到,每个第一类别所对应的至少两个类别描述的特征信息为利用第一损失函数进行迭代更新后得到,利用第一损失函数进行迭代更新的目标包括提高预测类别信息和图像的正确类别信息之间的相似度。
本申请第二方面中,第二网络设备还可以用于执行第一方面以及第一方面的各个可能实现方式中第一网络设备执行的步骤,上述每个第一类别所对应的至少两个类别描述的特征信息可以通过第一方面以及第一方面的各个可能实现方式中的方法得到,第二方面的各个可能实现方式中的步骤的具体实现方式、名词的含义以及所带来的有益效果,均可以参阅第一方面,此处不再赘述。
第三方面,本申请实施例提供一种类别描述的特征信息的获取装置,可用于人工智能领域中图像处理领域中,装置包括:获取模块,用于获取与C个类别中每个第一类别对应的至少两个类别描述的特征信息,C为大于或等于2的整数,第一类别的类别描述包括类别描述模板和第一类别;生成模块,用于根据每个第一类别对应的至少两个类别描述的特征信息和图像的特征信息,生成图像的预测类别信息,预测类别信息指向的图像的预测类别包括于C个类别;更新模块,用于根据第一损失函数,对与每个第一类别对应的至少两个类别描述的特征信息进行更新,直至满足收敛条件,利用第一损失函数进行迭代更新的目标包括提高预测类别信息和图像的正确类别信息之间的相似度。
本申请第三方面中,类别描述的特征信息的获取装置还可以用于执行第一方面以及第一方面的各个可能实现方式中第一网络设备执行的步骤,第三方面的各个可能实现方式中的步骤的具体实现方式、名词的含义以及所带来的有益效果,均可以参阅第一方面,此处不再赘述。
第四方面,本申请实施例提供一种图像的处理装置,可用于人工智能领域中图像处理领域中,装置包括:获取模块,用于对图像进行特征提取,得到图像的特征信息;生成模块,用于根据与C个类别中每个第一类别对应的类别描述的特征信息和图像的特征信息,生成图像的预测类别信息,预测类别信息指向的图像的预测类别包括于C个类别,C为大于或等于2的整数;其中,第一类别的类别描述包括类别描述模板和第一类别,与C个类别中每个第一类别对应的特征信息基于每个第一类别所对应的至少两个类别描述的特征信息得到,每个第一类别所对应的至少两个类别描述的特征信息为利用第一损失函数进行迭代更新后得到,利用第一损失函数进行迭代更新的目标包括提高预测类别信息和图像的正确类别信息之间的相似度。
本申请第四方面中，图像的处理装置还可以用于执行第二方面以及第二方面的各个可能实现方式中第二网络设备执行的步骤，第四方面的各个可能实现方式中的步骤的具体实现方式、名词的含义以及所带来的有益效果，均可以参阅第二方面，此处不再赘述。
第五方面,本申请实施例提供了一种计算机程序产品,计算机程序产品包括程序,当该程序在计算机上运行时,使得计算机执行上述第一方面或第二方面所述的方法。
第六方面,本申请实施例提供了一种计算机可读存储介质,所述计算机可读存储介质中存储有计算机程序,当其在计算机上运行时,使得计算机执行上述第一方面或第二方面所述的方法。
第七方面,本申请实施例提供了一种网络设备,包括处理器和存储器,处理器与存储器耦合,存储器,用于存储程序;处理器,用于执行存储器中的程序,使得该网络设备执行上述第一方面所述的方法。
第八方面,本申请实施例提供了一种网络设备,包括处理器和存储器,处理器与存储器耦合,存储器,用于存储程序;处理器,用于执行存储器中的程序,使得该网络设备执行上述第二方面所述的方法。
第九方面,本申请提供了一种芯片系统,该芯片系统包括处理器,用于支持网络设备实现上述各个方面中所涉及的功能,例如,发送或处理上述方法中所涉及的数据和/或信息。在一种可能的设计中,所述芯片系统还包括存储器,所述存储器,用于保存第二网络设备或通信设备必要的程序指令和数据。该芯片系统,可以由芯片构成,也可以包括芯片和其他分立器件。
附图说明
图1为本申请实施例提供的人工智能主体框架的一种结构示意图;
图2a为本申请实施例提供的类别描述的特征信息的获取系统的一种系统架构图;
图2b为本申请实施例提供的类别描述的特征信息的获取方法的一种流程示意图;
图3为本申请实施例提供的类别描述的特征信息的获取方法的一种流程示意图;
图4为本申请实施例提供的类别描述的特征信息的获取方法中类别描述的特征信息的一种示意图;
图5为本申请实施例提供的多个类别所对应的类别描述的高层特征的特征分布的一种示意图;
图6为本申请实施例提供的确定多个类别所对应的类别描述的高层特征的分布信息的一种流程示意图;
图7为本申请实施例提供的类别描述的特征信息的获取方法的一种流程示意图;
图8为本申请实施例提供的图像的处理方法的一种流程示意图;
图9a为本申请实施例提供的类别描述的特征信息的获取方法的有益效果的一种示意图;
图9b为本申请实施例提供的类别描述的特征信息的获取方法的有益效果的一种示意图;
图10为本申请实施例提供的类别描述的特征信息的获取装置的一种结构示意图;
图11为本申请实施例提供的图像的处理装置的一种结构示意图;
图12为本申请实施例提供的网络设备的一种结构示意图;
图13为本申请实施例提供的网络设备的一种结构示意图；
图14为本申请实施例提供的芯片的一种结构示意图。
具体实施方式
下面结合附图,对本申请的实施例进行描述。本领域普通技术人员可知,随着技术的发展和新场景的出现,本申请实施例提供的技术方案对于类似的技术问题,同样适用。
本申请的说明书和权利要求书及上述附图中的术语“第一”、“第二”等是用于区别类似的对象,而不必用于描述特定的顺序或先后次序。应该理解这样使用的术语在适当情况下可以互换,这仅仅是描述本申请的实施例中对相同属性的对象在描述时所采用的区分方式。此外,术语“包括”和“具有”以及他们的任何变形,意图在于覆盖不排他的包含,以便包含一系列单元的过程、方法、系统、产品或设备不必限于那些单元,而是可包括没有清楚地列出的或对于这些过程、方法、产品或设备固有的其它单元。
首先对人工智能系统总体工作流程进行描述，请参见图1，图1示出的为人工智能主体框架的一种结构示意图，下面从“智能信息链”（水平轴）和“IT价值链”（垂直轴）两个维度对上述人工智能主体框架进行阐述。其中，“智能信息链”反映从数据的获取到处理的一系列过程。举例来说，可以是智能信息感知、智能信息表示与形成、智能推理、智能决策、智能执行与输出的一般过程。在这个过程中，数据经历了“数据—信息—知识—智慧”的凝练过程。“IT价值链”从人工智能的底层基础设施、信息(提供和处理技术实现)到系统的产业生态过程，反映人工智能为信息技术产业带来的价值。
(1)基础设施
基础设施为人工智能系统提供计算能力支持,实现与外部世界的沟通,并通过基础平台实现支撑。通过传感器与外部沟通;计算能力由智能芯片提供,该智能芯片具体可以采用中央处理器(central processing unit,CPU)、嵌入式神经网络处理器(neural-network processing unit,NPU)、图形处理器(graphics processing unit,GPU)、专用集成电路(application specific integrated circuit,ASIC)或现场可编程门阵列(field programmable gate array,FPGA)等硬件加速芯片;基础平台包括分布式计算框架及网络等相关的平台保障和支持,可以包括云存储和计算、互联互通网络等。举例来说,传感器和外部沟通获取数据,这些数据提供给基础平台提供的分布式计算系统中的智能芯片进行计算。
(2)数据
基础设施的上一层的数据用于表示人工智能领域的数据来源。数据涉及到图形、图像、语音、文本,还涉及到传统设备的物联网数据,包括已有系统的业务数据以及力、位移、液位、温度、湿度等感知数据。
(3)数据处理
数据处理通常包括数据训练,机器学习,深度学习,搜索,推理,决策等方式。
其中，机器学习和深度学习可以对数据进行符号化和形式化的智能信息建模、抽取、预处理、训练等。
推理是指在计算机或智能系统中,模拟人类的智能推理方式,依据推理控制策略,利用形式化的信息进行机器思维和求解问题的过程,典型的功能是搜索与匹配。
决策是指智能信息经过推理后进行决策的过程,通常提供分类、排序、预测等功能。
(4)通用能力
对数据经过上面提到的数据处理后,进一步基于数据处理的结果可以形成一些通用的能力,比如可以是算法或者一个通用系统,例如,翻译,文本的分析,计算机视觉的处理,语音识别,图像的识别等等。
(5)智能产品及行业应用
智能产品及行业应用指人工智能系统在各领域的产品和应用,是对人工智能整体解决方案的封装,将智能信息决策产品化、实现落地应用,其应用领域主要包括:智能终端、智能制造、智能交通、智能家居、智能医疗、智能安防、自动驾驶、智慧城市等。
本申请实施例可以应用于人工智能领域的各个应用领域中,具体用于对各个应用领域的图像中的物品进行识别。若第二网络设备是从C个类别中确定图像的预测类别,则可以预先获取与C个类别中每个类别对应的类别描述的特征信息,进而根据C个类别中每个类别对应的类别描述的特征信息和图像的特征信息之间的相似度,确定图像的预测类别。
为了能够自动学习与C个类别中每个类别对应的至少两个类别描述的特征信息，本申请实施例提供了一种类别描述的特征信息的获取方法，在介绍前述方法之前，先参阅图2a，图2a为本申请实施例提供的类别描述的特征信息的获取系统的一种系统架构图，在图2a中，类别描述的特征信息的获取系统200包括第一网络设备210、数据库220、第二网络设备230和数据存储系统240，第二网络设备230中包括计算模块231。
其中,数据库220中存储有训练数据集合,第一网络设备210可以获取与C个类别中每个第一类别对应的至少两个类别描述的特征信息,并利用训练数据集合对与每个第一类别对应的至少两个类别描述的特征信息进行迭代更新,直至满足收敛条件。
具体的,请参阅图2b,图2b为本申请实施例提供的类别描述的特征信息的获取方法的一种流程示意图。其中,A1、第一网络设备获取与C个类别中每个第一类别对应的至少两个类别描述的特征信息,C为大于或等于2的整数,第一类别的类别描述包括类别描述模板和第一类别;A2、第一网络设备根据每个第一类别对应的至少两个类别描述的特征信息和图像的特征信息,生成图像的预测类别信息,预测类别信息指向的图像的预测类别包括于C个类别;A3、第一网络设备根据第一损失函数,对与每个第一类别对应的至少两个类别描述的特征信息进行更新,直至满足收敛条件,利用第一损失函数进行迭代更新的目标包括提高预测类别信息和图像的正确类别信息之间的相似度。
本申请实施例,能够自动学习每个类别所对应的至少两个类别描述的特征信息,且迭代更新的目标包括提高图像识别任务的准确率,有利于获得与识别任务更为匹配的类别描述,且由于同一类别的物体在不同的图像中存在各种变化,导致与同一类别的不同图像所对应的最适配的类别描述可能不同,本方案中获取每种类别对应的至少两个类别描述,有利于提高与同一类别中的不同图像的适配度,以进一步提高图像识别过程的准确性。
第一网络设备210得到的与C个类别中每个第一类别对应的类别描述的特征信息可以部署于各种形态的第二网络设备230中。其中,第二网络设备230可以调用数据存储系统240中的数据、代码等,也可以将数据、指令等存入数据存储系统240中。数据存储系统240可以置于第二网络设备230中,也可以为数据存储系统240相对第二网络设备230是外部存储器。
本申请的一些实施例中,请参阅图2a,第二网络设备230可以配置于客户设备中,“用户”可以直接与第二网络设备230进行数据交互。作为示例,例如当客户设备为手机或平板时,第二网络设备230可以为手机或平板的主处理器(Host CPU)中用于进行图像识别的模块,第二网络设备230也可以为手机或平板中的图形处理器(graphics processing unit,GPU)或者神经网络处理器(NPU),GPU或NPU作为协处理器挂载到主处理器上,由主处理器分配任务。
值得注意的，图2a仅是本发明实施例提供的图像处理系统的一种架构示意图，图中所示设备、器件、模块等之间的位置关系不构成任何限制。例如，在本申请的另一些实施例中，第二网络设备230和客户设备250可以为分别独立的设备，第二网络设备230通过配置的I/O接口与客户设备进行数据交互，“用户”可以通过客户设备上的I/O接口212输入图像，第二网络设备230通过I/O接口将图像的预测类别返回给客户设备，提供给用户。
结合上述描述,下面开始对本申请实施例提供的类别描述的特征信息的迭代更新阶段和应用阶段的具体实现流程进行描述。
一、类别描述的特征信息的迭代更新阶段
本申请实施例中,具体的,请参阅图3,图3为本申请实施例提供的类别描述的特征信息的获取方法的一种流程示意图,本申请实施例提供的类别描述的特征信息的获取方法可以包括:
301、第一网络设备获取与C个类别中每个第一类别对应的K个类别描述的特征信息。
本申请实施例中,第一网络设备获取与C个类别中每个第一类别对应的K个类别描述的特征信息,其中,C为大于或等于2的整数,K为大于或等于2的整数,与每个第一类别对应的K个类别描述的特征信息包括前述K个类别描述的高层特征,还可以包括前述K个类别描述的底层特征。
具体的,针对C个类别中的任意一个类别(为方便描述,后续称为“第一类别”),第一网络设备可以先获取与第一类别对应的K个类别描述的底层特征,类别描述的底层特征为对该类别描述进行向量化处理后得到,每个第一类别的一个类别描述包括一个类别描述模板和第一类别的名称。
作为示例,例如C个类别为C个不同品种的猫咪,K的取值为3,则3个不同的类别描述模板可以分别为“这是一只XX”、“这是一只猫咪,具体的品种是XX”和“这是一只品种为XX的猫咪”,第一类别的类别名称为“美短”,则与第一类别对应的3种不同的类别描述分别为“这是一只美短”、“这是一只猫咪,具体的品种是美短”和“这是一只品种为美短的猫咪”,此处举例仅为方便理解本方案,不用于限定本方案。
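上述“类别描述模板与类别名称结合得到类别描述”的组合过程可以用如下示意性代码说明（其中的模板内容与函数名均为便于演示的假设，并非本方案的固定实现）：

```python
# 示意性示例：将K个类别描述模板分别与一个类别名称结合，得到该类别对应的K个类别描述
templates = [
    "这是一只{}",
    "这是一只猫咪，具体的品种是{}",
    "这是一只品种为{}的猫咪",
]

def build_descriptions(class_name, templates):
    """针对某一个类别，返回与其对应的K个类别描述（K等于模板数量）。"""
    return [t.format(class_name) for t in templates]

descriptions = build_descriptions("美短", templates)
print(descriptions[0])  # 这是一只美短
```

对C个类别中的每个类别分别调用一次上述组合，即可得到每个类别对应的K个类别描述。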
第一网络设备将与第一类别对应的每个类别描述的底层特征输入第一神经网络，通过第一神经网络对类别描述的底层特征进行更新，得到与第一类别对应的每个类别描述的高层特征，其中，第一神经网络可以采用文本编码器(text encoder)，类别描述的高层特征为第一神经网络中的隐含层/第一神经网络生成的特征，第一神经网络中的隐含层指的是第一神经网络任意中间层的输出。
第一网络设备对C个类别中的每个第一类别均执行上述方法,可以得到与C个类别中每个第一类别对应的K个类别描述的特征信息。
更具体的，针对“获取C个类别中任意一个第一类别所对应的K个类别描述的底层特征”的过程，在一种实现方式中，第一网络设备可以初始化K个类别描述模板的底层特征，第一网络设备对第一类别的名称进行向量化处理，得到第一类别的名称的底层特征；第一网络设备将K个类别描述模板中每个类别描述模板的特征信息与第一类别的名称的特征信息结合，可以得到与第一类别对应的K个类别描述的底层特征。
在另一种实现方式中，第一网络设备在获取K个类别描述模板之后，可以对K个类别描述模板进行向量化处理，得到K个类别描述模板的底层特征；第一网络设备对第一类别的名称进行向量化处理，得到第一类别的名称的底层特征；第一网络设备将K个类别描述模板中每个类别描述模板的特征信息与第一类别的名称的特征信息结合，可以得到与第一类别对应的K个类别描述的底层特征。
在一种实现方式中，第一网络设备可以初始化K个类别描述模板，类别描述模板也可以称为提示(prompt)模板。可以将K个类别描述模板中每个类别描述模板分别与第一类别的类别名称(class name)结合，得到与第一类别对应的K个类别描述(category description)；第一网络设备可以对第一类别所对应的K个类别描述进行向量化处理，得到与第一类别对应的K个类别描述的底层特征。
第一网络设备可以对C个类别中每个第一类别均执行上述操作，得到C个类别中每个第一类别所对应的K个类别描述。需要说明的是，C个类别中的不同类别所对应的K个类别描述可以相同或不同。
可选地,与同一个第一类别对应的K个类别描述的底层特征包括第一底层特征和第二底层特征,第一类别的名称的底层特征在第一底层特征中的位置和第一类别的名称的底层特征在第二底层特征中的位置不同;对应的,与同一个第一类别对应的K个类别描述的高层特征包括第一高层特征和第二高层特征,第一类别的名称的高层特征在第一高层特征中的位置和第一类别的名称的高层特征在第二高层特征中的位置可以不同。
进一步地,K个类别描述可以包括第一类别描述和第二类别描述,第一类别的名称在第一类别描述中的位置和第一类别的名称在第二类别描述中的位置可以不同。
为更直观地理解本方案,请参阅图4,图4为本申请实施例提供的类别描述的特征信息的获取方法中类别描述的特征信息的一种示意图,需要说明的是,图4中是采用可视的方式对类别描述的特征信息进行展示,如图所示,在不同的类别描述的特征信息中,类别名称的特征信息的位置可以不同,应理解,图4中的示例仅为方便理解本方案,不用于限定本方案。
本申请实施例中，由于姿势、形变和光照条件等因素的影响，同一类别的图像中不同的图像存在多样性，因而同一类别中不同的图像所适配的类别描述的特征信息可以不同，本方案中第一类别的特征信息在第一特征信息中的位置和第一类别的特征信息在第二特征信息中的位置不同，有利于提高最终得到的至少两种类别描述的特征信息的多样性，以提高与同一类别中不同的图像的适配度，进而有利于提高图像识别结果的准确度。
302、第一网络设备根据与第一类别对应的至少两个类别描述的高层特征,确定与第一类别对应的类别描述的高层特征的分布信息。
本申请的一些实施例中,由于技术人员在研究中发现,同一类别所对应的多个类别描述的高层特征分布较为集中,也即同一类别所对应的多个类别描述的高层特征在特征空间是相邻的,因此第一网络设备可以根据C个类别中每个第一类别所对应的K个类别描述的高层特征,对C个类别中每个第一类别所对应的类别描述的高层特征的分布进行建模,以确定与C个类别中每个第一类别对应的类别描述的高层特征的分布信息。
为更直观地理解本方案,请参阅图5,图5为本申请实施例提供的多个类别所对应的类别描述的高层特征的特征分布的一种示意图,图5是对多个类别所对应的类别描述的高层特征进行可视化处理后得到的,图5中的每个点代表一个类别描述的高层特征,如图5所示,同一类别所对应的多个类别描述的高层特征临近,不同类别所对应的类别描述的高层特征距离较远,应理解,图5中的示例仅为方便理解本方案,不用于限定本方案。
针对C个类别中的任意一个类别(为方便描述,后续称为“第一类别”),第一网络设备可以采用目标模型对第一类别所对应的类别描述的高层特征进行建模,以确定用于描述第一类别所对应的类别描述的高层特征的分布信息;其中,目标模型可以采用高斯分布模型、混合高斯分布模型、冯米塞斯(von Mises)分布模型或其他类型的模型等,此处不做穷举。作为示例,若采用高斯模型对第一类别所对应的类别描述的高层特征进行建模,则第一网络设备需要获取第一类别所对应的K个类别描述的高层特征的均值和方差(也即第一类别所对应的类别描述的高层特征的分布信息的一种示例),需要说明的是,若采用其他模型进行建模,则需要获取其他类型的分布信息,此处不做限定。
为进一步理解本方案,以下公开了第一类别所对应的K个类别描述的高层特征的均值和方差的计算公式的一个示例:
$$\mu(P^{K})=\frac{1}{K}\sum_{k=1}^{K}w_{1:C}(P_{k})\tag{1}$$

$$\Sigma(P^{K})=\frac{1}{K}\sum_{k=1}^{K}\left(w_{1:C}(P_{k})-\mu\right)\left(w_{1:C}(P_{k})-\mu\right)^{T}\tag{2}$$
其中,μ(PK)代表第一类别所对应的K个类别描述的高层特征的均值向量,Pk代表第一类别所对应的K个类别描述模板中第k个类别描述模板,K是第一类别所对应的类别描述模板的数量,w1:C(Pk)包括与第k个类别描述模板对应的C个高层特征,w1:C(Pk)中的每个高层特征包括第k个类别描述模板与一个类别的类别名称结合后的文本描述的高层特征。
Σ(PK)代表第一类别所对应的K个类别描述的高层特征的协方差矩阵，(w1:C(Pk)-μ)T代表w1:C(Pk)-μ的转置，式(2)中其它元素的含义与上述对式(1)中元素的含义一致，此处不做赘述，应理解，式(1)和式(2)中均以C个类别采用相同的类别描述模板为例，不用于限定本方案。
第一网络设备对C个类别中的每个第一类别均执行上述操作,从而能够得到C个类别中每个第一类别所对应的类别描述的高层特征的分布信息。
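式(1)、式(2)所描述的“对每个类别的K个高层特征估计均值与协方差”的建模步骤，可以用如下示意性代码表达（特征数据为随机生成的假设数据，K与维度d亦为假设值）：

```python
import numpy as np

# 示意性示例：对某一类别所对应的K个类别描述的高层特征进行高斯建模
rng = np.random.default_rng(0)
K, d = 4, 8                          # K个类别描述，高层特征维度为d（假设值）
features = rng.normal(size=(K, d))   # 该类别的K个类别描述的高层特征

mu = features.mean(axis=0)           # 均值向量，对应式(1)
centered = features - mu
sigma = centered.T @ centered / K    # 协方差矩阵，对应式(2)

print(mu.shape, sigma.shape)
```

对C个类别逐一执行上述估计，即可得到每个类别所对应的类别描述的高层特征的分布信息。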
为更直观地理解本方案,请参阅图6,图6为本申请实施例提供的确定多个类别所对应的类别描述的高层特征的分布信息的一种流程示意图,如图6所示,第一网络设备可以获取多个类别描述模板的底层特征,将每个类别描述模板的底层特征与一个类别名称的底层特征进行结合,得到一个类别描述的底层特征,图6中示出狗、鸟和猫这三种类别。
第一网络设备将与3个类别中每个类别对应的多个类别描述的底层特征输入第一神经网络,得到第一神经网络输出的与3个类别中每个类别对应的多个类别描述的高层特征。第一网络设备根据每个类别所对应的多个类别描述的高层特征,可以得到每个类别所对应的类别描述的高层特征的分布信息,应理解,图6中的示例仅为方便理解本方案,不用于限定本方案。
303、第一网络设备根据每个第一类别所对应的类别描述的高层特征的分布信息,执行采样操作,得到目标特征信息集合,目标特征信息集合包括与每个第一类别对应的高层特征。
本申请的一些实施例中,针对C个类别中的任意一个类别(为方便描述,后续称为“第一类别”),第一网络设备可以根据第一类别所对应的K个类别描述的高层特征的分布信息,采样得到至少一个高层特征。示例性的,若采用高斯分布模型对第一类别所对应的类别描述的高层特征进行建模,第一网络设备可以根据与第一类别所对应的K个类别描述的高层特征的均值和方差,执行该采样操作,采样得到的至少一个高层特征服从第一类别所对应的类别描述的高层特征的分布。
第一网络设备对C个类别中的每个第一类别均执行前述操作,得到至少一个目标特征信息集合,每个目标特征信息集合包括C个类别中每个第一类别所对应的一个高层特征,也即每个目标特征信息集合包括与C个类别一一对应的C个高层特征。
需要说明的是,目标特征信息集合中每个第一类别所对应的高层特征和步骤301中获取的每个第一类别所对应的K个高层特征可以不同。
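步骤303中“根据每个类别的分布信息执行采样，得到目标特征信息集合”的过程可以用如下示意性代码说明（类别数C、维度d与特征数据均为假设）：

```python
import numpy as np

# 示意性示例：根据每个类别的高层特征的高斯分布信息执行采样操作
rng = np.random.default_rng(0)
C, K, d = 3, 4, 8

sampled = []
for c in range(C):
    feats = rng.normal(loc=c, size=(K, d))   # 第c个类别的K个类别描述的高层特征
    mu = feats.mean(axis=0)
    cov = np.cov(feats, rowvar=False)
    # 采样得到的高层特征服从该类别所对应的高层特征的分布 N(mu, cov)
    sampled.append(rng.multivariate_normal(mu, cov))

target_set = np.stack(sampled)               # 目标特征信息集合，形状为 (C, d)
print(target_set.shape)
```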
304、第一网络设备根据每个第一类别对应的类别描述的高层特征和训练图像的特征信息,生成训练图像的预测类别信息,预测类别信息指向的训练图像的预测类别包括于C个类别。
本申请实施例中,步骤302和303均为可选步骤,若执行步骤302和303,第一网络设备可以获取训练图像的特征信息,并根据目标特征信息集合和训练图像的特征信息,生成训练图像的预测类别信息,预测类别信息指向的训练图像的预测类别包括于C个类别。
具体的,第一网络设备上可以配置有多个训练数据,在一种实现方式中,每个训练数据可以包括一个训练图像和该训练图像的正确类别信息,该训练图像中物体的正确类别包括于C个类别中,也即该正确类别信息指向的训练图像的正确类别包括于C个类别。第一网络设备可以将训练图像输入第二神经网络,得到第二神经网络生成的训练图像的特征。 在另一种实现方式中,每个训练数据可以直接包括一个训练图像的特征信息和该训练图像的正确类别信息。
第一网络设备获取训练图像的特征与目标特征信息集合中的C个类别中每个类别所对应的高层特征之间的相似度,根据前述相似度生成训练图像的预测类别信息;其中,训练图像的特征与C个类别中某个类别(为方便描述,后续称为“第二类别”)所对应的高层特征之间的相似度越高,该训练图像的预测类别为第二类别的概率越大。
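上述“根据训练图像的特征与各类别所对应高层特征之间的相似度生成预测类别信息”的计算，可以用如下示意性代码说明（其中余弦相似度加softmax只是一种常见的实现选择，特征数据与温度超参数τ均为假设）：

```python
import numpy as np

# 示意性示例：计算图像特征与C个类别所对应高层特征的余弦相似度，经softmax得到预测类别信息
rng = np.random.default_rng(0)
C, d = 3, 8
class_feats = rng.normal(size=(C, d))                   # 每个类别对应的一个高层特征
img_feat = class_feats[1] + 0.05 * rng.normal(size=d)   # 假设该图像与第1类最接近

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

sims = np.array([cosine(img_feat, w) for w in class_feats])
tau = 0.07                                              # 温度超参数（假设值）
logits = sims / tau
probs = np.exp(logits - logits.max())
probs /= probs.sum()                                    # 预测类别信息：各类别的概率
print(int(probs.argmax()))                              # 相似度最高的类别概率最大
```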
本申请实施例中，由于技术人员在研究中发现同一类别所对应的多个类别描述的高层特征分布较为集中，因此可以根据每个第一类别所对应的类别描述的高层特征的分布信息，执行采样操作以得到每个第一类别对应的高层特征，并根据采样得到的高层特征和图像的特征信息，生成图像的预测类别信息；由于图像的预测类别信息用于生成第一损失函数的函数值，也即基于采样得到的高层特征得到第一损失函数的函数值，迭代更新的目的包括降低第一损失函数的函数值，也即迭代更新的目的包括基于采样的高层特征(也即K个类别描述的高层特征周围的高层特征)也能够得到更加准确的预测类别信息，也即设立了更高的更新标准，有利于得到更优的类别描述的特征信息，进而有利于提高图像识别的准确率。
若不执行步骤302和303，在一种实现方式中，第一网络设备也可以获取第一特征信息集合，第一特征信息集合包括与C个类别一一对应的C个高层特征，第一特征信息集合中的每个高层特征为对每个第一类别对应的K个类别描述的高层特征进行平均得到。进而根据第一特征信息集合和训练图像的特征信息，生成训练图像的预测类别信息，前述步骤的具体实现方式可以参阅上述描述，此处不做赘述。
在另一种实现方式中,第一网络设备也可以获取第二特征信息集合,第二特征信息集合包括与C个类别一一对应的C个高层特征,第二特征信息集合中的每个高层特征为从每个第一类别对应的K个类别描述的高层特征中选取出的一个高层特征。进而根据第二特征信息集合和训练图像的特征信息,生成训练图像的预测类别信息,前述步骤的具体实现方式可以参阅上述描述,此处不做赘述。
305、第一网络设备生成第一损失函数的函数值,利用第一损失函数进行迭代更新的目标包括提高训练图像的预测类别信息和训练图像的正确类别信息之间的相似度。
本申请实施例中,第一网络设备可以根据训练图像的预测类别信息和训练图像的正确类别信息,生成第一损失函数的函数值,其中,利用第一损失函数进行迭代更新的目标包括提高训练图像的预测类别信息和训练图像的正确类别信息之间的相似度。
进一步地,在一种实现方式中,第一损失函数可以直接采用训练图像的预测类别信息和训练图像的正确类别信息之间的距离,前述距离可以为余弦距离、欧式距离、L1距离、L2距离或其他类型的距离等,此处不做穷举。或者,第一损失函数可以直接采用训练图像的预测类别信息和训练图像的正确类别信息之间的相似度,前述相似度可以为余弦相似度、基于欧式距离得到的相似度或其他类型的相似度等等,此处不做穷举。
在另一种实现方式中，目标函数为预测类别信息和图像的正确类别信息之间的距离，则利用目标函数进行迭代更新的目标可以包括缩小目标函数的值。由于在实际计算中，可以重复执行步骤304无数次，得到训练图像的无数个预测类别信息，进而生成无数个目标函数的函数值的和，进而利用无数个目标函数的函数值的和执行后续更新操作。但由于无数个目标函数的函数值的和无法计算，则可以替换计算第一损失函数的函数值，第一损失函数的函数值大于或等于目标函数的函数值，目标函数为预测类别信息和图像的正确类别信息之间的距离，利用第一损失函数进行迭代更新的目标包括降低第一损失函数的函数值，由于第一损失函数的函数值大于或等于目标函数的函数值，也即利用第一损失函数进行迭代更新的目标包括降低目标函数的函数值，也即缩小预测类别信息和图像的正确类别信息之间的距离，也可以称为提高训练图像的预测类别信息和训练图像的正确类别信息之间的相似度。
为进一步理解本方案,以下以采用高斯分布模型进行建模为例,公开了无数个目标函数的函数值的和以及第一损失函数的计算公式的一个示例:
$$L(P^{K})=\mathbb{E}_{(x_{i},y_{i})}\left[-\log \mathbb{E}_{w_{1:C}\sim N(\mu(P^{K}),\Sigma(P^{K}))}\frac{e^{z_{i}^{T}w_{y_{i}}/\tau}}{\sum_{c=1}^{C}e^{z_{i}^{T}w_{c}/\tau}}\right]\tag{3}$$

其中，L(PK)代表无数个目标函数的函数值的和，xi代表第i个训练图像，yi代表第i个训练图像的正确类别信息，E(xi,yi)代表对训练数据(xi,yi)的所有可能性计算数学期望，log(…)是对数函数，Ew1:C代表对w1:C的所有可能性计算数学期望，w1:C是服从高斯分布N(μ(PK),Σ(PK))的随机向量，μ(PK)和Σ(PK)的含义可以参阅上述式(1)和式(2)中的描述，e是自然常数，zi代表第i个训练图像的特征信息，zi^T是zi的转置，wyi代表w1:C中与yi这一类别对应的高层特征，τ是预先定义的超参数，Σc代表对C个类别进行求和，wc代表w1:C中与第c个类别对应的高层特征，应理解，式(3)中的示例仅为方便理解本方案，不用于限定本方案。

$$\bar{L}(P^{K})=\mathbb{E}_{(x_{i},y_{i})}\left[-\log \frac{e^{z_{i}^{T}\mu_{y_{i}}(P^{K})/\tau}}{\sum_{c=1}^{C}e^{z_{i}^{T}\mu_{c}(P^{K})/\tau+z_{i}^{T}A_{y_{i},c}z_{i}/(2\tau^{2})}}\right]\tag{4}$$

其中，L̄(PK)代表第一损失函数，Σc是求和符号，log(…)是对数函数，e是自然常数，μyi(PK)代表当执行到第i组训练数据(xi,yi)时，yi这一类别所对应的K个类别描述的高层特征的均值，μc(PK)代表C个类别中第c个类别所对应的K个类别描述的高层特征的均值，Ayi,c是基于多个Ai,j得到，Ai,j=Σii+Σjj-Σij-Σji，Σii代表C个类别中第i个类别和第i个类别之间的协方差矩阵，Σjj代表第j个类别和第j个类别之间的协方差矩阵，Σij代表第i个类别和第j个类别之间的协方差矩阵，Σji代表第j个类别和第i个类别之间的协方差矩阵，式(4)中其它元素的含义可以参阅上述对式(1)至式(3)的描述进行理解，此处不做赘述，应理解，式(4)中的示例仅为方便理解本方案，不用于限定本方案。
本申请实施例中,若直接计算无数个目标函数的函数值的和比较困难,则可以替换计算第一损失函数的函数值,由于第一损失函数的函数值大于或等于目标函数的函数值,迭代更新的目标包括降低所述第一损失函数的函数值,也即迭代更新的目标包括降低目标函数的函数值,而目标函数为所述预测类别信息和所述图像的正确类别信息之间的距离,也即缩小预测类别信息和所述图像的正确类别信息之间的距离,也即在保持了正确的训练目标的前提下,能够替换采用更为简单的损失函数,有利于扩展本方案实现的灵活性。
306、第一网络设备生成第二损失函数的函数值,采用第二损失函数进行迭代更新的目标包括缩小K个类别描述模板的特征信息之间的相似度。
本申请的一些实施例中,第一网络设备还可以生成第二损失函数的函数值,其中,第二损失函数的函数值可以基于K个类别描述模板的特征信息中任意两个类别描述模板的特征信息之间的相似度得到,或者,第二损失函数可以基于K个类别描述模板的特征信息中任意两个类别描述模板的特征信息之间的距离得到,对于“相似度”和“距离”这两个概念的描述可以参阅步骤305中的描述,此处不做赘述。
为进一步理解本方案，以下公开了第二损失函数的函数值的公式的一个示例：

$$L_{so}=\frac{1}{K(K-1)}\sum_{i=1}^{K}\sum_{j=1,j\neq i}^{K}\left\langle g(P_{i}),g(P_{j})\right\rangle\tag{5}$$

其中，Lso代表第二损失函数的函数值，K代表类别描述模板的数量，g(Pi)代表K个类别描述模板中第i个类别描述模板的高层特征，g(Pj)代表K个类别描述模板中第j个类别描述模板的高层特征，<g(Pi),g(Pj)>代表g(Pi)和g(Pj)之间的余弦相似度，应理解，式(5)中的示例仅为方便理解本方案，不用于限定本方案。
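这种“对K个类别描述模板的高层特征两两计算相似度”的第二损失，可以用如下示意性代码计算（模板特征为随机生成的假设数据，取平均的归一化方式亦为一种假设的实现选择）：

```python
import numpy as np

# 示意性示例：计算K个类别描述模板的高层特征两两之间余弦相似度的平均值，作为第二损失
rng = np.random.default_rng(0)
K, d = 3, 8
g = rng.normal(size=(K, d))                       # K个类别描述模板的高层特征

g_norm = g / np.linalg.norm(g, axis=1, keepdims=True)
cos = g_norm @ g_norm.T                           # 两两余弦相似度矩阵，对角线为1
loss_so = (cos.sum() - np.trace(cos)) / (K * (K - 1))  # 去掉对角线后的平均
print(float(loss_so))
```

最小化该损失即缩小模板特征之间的相似度，从而提高模板特征的多样性。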
307、第一网络设备根据第一损失函数的函数值,对与每个第一类别对应的K个类别描述的特征信息进行更新。
本申请实施例中,步骤306为可选步骤,若执行步骤306,则第一网络设备可以对第一损失函数的函数值和第二损失函数的函数值进行加权求和,得到总的损失函数的函数值,并根据总的损失函数的函数值对与每个第一类别对应的K个类别描述的特征信息进行更新。
本申请实施例中，还采用第二损失函数的函数值对每个第一类别对应的K个类别描述的特征信息进行更新，采用第二损失函数进行迭代更新的目标包括缩小至少两个类别描述模板的特征信息之间的相似度，也即放大每个第一类别对应的K个类别描述的特征信息之间的距离，也即进一步提高每个第一类别对应的K个类别描述的特征信息的多样性，以提高与同一类别中不同的图像的适配度，进而有利于提高图像识别结果的准确度。
若不执行步骤306,则第一网络设备可以根据第一损失函数的函数值对K个类别描述的特征信息进行更新。为更直观地理解本方案,请参阅图7,图7为本申请实施例提供的类别描述的特征信息的获取方法的一种流程示意图,图7可以结合上述对图6的描述进行理解,在得到每个类别所对应的类别描述的高层特征的分布信息之后,可以根据每个类别所对应的类别描述的高层特征的分布信息执行采样操作,得到每个类别所对应的一个高层特征。第一网络设备获取训练图像的特征信息,计算训练图像的特征信息和每个类别所对应的一个高层特征之间的相似度,以生成训练图像的预测类别信息,如图7所示,训练图像的特征信息和猫这一类别所对应的一个高层特征的相似度最大,应理解,图7中的示例仅为方便理解本方案,不用于限定本方案。
具体的,第一网络设备可以对第一损失函数的函数值(或总的损失函数的函数值)进行梯度求导,并对求导结果进行反向传播,以更新与C个类别对应的类别描述的特征信息。
进一步地,由于第一网络设备上会配置有第一神经网络,可选地,还可以配置有第二神经网络。则在一种实现方式中,在对求导结果进行反向传播的过程中,可以对第一神经网络的参数和/或第二神经网络的参数进行更新。在另一种实现方式中,在对求导结果进行反向传播的过程中,第一网络设备可以保持第一神经网络的参数和/或第二神经网络的参数不变。
本申请实施例中,在对类别描述模板的底层特征进行梯度更新的过程中,保持第一神经网络的参数不变,有利于减少需要更新的参数的数量,进而有利于更快的得到合适的特征信息,有利于提高获取类别描述的特征信息的效率。
更具体的,在一种实现方式中,第一网络设备可以对求导结果进行反向传播,以直接更新K个类别描述模板的底层特征,由于与C个类别中每个第一类别对应的每个类别描述的底层特征中均包括类别描述模板的底层特征,也即实现了对与C个类别中每个第一类别对应的K个类别描述的底层特征的更新,进而也实现了对与C个类别中每个第一类别对应的K个类别描述的高层特征的更新。
在另一种实现方式中,第一网络设备可以对求导结果进行反向传播,以直接更新与C个类别中每个第一类别对应的K个类别描述的底层特征,进而也实现了对与C个类别中每个第一类别对应的K个类别描述的高层特征的更新。
第一网络设备重复执行上述步骤,以实现对与每个第一类别对应的K个类别描述的特征信息进行迭代更新,直至满足收敛条件;收敛条件可以包括达到第一损失函数(可选地,还包括第二损失函数)的收敛条件,或者,迭代更新的次数达到预设次数。
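步骤307中“保持网络参数不变、仅根据第一损失函数的梯度更新类别描述的特征”的过程，可以用如下示意性代码表达（此处直接把与C个类别对应的特征向量W视为被更新的参数，交叉熵形式的损失、学习率等均为假设的简化选择）：

```python
import numpy as np

# 示意性示例：冻结特征提取部分(z保持不变)，仅对类别描述的特征W做梯度更新
rng = np.random.default_rng(0)
C, d, tau, lr = 3, 8, 1.0, 0.1
W = rng.normal(size=(C, d))       # 与C个类别对应的类别描述的特征（待更新）
z = rng.normal(size=d)            # 训练图像的特征，由保持不变的网络提取
y = 1                             # 训练图像的正确类别信息

def loss_and_grad(W, z, y, tau):
    logits = W @ z / tau
    p = np.exp(logits - logits.max())
    p /= p.sum()
    loss = -np.log(p[y])                      # 交叉熵形式的第一损失函数
    onehot = np.zeros_like(p)
    onehot[y] = 1.0
    grad_W = np.outer(p - onehot, z) / tau    # 仅对W求梯度，z不参与更新
    return loss, grad_W

loss_before, _ = loss_and_grad(W, z, y, tau)
for _ in range(50):
    _, grad = loss_and_grad(W, z, y, tau)
    W -= lr * grad                            # 梯度更新类别描述的特征
loss_after, _ = loss_and_grad(W, z, y, tau)
print(loss_after < loss_before)
```

由于只更新W而不更新特征提取部分，需要更新的参数数量明显减少，这与正文中“保持第一神经网络的参数不变”的做法是一致的思路。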
本申请实施例中，获取与C个类别中每个第一类别对应的至少两个类别描述的特征信息，根据每个第一类别对应的至少两个类别描述的特征信息和图像的特征信息，生成图像的预测类别信息，根据图像的正确类别信息、预测类别信息和第一损失函数，对至少两个类别描述的特征信息进行自动更新，直至满足收敛条件，利用第一损失函数进行迭代更新的目标包括提高预测类别信息和图像的正确类别信息之间的相似度；通过前述方案，能够自动学习每个类别所对应的至少两个类别描述的特征信息，且迭代更新的目标包括提高图像识别任务的准确率，有利于获得与识别任务更为匹配的类别描述，且由于同一类别的物体在不同的图像中存在各种变化，导致与同一类别的不同图像所对应的最适配的类别描述可能不同，本方案中获取每种类别对应的至少两个类别描述，有利于提高与同一类别中的不同图像的适配度，以进一步提高图像识别过程的准确性。
二、类别描述的特征信息的应用阶段
本申请实施例中,具体的,请参阅图8,图8为本申请实施例提供的图像的处理方法的一种流程示意图,本申请实施例提供的图像的处理方法可以包括:
801、第二网络设备对图像进行特征提取,得到图像的特征信息。
本申请实施例中,第二网络设备可以将待处理的图像输入第二神经网络,得到第二神经网络生成的图像的特征信息。
802、第二网络设备根据与C个类别中每个第一类别对应的类别描述的特征信息和图像的特征信息,生成图像的预测类别信息,预测类别信息指向的图像的预测类别包括于C个类别,第一类别的类别描述包括类别描述模板和第一类别,与C个类别中每个第一类别对应的特征信息基于每个第一类别所对应的至少两个类别描述的特征信息得到,每个第一类别所对应的至少两个类别描述的特征信息为利用第一损失函数进行迭代更新后得到,利用第一损失函数进行迭代更新的目标包括提高预测类别信息和图像的正确类别信息之间的相似度。
本申请实施例中,步骤802的具体实现方式可以参阅图3对应实施例中步骤304的描述,此处不做赘述。其中,“与C个类别中每个第一类别对应的类别描述的特征信息”可以为“每个第一类别所对应的至少两个类别描述的特征信息”的均值、中位值或其他值等,此处不做限定。需要说明的是,图8对应实施例中“每个第一类别所对应的至少两个类别描述的特征信息”为基于图3对应的方法得到,图8中各个名词的含义均可参阅图3对应实施例中的描述,此处不做赘述。
接下来结合实验数据对本申请提供的类别描述的特征信息的获取方法所带来的有益效果进行展示。请参阅图9a,图9a为本申请实施例提供的类别描述的特征信息的获取方法的有益效果的一种示意图。如图所示,星号组为将人工设计的类别描述模板和C个类别结合以得到类别描述;参照组一为自主学习一个类别描述的特征信息;参照组二为直接进行图像分类训练,通过图9a中的示例可以看出,采用本申请实施例提供的方案得到的至少两个类别描述的特征信息,能够实现最高的准确度评分。
本申请中同一类别的类别名称的特征信息在不同类别描述的特征信息中的位置可以不同，请参阅图9b，图9b为本申请实施例提供的类别描述的特征信息的获取方法的有益效果的一种示意图。图9b中示出了在多个数据集上的实验结果，图9b中每个纵坐标上均为不同的数据集，如图所示，与采用人工设计的类别描述模板和参照组二相比，采用本申请实施例提供的方案得到的至少两个类别描述的特征信息，能够实现最高的准确度评分。
在图1至图8所对应的实施例的基础上，为了更好的实施本申请实施例的上述方案，下面还提供用于实施上述方案的相关设备。具体参阅图10，图10为本申请实施例提供的类别描述的特征信息的获取装置的一种结构示意图，类别描述的特征信息的获取装置1000包括：获取模块1001，用于获取与C个类别中每个第一类别对应的至少两个类别描述的特征信息，C为大于或等于2的整数，第一类别的类别描述包括类别描述模板和第一类别；生成模块1002，用于根据每个第一类别对应的至少两个类别描述的特征信息和图像的特征信息，生成图像的预测类别信息，预测类别信息指向的图像的预测类别包括于C个类别；更新模块1003，用于根据第一损失函数，对与每个第一类别对应的至少两个类别描述的特征信息进行更新，直至满足收敛条件，利用第一损失函数进行迭代更新的目标包括提高预测类别信息和图像的正确类别信息之间的相似度。
在一种可能的设计中,与第一类别对应的至少两个类别描述的特征信息包括第一特征信息和第二特征信息,第一类别的特征信息在第一特征信息中的位置和第一类别的特征信息在第二特征信息中的位置不同。
在一种可能的设计中,至少两个类别描述的特征信息包括至少两个类别描述的高层特征,类别描述的高层特征为神经网络中的隐含层/神经网络生成的特征,神经网络用于对类别描述进行特征更新;
生成模块1002,具体用于:根据与第一类别对应的至少两个类别描述的高层特征,确定与第一类别对应的类别描述的高层特征的分布信息;根据每个第一类别所对应的类别描述的高层特征的分布信息,执行采样操作,得到特征信息集合,特征信息集合包括与每个第一类别对应的高层特征;根据图像的特征信息和特征信息集合,生成图像的预测类别信息。
在一种可能的设计中,获取模块1001,具体用于:获取与每个第一类别对应的至少两个类别描述的底层特征,类别描述的底层特征为向量化形式的类别描述;将类别描述的底层特征输入神经网络,通过神经网络对类别描述的底层特征进行更新,得到类别描述的高层特征;
更新模块1003,具体用于在保持神经网络的参数不变的前提下,根据第一损失函数的函数值对与C个类别对应的至少两个类别描述模板的底层特征进行梯度更新,以得到与每个第一类别对应的至少两个类别描述的更新后的底层特征。
在一种可能的设计中,更新模块1003,具体用于根据第一损失函数和第二损失函数,对与每个第一类别对应的至少两个类别描述的特征信息进行更新,采用第二损失函数进行迭代更新的目标包括缩小至少两个类别描述模板的特征信息之间的相似度。
在一种可能的设计中,第一损失函数的函数值大于或等于目标函数的函数值,目标函数为预测类别信息和图像的正确类别信息之间的距离,迭代更新的目标包括降低第一损失函数的函数值。
需要说明的是,类别描述的特征信息的获取装置1000中各模块/单元之间的信息交互、执行过程等内容,与本申请中图3至图7对应的各个方法实施例基于同一构思,具体内容可参见本申请前述所示的方法实施例中的叙述,此处不再赘述。
参阅图11，图11为本申请实施例提供的图像的处理装置的一种结构示意图，图像的处理装置1100包括：获取模块1101，用于对图像进行特征提取，得到图像的特征信息；生成模块1102，用于根据与C个类别中每个第一类别对应的类别描述的特征信息和图像的特征信息，生成图像的预测类别信息，预测类别信息指向的图像的预测类别包括于C个类别，C为大于或等于2的整数。
其中,第一类别的类别描述包括类别描述模板和第一类别,与C个类别中每个第一类别对应的特征信息基于每个第一类别所对应的至少两个类别描述的特征信息得到,每个第一类别所对应的至少两个类别描述的特征信息为利用第一损失函数进行迭代更新后得到,利用第一损失函数进行迭代更新的目标包括提高预测类别信息和图像的正确类别信息之间的相似度。
在一种可能的设计中,与第一类别对应的至少两个类别描述的特征信息包括第一特征信息和第二特征信息,第一类别的特征信息在第一特征信息中的位置和第一类别的特征信息在第二特征信息中的位置不同。
在一种可能的设计中,与C个类别对应的类别描述的特征信息是利用第一损失函数和第二损失函数进行迭代更新后得到,第二损失函数指示至少两个类别描述模板的特征信息之间的距离。
需要说明的是,图像的处理装置1100中各模块/单元之间的信息交互、执行过程等内容,与本申请中图8对应的各个方法实施例基于同一构思,具体内容可参见本申请前述所示的方法实施例中的叙述,此处不再赘述。
接下来介绍本申请实施例提供的一种网络设备，请参阅图12，图12为本申请实施例提供的网络设备的一种结构示意图，具体的，网络设备1200包括：接收器1201、发射器1202、处理器1203和存储器1204（其中网络设备1200中的处理器1203的数量可以为一个或多个，图12中以一个处理器为例），其中，处理器1203可以包括应用处理器12031和通信处理器12032。在本申请的一些实施例中，接收器1201、发射器1202、处理器1203和存储器1204可通过总线或其它方式连接。
存储器1204可以包括只读存储器和随机存取存储器,并向处理器1203提供指令和数据。存储器1204的一部分还可以包括非易失性随机存取存储器(non-volatile random access memory,NVRAM)。存储器1204存储有处理器和操作指令、可执行模块或者数据结构,或者它们的子集,或者它们的扩展集,其中,操作指令可包括各种操作指令,用于实现各种操作。
处理器1203控制网络设备的操作。具体的应用中,网络设备的各个组件通过总线系统耦合在一起,其中总线系统除包括数据总线之外,还可以包括电源总线、控制总线和状态信号总线等。但是为了清楚说明起见,在图中将各种总线都称为总线系统。
上述本申请实施例揭示的方法可以应用于处理器1203中，或者由处理器1203实现。处理器1203可以是一种集成电路芯片，具有信号的处理能力。在实现过程中，上述方法的各步骤可以通过处理器1203中的硬件的集成逻辑电路或者软件形式的指令完成。上述的处理器1203可以是通用处理器、数字信号处理器（digital signal processing，DSP）、微处理器或微控制器，还可进一步包括专用集成电路（application specific integrated circuit，ASIC）、现场可编程门阵列（field-programmable gate array，FPGA）或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。该处理器1203可以实现或者执行本申请实施例中的公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。结合本申请实施例所公开的方法的步骤可以直接体现为硬件译码处理器执行完成，或者用译码处理器中的硬件及软件模块组合执行完成。软件模块可以位于随机存储器、闪存、只读存储器、可编程只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质中。该存储介质位于存储器1204，处理器1203读取存储器1204中的信息，结合其硬件完成上述方法的步骤。
接收器1201可用于接收输入的数字或字符信息,以及产生与第二网络设备的相关设置以及功能控制有关的信号输入。发射器1202可用于通过第一接口输出数字或字符信息;发射器1202还可用于通过第一接口向磁盘组发送指令,以修改磁盘组中的数据;发射器1202还可以包括显示屏等显示设备。
本申请实施例中，处理器1203，用于执行图8对应实施例中的第二网络设备执行的图像的处理方法。需要说明的是，处理器1203中的应用处理器12031执行上述各个步骤的具体方式，与本申请中图8对应的各个方法实施例基于同一构思，其带来的技术效果与本申请中图8对应的各个方法实施例相同，具体内容可参见本申请前述所示的方法实施例中的叙述，此处不再赘述。
本申请实施例还提供了一种网络设备,请参阅图13,图13是本申请实施例提供的网络设备一种结构示意图,具体的,网络设备1300由一个或多个服务器实现,第一网络设备1300可因配置或性能不同而产生比较大的差异,可以包括一个或一个以上中央处理器(central processing units,CPU)1322(例如,一个或一个以上处理器)和存储器1332,一个或一个以上存储应用程序1342或数据1344的存储介质1330(例如一个或一个以上海量存储设备)。其中,存储器1332和存储介质1330可以是短暂存储或持久存储。存储在存储介质1330的程序可以包括一个或一个以上模块(图示没标出),每个模块可以包括对第一网络设备中的一系列指令操作。更进一步地,中央处理器1322可以设置为与存储介质1330通信,在第一网络设备1300上执行存储介质1330中的一系列指令操作。
第一网络设备1300还可以包括一个或一个以上电源1326,一个或一个以上有线或无线网络接口1350,一个或一个以上输入输出接口1358,和/或,一个或一个以上操作系统1341,例如Windows ServerTM,Mac OS XTM,UnixTM,LinuxTM,FreeBSDTM等等。
本申请实施例中，中央处理器1322，用于执行图3至图7对应实施例中的第一网络设备执行的类别描述的特征信息的获取方法。需要说明的是，中央处理器1322执行上述各个步骤的具体方式，与本申请中图3至图7对应的各个方法实施例基于同一构思，其带来的技术效果与本申请中图3至图7对应的各个方法实施例相同，具体内容可参见本申请前述所示的方法实施例中的叙述，此处不再赘述。
本申请实施例中还提供一种包括计算机程序产品,当其在计算机上运行时,使得计算机执行如前述图3至图7所示实施例描述的方法中第一网络设备所执行的步骤,或者,使 得计算机执行如前述图8所示实施例描述的方法中第二网络设备所执行的步骤。
本申请实施例中还提供一种计算机可读存储介质,该计算机可读存储介质中存储有用于进行信号处理的程序,当其在计算机上运行时,使得计算机执行如前述图3至图7所示实施例描述的方法中第一网络设备所执行的步骤,或者,使得计算机执行如前述图8所示实施例描述的方法中第二网络设备所执行的步骤。
本申请实施例提供的第二网络设备、第一网络设备、第二网络设备或通信设备具体可以为芯片,芯片包括:处理单元和通信单元,所述处理单元例如可以是处理器,所述通信单元例如可以是输入/输出接口、管脚或电路等。该处理单元可执行存储单元存储的计算机执行指令,以使芯片执行上述图3至图7所示实施例描述的类别描述的特征信息的获取方法,或者,以使芯片执行上述图8所示实施例描述的图像的处理方法。可选地,所述存储单元为所述芯片内的存储单元,如寄存器、缓存等,所述存储单元还可以是所述无线接入设备端内的位于所述芯片外部的存储单元,如只读存储器(read-only memory,ROM)或可存储静态信息和指令的其他类型的静态存储设备,随机存取存储器(random access memory,RAM)等。
具体的，请参阅图14，图14为本申请实施例提供的芯片的一种结构示意图，所述芯片可以表现为神经网络处理器NPU 140，NPU 140作为协处理器挂载到主CPU（Host CPU）上，由Host CPU分配任务。NPU的核心部分为运算电路1403，通过控制器1404控制运算电路1403提取存储器中的矩阵数据并进行乘法运算。
在一些实现中,运算电路1403内部包括多个处理单元(Process Engine,PE)。在一些实现中,运算电路1403是二维脉动阵列。运算电路1403还可以是一维脉动阵列或者能够执行例如乘法和加法这样的数学运算的其它电子线路。在一些实现中,运算电路1403是通用的矩阵处理器。
举例来说,假设有输入矩阵A,权重矩阵B,输出矩阵C。运算电路从权重存储器1402中取矩阵B相应的数据,并缓存在运算电路中每一个PE上。运算电路从输入存储器1401中取矩阵A数据与矩阵B进行矩阵运算,得到的矩阵的部分结果或最终结果,保存在累加器(accumulator)1408中。
统一存储器1406用于存放输入数据以及输出数据。权重数据直接通过存储单元访问控制器（Direct Memory Access Controller，DMAC）1405被搬运到权重存储器1402中。输入数据也通过DMAC被搬运到统一存储器1406中。
BIU为Bus Interface Unit，即总线接口单元1410，用于AXI总线与DMAC和取指存储器（Instruction Fetch Buffer，IFB）1409的交互。
总线接口单元1410(Bus Interface Unit,简称BIU),用于取指存储器1409从外部存储器获取指令,还用于存储单元访问控制器1405从外部存储器获取输入矩阵A或者权重矩阵B的原数据。
DMAC主要用于将外部存储器DDR中的输入数据搬运到统一存储器1406或将权重数据搬运到权重存储器1402中或将输入数据搬运到输入存储器1401中。
向量计算单元1407包括多个运算处理单元，在需要的情况下，对运算电路的输出做进一步处理，如向量乘，向量加，指数运算，对数运算，大小比较等等。主要用于神经网络中非卷积/全连接层网络计算，如Batch Normalization（批归一化），像素级求和，对特征平面进行上采样等。
在一些实现中,向量计算单元1407能将经处理的输出的向量存储到统一存储器1406。例如,向量计算单元1407可以将线性函数和/或非线性函数应用到运算电路1403的输出,例如对卷积层提取的特征平面进行线性插值,再例如累加值的向量,用以生成激活值。在一些实现中,向量计算单元1407生成归一化的值、像素级求和的值,或二者均有。在一些实现中,处理过的输出的向量能够用作到运算电路1403的激活输入,例如用于在神经网络中的后续层中的使用。
控制器1404连接的取指存储器(instruction fetch buffer)1409,用于存储控制器1404使用的指令;
统一存储器1406,输入存储器1401,权重存储器1402以及取指存储器1409均为On-Chip存储器。外部存储器私有于该NPU硬件架构。
其中,上述各个方法实施例中所示的第一神经网络和第二神经网络中各层的运算可以由运算电路1403或向量计算单元1407执行。
其中,上述任一处提到的处理器,可以是一个通用中央处理器,微处理器,ASIC,或一个或多个用于控制上述第一方面方法的程序执行的集成电路。
另外需说明的是,以上所描述的装置实施例仅仅是示意性的,其中所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。另外,本申请提供的装置实施例附图中,模块之间的连接关系表示它们之间具有通信连接,具体可以实现为一条或多条通信总线或信号线。
通过以上的实施方式的描述,所属领域的技术人员可以清楚地了解到本申请可借助软件加必需的通用硬件的方式来实现,当然也可以通过专用硬件包括专用集成电路、专用CPU、专用存储器、专用元器件等来实现。一般情况下,凡由计算机程序完成的功能都可以很容易地用相应的硬件来实现,而且,用来实现同一功能的具体硬件结构也可以是多种多样的,例如模拟电路、数字电路或专用电路等。但是,对本申请而言更多情况下软件程序实现是更佳的实施方式。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在可读取的存储介质中,如计算机的软盘、U盘、移动硬盘、ROM、RAM、磁碟或者光盘等,包括若干指令用以使得一台计算机设备(可以是个人计算机,第一网络设备,或者网络设备等)执行本申请各个实施例所述的方法。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。
所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时，全部或部分地产生按照本申请实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中，或者从一个计算机可读存储介质向另一计算机可读存储介质传输，例如，所述计算机指令可以从一个网站站点、计算机、第一网络设备或数据中心通过有线（例如同轴电缆、光纤、数字用户线（DSL））或无线（例如红外、无线、微波等）方式向另一个网站站点、计算机、第一网络设备或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存储的任何可用介质或者是包含一个或多个可用介质集成的第一网络设备、数据中心等数据存储设备。所述可用介质可以是磁性介质（例如，软盘、硬盘、磁带）、光介质（例如，DVD）、或者半导体介质（例如固态硬盘（Solid State Disk，SSD））等。

Claims (22)

  1. 一种类别描述的特征信息的获取方法,其特征在于,所述方法包括:
    获取与C个类别中每个第一类别对应的至少两个类别描述的特征信息,所述C为大于或等于2的整数,所述第一类别的类别描述包括类别描述模板和所述第一类别;
    根据每个所述第一类别对应的至少两个类别描述的特征信息和图像的特征信息,生成所述图像的预测类别信息,所述预测类别信息指向的所述图像的预测类别包括于所述C个类别;
    根据第一损失函数,对与每个第一类别对应的所述至少两个类别描述的特征信息进行更新,直至满足收敛条件,利用所述第一损失函数进行迭代更新的目标包括提高所述预测类别信息和所述图像的正确类别信息之间的相似度。
  2. 根据权利要求1所述的方法,其特征在于,与所述第一类别对应的所述至少两个类别描述的特征信息包括第一特征信息和第二特征信息,所述第一类别的特征信息在所述第一特征信息中的位置和所述第一类别的特征信息在所述第二特征信息中的位置不同。
  3. 根据权利要求1或2所述的方法,其特征在于,所述至少两个类别描述的特征信息包括所述至少两个类别描述的高层特征,所述类别描述的高层特征为神经网络中的隐含层/所述神经网络生成的特征,所述神经网络用于对所述类别描述进行特征更新;
    所述根据每个所述第一类别对应的至少两个类别描述的特征信息和图像的特征信息,生成所述图像的预测类别信息,包括:
    根据与所述第一类别对应的至少两个类别描述的高层特征,确定与所述第一类别对应的所述类别描述的高层特征的分布信息;
    根据每个所述第一类别所对应的所述类别描述的高层特征的分布信息,执行采样操作,得到特征信息集合,所述特征信息集合包括与每个所述第一类别对应的所述高层特征;
    根据图像的特征信息和所述特征信息集合,生成所述图像的预测类别信息。
  4. 根据权利要求3所述的方法,其特征在于,所述获取与C个类别中每个第一类别对应的至少两个类别描述的特征信息,包括:
    获取与每个所述第一类别对应的至少两个类别描述的底层特征,所述类别描述的底层特征为向量化形式的所述类别描述;
    将所述类别描述的底层特征输入所述神经网络,通过所述神经网络对所述类别描述的底层特征进行更新,得到所述类别描述的高层特征;
    所述对与每个第一类别对应的所述至少两个类别描述的特征信息进行更新,包括:
    在保持所述神经网络的参数不变的前提下,根据所述第一损失函数的函数值对与所述C个类别对应的至少两个所述类别描述模板的底层特征进行梯度更新,以得到与每个第一类别对应的所述至少两个类别描述的更新后的底层特征。
  5. 根据权利要求1或2所述的方法,其特征在于,所述根据第一损失函数,对与每个第一类别对应的所述至少两个类别描述的特征信息进行更新,包括:
    根据所述第一损失函数和第二损失函数，对与每个第一类别对应的所述至少两个类别描述的特征信息进行更新，采用所述第二损失函数进行迭代更新的目标包括缩小至少两个类别描述模板的特征信息之间的相似度。
  6. 根据权利要求1或2所述的方法,其特征在于,所述第一损失函数的函数值大于或等于目标函数的函数值,所述目标函数为所述预测类别信息和所述图像的正确类别信息之间的距离,所述迭代更新的目标包括降低所述第一损失函数的函数值。
  7. 一种图像的处理方法,其特征在于,所述方法包括:
    对图像进行特征提取,得到所述图像的特征信息;
    根据与C个类别中每个第一类别对应的类别描述的特征信息和图像的特征信息,生成所述图像的预测类别信息,所述预测类别信息指向的所述图像的预测类别包括于所述C个类别,所述C为大于或等于2的整数;
    其中,所述第一类别的类别描述包括类别描述模板和所述第一类别,与所述C个类别中每个第一类别对应的特征信息基于每个所述第一类别所对应的至少两个类别描述的特征信息得到,每个所述第一类别所对应的至少两个类别描述的特征信息为利用第一损失函数进行迭代更新后得到,利用所述第一损失函数进行迭代更新的目标包括提高所述预测类别信息和所述图像的正确类别信息之间的相似度。
  8. 根据权利要求7所述的方法,其特征在于,与所述第一类别对应的所述至少两个类别描述的特征信息包括第一特征信息和第二特征信息,所述第一类别的特征信息在所述第一特征信息中的位置和所述第一类别的特征信息在所述第二特征信息中的位置不同。
  9. 根据权利要求7或8所述的方法,其特征在于,与所述C个类别对应的所述类别描述的特征信息是利用所述第一损失函数和第二损失函数进行迭代更新后得到,所述第二损失函数指示至少两个所述类别描述模板的特征信息之间的距离。
  10. 一种类别描述的特征信息的获取装置,其特征在于,所述装置包括:
    获取模块,用于获取与C个类别中每个第一类别对应的至少两个类别描述的特征信息,所述C为大于或等于2的整数,所述第一类别的类别描述包括类别描述模板和所述第一类别;
    生成模块,用于根据每个所述第一类别对应的至少两个类别描述的特征信息和图像的特征信息,生成所述图像的预测类别信息,所述预测类别信息指向的所述图像的预测类别包括于所述C个类别;
    更新模块,用于根据第一损失函数,对与每个第一类别对应的所述至少两个类别描述的特征信息进行更新,直至满足收敛条件,利用所述第一损失函数进行迭代更新的目标包括提高所述预测类别信息和所述图像的正确类别信息之间的相似度。
  11. 根据权利要求10所述的装置,其特征在于,与所述第一类别对应的所述至少两个类别描述的特征信息包括第一特征信息和第二特征信息,所述第一类别的特征信息在所述第一特征信息中的位置和所述第一类别的特征信息在所述第二特征信息中的位置不同。
  12. 根据权利要求10或11所述的装置,其特征在于,所述至少两个类别描述的特征信息包括所述至少两个类别描述的高层特征,所述类别描述的高层特征为神经网络中的隐含层/所述神经网络生成的特征,所述神经网络用于对所述类别描述进行特征更新;
    所述生成模块,具体用于:
    根据与所述第一类别对应的至少两个类别描述的高层特征,确定与所述第一类别对应的所述类别描述的高层特征的分布信息;
    根据每个所述第一类别所对应的所述类别描述的高层特征的分布信息,执行采样操作,得到特征信息集合,所述特征信息集合包括与每个所述第一类别对应的所述高层特征;
    根据图像的特征信息和所述特征信息集合,生成所述图像的预测类别信息。
  13. 根据权利要求12所述的装置,其特征在于,所述获取模块,具体用于:
    获取与每个所述第一类别对应的至少两个类别描述的底层特征,所述类别描述的底层特征为向量化形式的所述类别描述;
    将所述类别描述的底层特征输入所述神经网络,通过所述神经网络对所述类别描述的底层特征进行更新,得到所述类别描述的高层特征;
    所述更新模块,具体用于在保持所述神经网络的参数不变的前提下,根据所述第一损失函数的函数值对与所述C个类别对应的至少两个所述类别描述模板的底层特征进行梯度更新,以得到与每个第一类别对应的所述至少两个类别描述的更新后的底层特征。
  14. 根据权利要求10或11所述的装置,其特征在于,
    所述更新模块,具体用于根据所述第一损失函数和第二损失函数,对与每个第一类别对应的所述至少两个类别描述的特征信息进行更新,采用所述第二损失函数进行迭代更新的目标包括缩小至少两个类别描述模板的特征信息之间的相似度。
  15. 根据权利要求10或11所述的装置,其特征在于,所述第一损失函数的函数值大于或等于目标函数的函数值,所述目标函数为所述预测类别信息和所述图像的正确类别信息之间的距离,所述迭代更新的目标包括降低所述第一损失函数的函数值。
  16. 一种图像的处理装置,其特征在于,所述装置包括:
    获取模块,用于对图像进行特征提取,得到所述图像的特征信息;
    生成模块,用于根据与C个类别中每个第一类别对应的类别描述的特征信息和图像的特征信息,生成所述图像的预测类别信息,所述预测类别信息指向的所述图像的预测类别包括于所述C个类别,所述C为大于或等于2的整数;
    其中,所述第一类别的类别描述包括类别描述模板和所述第一类别,与所述C个类别中每个第一类别对应的特征信息基于每个所述第一类别所对应的至少两个类别描述的特征信息得到,每个所述第一类别所对应的至少两个类别描述的特征信息为利用第一损失函数进行迭代更新后得到,利用所述第一损失函数进行迭代更新的目标包括提高所述预测类别信息和所述图像的正确类别信息之间的相似度。
  17. 根据权利要求16所述的装置,其特征在于,与所述第一类别对应的所述至少两个类别描述的特征信息包括第一特征信息和第二特征信息,所述第一类别的特征信息在所述第一特征信息中的位置和所述第一类别的特征信息在所述第二特征信息中的位置不同。
  18. 根据权利要求16或17所述的装置,其特征在于,与所述C个类别对应的所述类别描述的特征信息是利用所述第一损失函数和第二损失函数进行迭代更新后得到,所述第二损失函数指示至少两个所述类别描述模板的特征信息之间的距离。
  19. 一种计算机程序产品，其特征在于，所述计算机程序产品包括程序，当所述程序在计算机上运行时，使得计算机执行如权利要求1至6中任一项所述的方法，或者，使得计算机执行如权利要求7至9中任一项所述的方法。
  20. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质中存储有程序,当所述程序在计算机上运行时,使得计算机执行如权利要求1至6中任一项所述的方法,或者,使得计算机执行如权利要求7至9中任一项所述的方法。
  21. 一种网络设备,其特征在于,包括处理器和存储器,所述处理器与所述存储器耦合,
    所述存储器,用于存储程序;
    所述处理器,用于执行所述存储器中的程序,使得所述网络设备执行如权利要求1至6中任一项所述的方法。
  22. 一种网络设备,其特征在于,包括处理器和存储器,所述处理器与所述存储器耦合,
    所述存储器,用于存储程序;
    所述处理器，用于执行所述存储器中的程序，使得所述网络设备执行如权利要求7至9中任一项所述的方法。
PCT/CN2023/089990 2022-04-29 2023-04-23 类别描述的特征信息的获取方法、图像的处理方法及设备 WO2023207823A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210491778.1 2022-04-29
CN202210491778.1A CN114998643A (zh) 2022-04-29 2022-04-29 类别描述的特征信息的获取方法、图像的处理方法及设备

Publications (1)

Publication Number Publication Date
WO2023207823A1 true WO2023207823A1 (zh) 2023-11-02

Family

ID=83026120

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/089990 WO2023207823A1 (zh) 2022-04-29 2023-04-23 类别描述的特征信息的获取方法、图像的处理方法及设备

Country Status (2)

Country Link
CN (1) CN114998643A (zh)
WO (1) WO2023207823A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114998643A (zh) * 2022-04-29 2022-09-02 华为技术有限公司 类别描述的特征信息的获取方法、图像的处理方法及设备

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111291803A (zh) * 2020-01-21 2020-06-16 中国科学技术大学 一种图像分级粒度迁移方法、系统、设备和介质
US10769766B1 (en) * 2018-05-31 2020-09-08 Amazon Technologies, Inc. Regularized multi-label classification from partially labeled training data
CN112418256A (zh) * 2019-08-21 2021-02-26 阿里巴巴集团控股有限公司 分类、模型训练、信息搜索方法、系统及设备
CN113065634A (zh) * 2021-02-26 2021-07-02 华为技术有限公司 一种图像处理方法、神经网络的训练方法以及相关设备
CN114998643A (zh) * 2022-04-29 2022-09-02 华为技术有限公司 类别描述的特征信息的获取方法、图像的处理方法及设备

Also Published As

Publication number Publication date
CN114998643A (zh) 2022-09-02

