CN109785843B - Image processing apparatus and method


Info

Publication number
CN109785843B
Authority
CN
China
Prior art keywords
image processing
image
instruction
target
processed
Prior art date
Legal status
Active
Application number
CN201711121244.5A
Other languages
Chinese (zh)
Other versions
CN109785843A (en
Inventor
Name withheld upon the inventor's request
Current Assignee
Shanghai Cambricon Information Technology Co Ltd
Original Assignee
Shanghai Cambricon Information Technology Co Ltd
Priority date
Filing date
Publication date
Priority to CN201711121244.5A (this application; granted as CN109785843B)
Application filed by Shanghai Cambricon Information Technology Co Ltd
Priority to PCT/CN2018/108696 (WO2019062931A1)
Priority to KR1020197032701A (KR102380494B1)
Priority to KR1020197032702A (KR102379954B1)
Priority to US16/615,255 (US11532307B2)
Priority to KR1020197028486A (KR102317958B1)
Priority to EP19215861.6A (EP3667487B1)
Priority to EP18861574.4A (EP3627499B1)
Priority to EP19215862.4A (EP3667488B1)
Priority to JP2019556201A (JP6810283B2)
Publication of CN109785843A
Priority to JP2019211746A (JP6893968B2)
Priority to JP2019211745A (JP6810232B2)
Priority to US16/718,981 (US11437032B2)
Priority to US16/719,035 (US11450319B2)
Application granted
Publication of CN109785843B


Landscapes

  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image processing apparatus, comprising: an input/output unit for inputting a voice signal and an image to be processed and for outputting the processed image; a storage unit for storing the voice signal and the image to be processed; and an image processing unit for converting the voice signal into an image processing instruction and a target area, where the target area is the processing area of the image to be processed, for processing the target area according to the image processing instruction to obtain a processed image, and for storing the processed image in the storage unit. With the embodiments of the invention, images can be processed through voice input, which saves the user the time spent learning image processing software before processing an image and improves the user experience.

Description

Image processing apparatus and method
Technical Field
The present invention relates to the field of image processing, and in particular, to an image processing apparatus and method.
Background
After taking a picture, a user often processes the image with PS (Photoshop) software on a computer or photo retouching software on a mobile phone in order to achieve a better visual effect.
However, before processing an image with PS software on a computer or retouching software on a mobile phone, the user must first learn how to use the software, and even after mastering it must manually enter instructions to make the computer or mobile phone carry out the retouching operations. This approach is time-consuming for the user, and the user experience is poor.
Disclosure of Invention
The embodiment of the invention provides an image processing device and method, which realize the function of processing images by inputting voice, save the time for a user to learn image processing software before image processing and improve the user experience.
In a first aspect, an embodiment of the present invention provides an image processing apparatus, including:
the input/output unit is used for inputting voice signals and images to be processed;
a storage unit for storing the voice signal and the image to be processed;
the image processing unit is used for converting the voice signal into an image processing instruction and a target area, wherein the target area is a processing area of the image to be processed; processing the target area according to the image processing instruction to obtain a processed image, and storing the processed image into the storage unit;
the input/output unit is further used for outputting the processed image.
In a possible embodiment, the storage unit includes a neuron storage unit and a weight caching unit, and the neural network operation unit of the image processing unit includes a neural network operation subunit;
when the neuron storage unit is used for storing the voice signal and the image to be processed and the weight caching unit is used for storing a target voice instruction conversion model and a target image processing model, the neural network operation subunit is used for converting the voice signal into the image processing instruction and the target area according to the target voice instruction conversion model;
the neural network operation subunit is further configured to process the target area according to the target image processing model and the image processing instruction, so as to obtain a processed image;
the neural network operation subunit is further configured to store the processed image into the neuron storage unit.
In a possible embodiment, the storage unit includes a general data cache unit, and the neural network operation unit of the image processing unit includes a general operation subunit;
when the general data cache unit is used for storing the voice signal and the image to be processed, the general operation subunit is used for converting the voice signal into the image processing instruction and the target area;
the general operation subunit is further configured to process the target area according to the image processing instruction, so as to obtain a processed image;
the general operation subunit is further configured to store the processed image into the general data cache unit.
In a possible embodiment, the neural network operation subunit is specifically configured to:
converting the speech signal into text information according to a speech recognition technique;
converting the text information into the image processing instructions according to natural language processing technology and the target voice instruction conversion model;
dividing the region of the image to be processed according to the granularity of the semantic region in the image processing instruction and an image recognition technology, so as to obtain the target region.
In a possible embodiment, the neural network operation subunit is specifically configured to:
converting the speech signal into the image processing instruction according to a speech recognition technology, a semantic understanding technology and the target speech instruction conversion model;
and dividing the region of the image to be processed according to the granularity of the semantic region in the image processing instruction and the image recognition technology, and obtaining the target region.
In a possible embodiment, the general operation subunit is specifically configured to:
converting the speech signal into text information according to a speech recognition technique;
converting the text information into the image processing instructions according to natural language processing technology;
and dividing the region of the image to be processed according to the granularity of the semantic region in the image processing instruction and the image recognition technology, and obtaining the target region.
In a possible embodiment, the general operation subunit is specifically configured to:
converting the speech signal into the image processing instructions according to a speech recognition technique and a semantic understanding technique;
and dividing the region of the image to be processed according to the granularity of the semantic region in the image processing instruction and the image recognition technology, and obtaining the target region.
In a possible embodiment, the neuron storage unit is configured to store the target region and the image processing instructions.
In a possible embodiment, the general data cache unit is configured to store the target area and the image processing instruction.
In a possible embodiment, the neural network operation subunit is configured to:
acquiring M image processing instructions from the neuron storage unit within a preset time window;
deleting image processing instructions with the same functions in the M image processing instructions to obtain N image processing instructions, wherein M is an integer greater than 1, and N is an integer smaller than M;
and processing the target area according to the N image processing instructions and the target image processing model to obtain a processed image.
In a possible embodiment, the general operation subunit is configured to:
acquiring M image processing instructions from the general data cache unit within a preset time window;
deleting image processing instructions with the same functions in the M image processing instructions to obtain N image processing instructions, wherein M is an integer greater than 1, and N is an integer smaller than M;
and processing the target area according to the N image processing instructions to obtain a processed image.
In a possible embodiment, the neural network operation subunit is further configured to:
and carrying out self-adaptive training on the voice command conversion model to obtain the target voice command conversion model.
In a possible embodiment, the neural network operation subunit is further configured to:
converting the voice signal into a prediction instruction according to the voice instruction conversion model;
determining a correlation coefficient of the predicted instruction and an instruction set corresponding to the predicted instruction;
optimizing the voice command conversion model according to the correlation coefficient of the predicted command and the command set corresponding to the predicted command so as to obtain the target voice command conversion model.
In a possible embodiment, the neural network operation subunit is further configured to:
and carrying out self-adaptive training on the image processing model to obtain the target image processing model.
In a possible embodiment, the neural network operation subunit is further configured to:
processing the image to be processed according to the image processing model to obtain a predicted image;
determining a correlation coefficient of the predicted image and a target image corresponding to the predicted image;
and optimizing the image processing model according to the correlation coefficient of the predicted image and the corresponding target image so as to obtain the target image processing model.
In a possible embodiment, the image processing unit of the image processing apparatus further comprises:
the instruction cache unit is used for storing instructions to be executed, wherein the instructions comprise neural network operation instructions and general operation instructions;
the instruction processing unit is used for transmitting the neural network operation instruction to the neural network operation subunit and transmitting the general operation instruction to the general operation subunit.
In a second aspect, an embodiment of the present invention provides an image processing method, including:
inputting a voice signal and an image to be processed;
storing the voice signal and the image to be processed;
converting the voice signal into an image processing instruction and a target area, wherein the target area is a processing area of the image to be processed; processing the target area according to the image processing instruction to obtain a processed image, and storing the processed image into the storage unit;
and outputting the processed image.
In a possible embodiment, the converting the speech signal into image processing instructions and a target area includes:
converting the speech signal into text information according to a speech recognition technique;
converting the text information into the image processing instruction according to a natural language processing technology and a target voice instruction conversion model;
and dividing the region of the image to be processed according to the granularity of the semantic region in the image processing instruction and the image recognition technology, and obtaining the target region.
In a possible embodiment, the converting the speech signal into image processing instructions and a target area includes:
converting the speech signal into the image processing instruction according to a speech recognition technology, a semantic understanding technology and a target speech instruction conversion model;
and dividing the region of the image to be processed according to the granularity of the semantic region in the image processing instruction and the image recognition technology, and obtaining the target region.
In a possible embodiment, the converting the speech signal into image processing instructions and a target area includes:
converting the speech signal into text information according to a speech recognition technique;
converting the text information into the image processing instructions according to natural language processing technology;
and dividing the region of the image to be processed according to the granularity of the semantic region in the image processing instruction and the image recognition technology, and obtaining the target region.
In a possible embodiment, the converting the speech signal into image processing instructions and a target area includes:
converting the speech signal into the image processing instructions according to a speech recognition technique and a semantic understanding technique;
and dividing the region of the image to be processed according to the granularity of the semantic region in the image processing instruction and the image recognition technology, and obtaining the target region.
In a possible embodiment, after the converting the speech signal into the image processing instruction and the target area, the method further comprises:
storing the image processing instruction and the target area.
In a possible embodiment, the processing the target area according to the image processing instruction to obtain a processed image includes:
acquiring M image processing instructions from the neuron storage unit within a preset time window;
deleting image processing instructions with the same functions in the M image processing instructions to obtain N image processing instructions, wherein M is an integer greater than 1, and N is an integer smaller than M;
and processing the target area according to the N image processing instructions and the target image processing model to obtain a processed image.
In a possible embodiment, the processing the target area according to the image processing instruction to obtain a processed image includes:
acquiring M image processing instructions from the general data cache unit within a preset time window;
deleting image processing instructions with the same functions in the M image processing instructions to obtain N image processing instructions, wherein M is an integer greater than 1, and N is an integer smaller than M;
and processing the target area according to the N image processing instructions to obtain a processed image.
In a possible embodiment, the method further comprises:
and carrying out self-adaptive training on the voice command conversion model to obtain the target voice command conversion model.
In a possible embodiment, the adaptively training the voice command conversion model to obtain the target voice command conversion model includes:
converting the voice signal into a prediction instruction according to the voice instruction conversion model;
determining a correlation coefficient of the predicted instruction and an instruction set corresponding to the predicted instruction;
optimizing the voice command conversion model according to the correlation coefficient of the predicted command and the command set corresponding to the predicted command so as to obtain the target voice command conversion model.
In a possible embodiment, the method further comprises:
and carrying out self-adaptive training on the image processing model to obtain the target image processing model.
In a possible embodiment, the adaptively training the image processing model includes:
processing the image to be processed according to the image processing model to obtain a predicted image;
determining a correlation coefficient of the predicted image and a target image corresponding to the predicted image;
and optimizing the image processing model according to the correlation coefficient of the predicted image and the corresponding target image so as to obtain the target image processing model.
In a third aspect, an embodiment of the present invention further provides an image processing chip, where the chip includes the image processing apparatus of the first aspect of the embodiment of the present invention.
In a possible embodiment, the above-mentioned chip includes a main chip and a cooperation chip;
the above-mentioned cooperation chip includes the device according to the first aspect of the embodiment of the present invention, where the above-mentioned main chip is configured to provide a start signal for the above-mentioned cooperation chip, and control transmission of an image to be processed and an image processing instruction to the above-mentioned cooperation chip.
In a fourth aspect, an embodiment of the present invention provides a chip package structure, where the chip package structure includes the image processing chip according to the third aspect of the embodiment of the present invention.
In a fifth aspect, an embodiment of the present invention provides a board, where the board includes the chip package structure according to the fourth aspect of the embodiment of the present invention.
In a sixth aspect, an embodiment of the present invention provides an electronic device, where the electronic device includes the board card according to the fifth aspect of the embodiment of the present invention.
It can be seen that, in the scheme of the embodiment of the invention, the input/output unit inputs the voice signal and the image to be processed; the storage unit stores the voice signal and the image to be processed; the image processing unit converts the voice signal into an image processing instruction and a target area, where the target area is a processing area of the image to be processed, processes the target area according to the image processing instruction to obtain a processed image, and stores the processed image into the storage unit; and the input/output unit outputs the processed image. Compared with the existing image processing technology, the invention performs image processing through voice, which saves the user the time spent learning image processing software before processing an image and improves the user experience.
These and other aspects of the invention will be more readily apparent from the following description of the embodiments.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present invention;
fig. 2 is a schematic partial structure of another image processing apparatus according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention;
fig. 4 is a flowchart of an image processing method according to an embodiment of the present invention.
Detailed Description
The following describes the embodiments of the invention in detail.
The terms "first," "second," "third," and "fourth" and the like in the description and in the claims and drawings are used for distinguishing between different objects and not necessarily for describing a particular sequential or chronological order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the invention. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
Referring to fig. 1, fig. 1 is a schematic structural diagram of an image processing apparatus according to an embodiment of the invention. As shown in fig. 1, the image processing apparatus includes:
an input-output unit 130 for inputting a voice signal and an image to be processed.
Optionally, the image processing apparatus further includes a noise filter, and the input/output unit 130 acquires the voice signal, and then the noise filter performs noise reduction processing on the voice signal.
Alternatively, the input/output unit 130 may be a voice sensor, a microphone, a pickup, or another audio capturing device.
Specifically, the input/output unit 130 acquires the ambient sound signal when acquiring the voice signal. The noise filter performs noise reduction processing on the speech signal based on the environmental sound signal. The ambient sound signal may be regarded as noise of the above-mentioned speech signal.
Further, the input/output unit 130 may include a microphone array, which may be used to collect the voice signal and the ambient sound signal, and perform noise reduction.
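The embodiments above do not fix a particular noise-reduction algorithm. As a minimal illustration, the following Python sketch applies frame-wise spectral subtraction, assuming the ambient sound signal is available as a separate noise recording; the frame length, the magnitude floor at zero, and the function name itself are arbitrary choices for this sketch, not details of the invention.

```python
import numpy as np

def spectral_subtract(speech: np.ndarray, ambient: np.ndarray,
                      frame: int = 512) -> np.ndarray:
    """Minimal spectral subtraction: estimate the noise magnitude
    spectrum from the ambient recording and subtract it, frame by
    frame, from the speech signal. Illustrative only."""
    # Average noise magnitude spectrum over the ambient frames.
    n_frames = len(ambient) // frame
    noise_mag = np.mean(
        [np.abs(np.fft.rfft(ambient[i * frame:(i + 1) * frame]))
         for i in range(n_frames)], axis=0)

    out = np.zeros(len(speech), dtype=float)
    for i in range(len(speech) // frame):  # trailing partial frame is dropped
        seg = speech[i * frame:(i + 1) * frame]
        spec = np.fft.rfft(seg)
        mag = np.maximum(np.abs(spec) - noise_mag, 0.0)  # floor at zero
        phase = np.angle(spec)
        out[i * frame:(i + 1) * frame] = np.fft.irfft(
            mag * np.exp(1j * phase), n=frame)
    return out
```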
A storage unit 120, configured to store the speech signal and the image to be processed.
An image processing unit 110, configured to convert the speech signal into an image processing instruction and a target area, where the target area is a processing area of the image to be processed; and to process the target area according to the image processing instruction to obtain a processed image, and store the processed image in the storage unit.
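For illustration only, the cooperation of the three units just described can be sketched in Python as follows; every class and method name here (InputOutputUnit, StorageUnit, ImageProcessingUnit, run) is a hypothetical stand-in for the hardware units of fig. 1, not an implementation disclosed by the invention.

```python
# Hypothetical sketch of the three-unit apparatus of fig. 1.
# Names and interfaces are illustrative, not from the patent.

class InputOutputUnit:
    def input(self):
        """Return a (voice_signal, image_to_process) pair."""
        raise NotImplementedError

    def output(self, processed_image):
        """Display or export the processed image."""
        raise NotImplementedError

class StorageUnit:
    def __init__(self):
        self.data = {}

    def store(self, key, value):
        self.data[key] = value

class ImageProcessingUnit:
    def convert(self, voice_signal, image):
        """Convert the voice signal into an image processing
        instruction and a target area of the image."""
        raise NotImplementedError

    def process(self, target_area, instruction):
        """Apply the instruction to the target area."""
        raise NotImplementedError

def run(io, storage, proc):
    voice, image = io.input()
    storage.store("voice", voice)            # storage unit stores the inputs
    storage.store("image", image)
    instruction, target_area = proc.convert(voice, image)
    processed = proc.process(target_area, instruction)
    storage.store("processed", processed)    # processed image kept in storage
    io.output(processed)                     # and returned to the user
```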
Alternatively, the storage unit 120 includes a neuron storage unit 121 and a weight caching unit 122, and the neural network operation unit 113 of the image processing unit 110 includes a neural network operation subunit 1131;
when the neuron storage unit 121 is used for storing the voice signal and the image to be processed and the weight caching unit 122 is used for storing a target voice instruction conversion model and a target image processing model, the neural network operation subunit 1131 is used for converting the voice signal into the image processing instruction and the target region according to the target voice instruction conversion model;
The neural network operation subunit 1131 is further configured to process the target area according to the target image processing model and the image processing instruction, so as to obtain a processed image;
the neural network operation subunit 1131 is further configured to store the processed image into the neuron storage unit 121.
Further, the neural network operation subunit 1131 is specifically configured to:
converting the speech signal into text information according to a speech recognition technique;
converting the text information into the image processing instructions according to natural language processing technology and the target voice instruction conversion model;
and dividing the region of the image to be processed according to the granularity of the semantic region in the image processing instruction and the image recognition technology, and obtaining the target region.
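As a concrete illustration of the three steps listed above, the following Python sketch wires hypothetical model objects together; asr_model, nlp_model and region_model are placeholders for the speech recognition, natural language processing and image recognition components, whose concrete models the embodiments leave open.

```python
# Hypothetical speech -> text -> instruction -> target-region pipeline.
# The three model objects are placeholders with assumed interfaces.

def voice_to_instruction_and_region(voice_signal, image,
                                    asr_model, nlp_model, region_model):
    # Step 1: speech recognition turns the voice signal into text.
    text = asr_model.transcribe(voice_signal)

    # Step 2: NLP plus the target voice instruction conversion model
    # turns the text into a structured instruction, e.g.
    # {"op": "blur", "semantic_area": "face"}.
    instruction = nlp_model.parse(text)

    # Step 3: divide the image at the granularity of the semantic
    # area named in the instruction (face / background / color ...).
    target_region = region_model.segment(image, instruction["semantic_area"])
    return instruction, target_region
```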
Further, the neural network operation subunit 1131 is specifically configured to:
converting the speech signal into the image processing instruction according to a speech recognition technology, a semantic understanding technology and the target speech instruction conversion model;
and dividing the region of the image to be processed according to the granularity of the semantic region in the image processing instruction and the image recognition technology, and obtaining the target region.
Further, the neuron storage unit 121 is used to store the target region and the image processing instruction.
Specifically, the neural network operation subunit 1131 is configured to:
acquiring M image processing instructions from the neuron storage unit within a preset time window;
deleting image processing instructions with the same functions in the M image processing instructions to obtain N image processing instructions, wherein M is an integer greater than 1, and N is an integer smaller than M;
and processing the target area according to the N image processing instructions and the target image processing model to obtain a processed image.
Specifically, when the neuron storage unit 121 of the storage unit 120 stores the voice signal and the image to be processed and the weight caching unit 122 stores the target voice instruction conversion model, the neural network operation subunit 1131 converts the voice signal into text information according to a voice recognition technique, converts the text information into an image processing instruction according to a natural language processing technique and the target voice instruction conversion model, and performs region division on the image to be processed according to the granularity of the semantic region in the image processing instruction and an image recognition technique to acquire the target region; or,
the neural network operation subunit 1131 converts the voice signal into an image processing instruction according to a voice recognition technique, a semantic understanding technique and the target voice instruction conversion model, and performs region division on the image to be processed according to the granularity of the semantic region in the image processing instruction and an image recognition technique to obtain the target region.
Further, the neural network operation subunit 1131 stores the image processing instruction and the target area in the neuron storage unit 121. The neural network operation subunit 1131 obtains the target image processing model from the weight caching unit 122, obtains M image processing instructions and the target area from the neuron storage unit 121 within a preset time window, and deletes the image processing instructions with the same function from the M image processing instructions to obtain N image processing instructions. The neural network operation subunit 1131 then processes the target region according to the N image processing instructions and the target image processing model to obtain a processed image.
Optionally, the storage unit includes a general data cache unit, and the neural network operation unit of the image processing unit includes a general operation subunit;
when the general data cache unit is used for storing the voice signal and the image to be processed, the general operation subunit is used for converting the voice signal into the image processing instruction and the target area;
the general operation subunit is further configured to process the target area according to the image processing instruction, so as to obtain a processed image;
the general operation subunit is further configured to store the processed image into the general data cache unit.
Further, the general operation subunit is specifically configured to:
converting the speech signal into text information according to a speech recognition technique;
converting the text information into the image processing instructions according to natural language processing technology;
and dividing the region of the image to be processed according to the granularity of the semantic region in the image processing instruction and the image recognition technology, and obtaining the target region.
Further, the general operation subunit is specifically configured to:
converting the speech signal into the image processing instructions according to a speech recognition technique and a semantic understanding technique;
and dividing the region of the image to be processed according to the granularity of the semantic region in the image processing instruction and the image recognition technology, and obtaining the target region.
Further, the general data cache unit is configured to store the target area and the image processing instruction.
Specifically, the general operation subunit is configured to:
acquiring M image processing instructions from the general data cache unit within a preset time window;
deleting image processing instructions with the same functions in the M image processing instructions to obtain N image processing instructions, wherein M is an integer greater than 1, and N is an integer smaller than M;
and processing the target area according to the N image processing instructions to obtain a processed image.
Specifically, when the general data cache unit 123 of the storage unit 120 stores the voice signal and the image to be processed, the general operation subunit 1132 converts the voice signal into text information according to a voice recognition technology, converts the text information into an image processing instruction according to a natural language processing technology, and performs region division on the image to be processed according to the granularity of the semantic region in the image processing instruction and an image recognition technology to obtain the target region; or,
the general operation subunit 1132 converts the voice signal into the image processing instruction according to a voice recognition technology and a semantic understanding technology, and performs region division on the image to be processed according to the granularity of the semantic region in the image processing instruction and the image recognition technology, so as to obtain the target region.
Further, the general operation subunit 1132 stores the image processing instruction and the target area in the general data cache unit 123. The general operation subunit 1132 obtains the target area from the general data cache unit, obtains M image processing instructions from the general data cache unit within a preset time window, deletes the image processing instructions with the same function from the M image processing instructions to obtain N image processing instructions, and processes the target area according to the N image processing instructions to obtain a processed image.
Specifically, the above-described preset time window may be understood as a preset time period. After the neural network operation subunit 1131 obtains M image processing instructions from the neuron storage unit 121, or the general operation subunit 1132 obtains M image processing instructions from the general data cache unit 123, within the preset time period, the neural network operation subunit 1131 or the general operation subunit 1132 compares the M image processing instructions pairwise and deletes the instructions with the same function from the M image processing instructions, thereby obtaining N image processing instructions. The neural network operation subunit 1131 or the general operation subunit 1132 then processes the image to be processed according to the N processing instructions and the target image processing model.
For example, the neural network operation subunit 1131 or the general operation subunit 1132 compares the M image processing instructions pairwise. When image processing instruction A is identical to image processing instruction B, the neural network operation subunit 1131 or the general operation subunit 1132 deletes whichever of instructions A and B has the larger overhead; when image processing instruction A and image processing instruction B are not identical, the neural network operation subunit 1131 or the general operation subunit 1132 obtains the similarity coefficient of instruction A and instruction B. When the similarity coefficient is greater than a similarity threshold, instructions A and B are determined to have the same function, and the neural network operation subunit 1131 or the general operation subunit 1132 deletes whichever of them has the larger overhead; when the similarity coefficient is smaller than the similarity threshold, the neural network operation subunit 1131 or the general operation subunit 1132 determines that the functions of instructions A and B are different. Image processing instructions A and B are any two of the M processing instructions.
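The pairwise comparison just described can be sketched as follows in Python; the similarity() and overhead() callables and the threshold value are assumptions, since the embodiments do not define how the similarity coefficient or the instruction overhead is computed.

```python
# Sketch of the M -> N instruction deduplication described above.
# similarity() and overhead() are hypothetical: the patent does not
# specify how the similarity coefficient or instruction cost is measured.

def deduplicate(instructions, similarity, overhead, threshold=0.9):
    """Compare the M instructions pairwise; when two are identical or
    their similarity exceeds the threshold, drop the costlier one."""
    kept = list(instructions)
    i = 0
    while i < len(kept):
        j = i + 1
        while j < len(kept):
            a, b = kept[i], kept[j]
            same_function = (a == b) or (similarity(a, b) > threshold)
            if same_function:
                # Delete whichever of the pair has the larger overhead.
                if overhead(a) >= overhead(b):
                    kept.pop(i)
                    j = i + 1        # restart scan against the new kept[i]
                else:
                    kept.pop(j)
            else:
                j += 1
        i += 1
    return kept                      # the N remaining instructions
```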
The input/output unit 130 is further configured to output the processed image.
The image processing unit processes the image to be processed according to the voice signal, and outputs the processed image through the input and output unit after obtaining the processed image.
For example, if the image processing device determines from the voice signal that the semantic area is the face area of the image to be processed, the image processing device takes the face as the granularity and acquires the plurality of face areas in the image to be processed as target areas; when the target area is the background, the image processing device divides the image to be processed into a background area and a non-background area; when the target area is a red-colored area, the image processing device divides the image to be processed into areas of different colors according to color.
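A minimal sketch of this granularity-based division is given below, using stock OpenCV primitives; the Haar cascade, the GrabCut initialisation and the HSV color range are illustrative choices for the sketch, not the division method of the invention.

```python
import cv2
import numpy as np

def divide_by_granularity(image_bgr, semantic_area):
    """Illustrative region division at the granularity named in the
    instruction: 'face', 'background', or a color such as 'red'."""
    if semantic_area == "face":
        # One region per detected face (face-level granularity).
        cascade = cv2.CascadeClassifier(
            cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
        gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
        return [image_bgr[y:y + h, x:x + w]
                for (x, y, w, h) in cascade.detectMultiScale(gray)]
    if semantic_area == "background":
        # Crude background/non-background split via GrabCut, initialised
        # with a centred rectangle; purely illustrative.
        mask = np.zeros(image_bgr.shape[:2], np.uint8)
        rect = (10, 10, image_bgr.shape[1] - 20, image_bgr.shape[0] - 20)
        bgd = np.zeros((1, 65), np.float64)
        fgd = np.zeros((1, 65), np.float64)
        cv2.grabCut(image_bgr, mask, rect, bgd, fgd, 5, cv2.GC_INIT_WITH_RECT)
        background = np.isin(mask, (cv2.GC_BGD, cv2.GC_PR_BGD))
        return [background, ~background]          # two boolean region masks
    if semantic_area == "red":
        # Split by color: red pixels vs everything else, in HSV space.
        hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)
        red = cv2.inRange(hsv, (0, 70, 50), (10, 255, 255)) > 0
        return [red, ~red]
    raise ValueError(f"unsupported granularity: {semantic_area}")
```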
Specifically, the speech recognition technology used in the present invention includes, but is not limited to, models such as the artificial neural network (Artificial Neural Network, ANN) and the hidden Markov model (Hidden Markov Model, HMM), and the speech signal may be processed according to such a speech recognition technology; the natural language processing technology includes, but is not limited to, methods such as statistical machine learning and ANNs, by which semantic information may be extracted; the image recognition technology includes, but is not limited to, algorithms such as edge-detection-based methods, threshold segmentation, region growing and watershed algorithms, gray-level integral projection curve analysis, template matching, deformable templates, the Hough transform, the Snake operator, elastic image matching based on the Gabor wavelet transform, active shape models and active appearance models, by which the image to be processed may be segmented into different regions.
Optionally, before the input/output unit 130 acquires the voice signal and the image to be processed, the neural network operation subunit 1131 performs adaptive training on the voice instruction conversion model to obtain the target voice instruction conversion model.
The adaptive training of the voice instruction conversion model by the neural network operation subunit 1131 may be performed offline or online.
Specifically, offline adaptive training of the voice instruction conversion model means that the neural network operation subunit 1131 performs the adaptive training on its own hardware to obtain the target voice instruction conversion model; online adaptive training means that a cloud server distinct from the neural network operation subunit 1131 performs the adaptive training on the voice instruction conversion model to obtain the target voice instruction conversion model. When the neural network operation subunit 1131 needs to use the target voice instruction conversion model, it obtains that model from the cloud server.
Optionally, the adaptive training of the voice instruction conversion model by the neural network operation subunit 1131 may be supervised or unsupervised.
Specifically, supervised adaptive training of the voice instruction conversion model proceeds as follows:
the neural network operation subunit 1131 converts the voice signal into a prediction instruction according to the voice instruction conversion model; it then determines the correlation coefficient between the prediction instruction and the corresponding instruction set, where the instruction set is a set of instructions obtained manually from the voice signal; the neural network operation subunit 1131 optimizes the voice instruction conversion model according to the correlation coefficient between the prediction instruction and the corresponding instruction set, so as to obtain the target voice instruction conversion model.
For example, the supervised adaptive training of the voice instruction conversion model may proceed as follows: the neural network operation subunit 1131 acquires a segment of voice signals containing related commands, such as changing the color of an image or rotating a picture, where each command corresponds to an instruction set. For the input voice signals used for the adaptive training, the corresponding instruction sets are known, and the neural network operation subunit 1131 takes these voice signals as the input data of the voice instruction conversion model to obtain the output prediction instructions. The neural network operation subunit 1131 computes the correlation coefficient between each prediction instruction and its corresponding instruction set, and adaptively updates the parameters (such as weights and biases) in the voice instruction conversion model according to the computed correlation coefficients, so as to improve the performance of the voice instruction conversion model and thereby obtain the target voice instruction conversion model.
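A minimal sketch of one such supervised adaptation pass is given below in Python (PyTorch), assuming the voice instruction conversion model is a differentiable network, realising the "correlation coefficient" as a Pearson-style score turned into a loss, and leaving the encoding of voice signals and instruction sets as tensors abstract; none of these choices come from the patent itself.

```python
import torch

def adapt_voice_model(model, optimizer, samples):
    """One illustrative adaptation pass. `samples` yields pairs of
    (voice_features, target_instruction_encoding) tensors prepared
    offline; the featurisation and the loss are assumptions."""
    model.train()
    for voice, target in samples:
        predicted = model(voice)         # predicted instruction encoding
        # Pearson-style correlation coefficient between the prediction
        # and the manually produced instruction set, turned into a loss.
        vx = predicted - predicted.mean()
        vy = target - target.mean()
        corr = (vx * vy).sum() / (vx.norm() * vy.norm() + 1e-8)
        loss = 1.0 - corr                # maximise the correlation
        optimizer.zero_grad()
        loss.backward()                  # update weights and biases
        optimizer.step()
    return model
```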
Specifically, for the above-described image processing unit 110, both its input and its output are images. The image to be processed may be handled by methods including, but not limited to, ANNs and conventional computer vision techniques, and the processing includes, but is not limited to: body beautification (such as leg slimming and breast augmentation), face beautification, object replacement (replacing a cat with a dog, a zebra with a horse, an apple with an orange, and the like), background replacement (for example, replacing a forest background with a field), de-occlusion (for example, reconstructing an eye that is covered in a face image), style conversion (for example, converting an image into a painting style in one second), pose conversion (for example, from standing to sitting, or from a frontal face to a profile), converting a non-oil painting into an oil painting, changing the color of the image background, and changing the seasonal background in which an object in the image is located.
Optionally, before the neural network operation subunit 1131 receives the voice signal, the neural network operation subunit 1131 performs adaptive training on the image processing model to obtain the target image processing model.
The adaptive training of the image processing model by the neural network operation subunit 1131 may be performed offline or online.
Specifically, offline adaptive training of the image processing model means that the neural network operation subunit 1131 performs the adaptive training on its own hardware to obtain the target image processing model; online adaptive training means that a cloud server distinct from the neural network operation subunit 1131 performs the adaptive training on the image processing model to obtain the target image processing model. When the neural network operation subunit 1131 needs to use the target image processing model, it obtains that model from the cloud server.
Optionally, the adaptive training of the image processing model by the neural network operation subunit 1131 may be supervised or unsupervised.
Specifically, supervised adaptive training of the image processing model by the neural network operation subunit 1131 proceeds as follows:
the neural network operation subunit 1131 processes the image to be processed according to the image processing model to obtain a predicted image; it then determines the correlation coefficient between the predicted image and the corresponding target image, where the target image is an image obtained by manually processing the image to be processed according to the voice signal; the neural network operation subunit 1131 optimizes the image processing model according to the correlation coefficient between the predicted image and the corresponding target image, so as to obtain the target image processing model.
For example, the supervised adaptive training of the image processing model may proceed as follows: the neural network operation subunit 1131 acquires a segment of voice signals containing related commands, such as changing the color of an image or rotating a picture, where each command corresponds to a target image. For the images to be processed that are used in the adaptive training, the corresponding target images are known, and the neural network operation subunit 1131 takes the images to be processed as the input data of the image processing model to obtain the output predicted images. The neural network operation subunit 1131 computes the correlation coefficient between each predicted image and its corresponding target image, and adaptively updates the parameters (such as weights and biases) in the image processing model according to the computed correlation coefficients, so as to improve the performance of the image processing model and thereby obtain the target image processing model.
Wherein the image processing unit 110 of the image processing apparatus further includes:
an instruction cache unit 111 for storing instructions to be executed, the instructions including a neural network operation instruction and a general operation instruction;
the instruction processing unit 112 is configured to transmit the neural network operation instruction to the neural network operation subunit, and transmit the general operation instruction to the general operation subunit.
During an image processing operation and during the adaptive training of the image processing model and the voice instruction conversion model, the instruction processing unit 112 of the image processing unit 110 obtains a neural network operation instruction from the instruction cache unit 111 and transmits the neural network operation instruction to the neural network operation subunit 1131 to drive the neural network operation subunit 1131. During an image processing operation, the instruction processing unit 112 obtains a general operation instruction from the instruction cache unit 111 and transmits the general operation instruction to the general operation subunit 1132, so as to drive the general operation subunit 1132.
In the present embodiment, the above-described image processing apparatus is presented in the form of a unit. "unit" herein may refer to an application-specific integrated circuit (ASIC), a processor and memory executing one or more software or firmware programs, an integrated logic circuit, and/or other devices that can provide the above described functionality.
It can be seen that, in the scheme of the embodiment of the invention, the input/output unit inputs the voice signal and the image to be processed; the storage unit stores the voice signal and the image to be processed; the image processing unit converts the voice signal into an image processing instruction and a target area, where the target area is a processing area of the image to be processed, processes the target area according to the image processing instruction to obtain a processed image, and stores the processed image into the storage unit; and the input/output unit outputs the processed image. Compared with the existing image processing technology, the invention performs image processing through voice, which saves the user the time spent learning image processing software before processing an image and improves the user experience.
Referring to fig. 2, fig. 2 is a schematic structural diagram of another image processing apparatus according to an embodiment of the present invention. As shown in fig. 2, the image processing apparatus includes:
an image processing unit 210, a storage unit 220, and an input-output unit 230.
Wherein the image processing unit 210 includes:
the instruction cache unit 211 is configured to store instructions to be executed, where the instructions include a neural network operation instruction and a general operation instruction.
In one embodiment, the instruction cache unit 211 may be a reorder cache.
The instruction processing unit 212 is configured to obtain a neural network operation instruction or a general operation instruction from the instruction cache unit, and process the instruction and provide the instruction to the neural network operation unit 213. Wherein the instruction processing unit 212 includes:
a fetch module 214 for fetching instructions from the instruction cache unit;
a decoding module 215, configured to decode the acquired instruction;
the instruction queue module 216 is configured to store the decoded instructions sequentially.
The scalar register module 217 is configured to store operation codes and operands corresponding to the above-mentioned instructions, including a neural network operation code and operand corresponding to a neural network operation instruction, and a general operation code and operand corresponding to a general operation instruction.
A processing dependency relationship module 218, configured to examine an instruction and the operation code and operands corresponding to it sent by the instruction processing unit 212, and to determine whether the instruction accesses the same data as the previous instruction; if so, the instruction is stored in a storage queue unit 219 and is provided to the neural network operation unit 213 after the previous instruction has finished executing; otherwise, the instruction is provided directly to the neural network operation unit 213.
The storage queue unit 219 is used to store two consecutive instructions that access the same storage space while instructions are accessing the storage unit.
Specifically, to ensure the correctness of the execution results of two such consecutive instructions, if the current instruction is detected to have a dependency on the data of the previous instruction, the current instruction must wait in the storage queue unit 219 until the dependency is eliminated, after which it may be provided to the neural network operation unit.
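The fetch/dependency/store-queue behaviour described above can be pictured with the following simplified Python sketch; representing an instruction as a dict carrying a set of accessed data addresses, and the explicit retire() callback, are modelling assumptions rather than details of the invention.

```python
from collections import deque

def overlaps(a, b):
    """Hazard test: do two instructions touch the same data addresses?
    The address-set representation is a simplification."""
    return bool(a["addrs"] & b["addrs"])

class DependencyUnit:
    """Toy model of the processing dependency relationship module 218
    plus the storage queue unit 219."""

    def __init__(self, operation_unit):
        self.operation_unit = operation_unit   # neural-network/general unit
        self.in_flight = []                    # issued, not yet retired
        self.store_queue = deque()             # waiting on a dependency

    def submit(self, instr):
        if any(overlaps(instr, prev) for prev in self.in_flight):
            self.store_queue.append(instr)     # must wait in the queue
        else:
            self.in_flight.append(instr)
            self.operation_unit.execute(instr)

    def retire(self, instr):
        """Called when the operation unit finishes an instruction."""
        self.in_flight.remove(instr)
        # Release queued instructions whose dependencies are now gone.
        while self.store_queue and not any(
                overlaps(self.store_queue[0], prev)
                for prev in self.in_flight):
            ready = self.store_queue.popleft()
            self.in_flight.append(ready)
            self.operation_unit.execute(ready)
```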
The neural network operation unit 213 is configured to process the instructions transmitted from the instruction processing unit 212 or the storage queue unit 219.
The storage unit 220 includes a neuron storage unit 221 and a weight caching unit 222, and the neural network data model is stored in the neuron storage unit 221 and the weight caching unit 222.
The input-output unit 230 is used for inputting voice signals and outputting image processing instructions.
In one embodiment, the storage unit 220 may be a scratch pad memory and the input output unit 230 may be an IO direct memory access module.
Specifically, the process by which the neural network operation subunit of the image processing apparatus converts a voice signal into an image processing instruction includes the following steps:
Step A: the instruction fetching module 214 fetches a neural network operation instruction for speech recognition from the instruction cache unit 211 and sends the operation instruction to the decoding module 215.
Step B: the decoding module 215 decodes the operation instruction and sends the decoded instruction to the instruction queue module 216.
Step C: the neural network operation code and neural network operation operands corresponding to the instruction are obtained from the scalar register module 217.
Step D: the instruction is sent to the processing dependency relationship module 218; the processing dependency relationship module 218 examines the operation code and operands corresponding to the instruction and judges whether the instruction has a data dependency on a previous instruction that has not finished executing; if not, it sends the instruction directly to the neural network operation unit 213; if so, the instruction waits in the storage queue unit 219 until the data dependency on the previous unfinished instruction no longer exists, and is then sent to the neural network operation unit 213.
Step E: the neural network operation subunit 2131 determines the address and size of the required data, including the voice instruction conversion model data and the like, according to the operation code and operands corresponding to the instruction, and fetches the required data from the storage unit 220.
Step F: the neural network operation subunit 2131 performs the neural network operation corresponding to the instruction, completes the corresponding processing to obtain an image processing instruction, and writes the image processing instruction back to the neuron storage unit 221 of the storage unit 220.
Specifically, the process by which the general operation subunit of the image processing apparatus converts a voice signal into an image processing instruction includes the following steps:
Step A': the instruction fetching module 214 fetches a general operation instruction for speech recognition from the instruction cache unit 211 and sends the operation instruction to the decoding module 215.
Step B': the decoding module 215 decodes the operation instruction and sends the decoded instruction to the instruction queue module 216.
Step C': the general operation code and general operation operands corresponding to the instruction are obtained from the scalar register module 217.
Step D': the instruction is sent to the processing dependency relationship module 218; the processing dependency relationship module 218 examines the operation code and operands corresponding to the instruction and judges whether the instruction has a data dependency on a previous instruction that has not finished executing; if not, it sends the instruction directly to the neural network operation unit 213; if so, the instruction waits in the storage queue unit 219 until the data dependency on the previous unfinished instruction no longer exists, and is then sent to the neural network operation unit 213.
Step E': the general operation subunit 2132 determines the address and size of the required data, including the voice instruction conversion model data and the like, according to the operation code and operands corresponding to the instruction, and fetches the required data from the storage unit 220.
Step F': the general operation subunit 2132 performs the general operation corresponding to the instruction, completes the corresponding processing to obtain an image processing instruction, and writes the image processing instruction back to the general data cache unit 223 of the storage unit 220.
It should be noted that, in the image processing process, the specific operation processes of the neural network operation subunit 2131 and the general operation subunit 2132 of the neural network operation unit 213, of the neuron storage unit 221, the weight caching unit 222 and the general data cache unit 223 of the storage unit 220, and of the input/output unit 230 may be found in the related description of the embodiment shown in fig. 1, and are not repeated here.
It should be noted that the storage unit 220 is an on-chip cache unit of the image processing apparatus shown in fig. 2.
Alternatively, the image processing device may be a data processing device, a robot, a computer, a tablet computer, an intelligent terminal, a mobile phone, a cloud server, a camera, a video camera, a projector, a watch, an earphone, a mobile storage device, or a wearable device.
In a possible embodiment, an image processing chip comprises the image processing device shown in fig. 1 described above.
The chip comprises a main chip and a cooperation chip;
the above-mentioned cooperation chip includes the device according to the first aspect of the embodiment of the present invention, where the above-mentioned main chip is configured to provide a start signal for the above-mentioned cooperation chip, and control transmission of an image to be processed and an image processing instruction to the above-mentioned cooperation chip.
Alternatively, the image processing chip may be used in a camera, a mobile phone, a computer, a notebook, a tablet computer or other image processing devices.
In one possible embodiment, the embodiment of the invention provides a chip packaging structure, which comprises the image processing chip.
In one possible embodiment, the embodiment of the invention provides a board card, which comprises the chip packaging structure.
In one possible embodiment, the embodiment of the invention provides an electronic device, which comprises the board card.
In one possible embodiment, the embodiment of the invention provides another electronic device, which comprises the board card, an interactive interface, a control unit and a voice collector.
As shown in fig. 3, the voice collector is configured to receive voice and transmit the voice and the image to be processed as input data to an image processing chip inside the board card.
Alternatively, the image processing chip may be an artificial neural network processing chip.
Preferably, the voice collector is a microphone or a multi-array microphone.
The chip inside the board card is the image processing chip of the embodiments shown in fig. 1 and fig. 2; it is used to obtain the corresponding output data (i.e., the processed image) and transmit the output data to the interactive interface.
The interactive interface receives the output data of the chip (which can be regarded as an artificial neural network processor) and converts the output data into feedback information in a proper form to be displayed to a user.
The control unit receives a user's operation or command and controls the operation of the entire image processing apparatus.
Optionally, the electronic device may be a data processing apparatus, a robot, a computer, a tablet computer, an intelligent terminal, a mobile phone, a cloud server, a camera, a video camera, a projector, a watch, an earphone, a mobile storage device, or a wearable device.
Referring to fig. 4, fig. 4 is a flowchart of an image processing method according to an embodiment of the present invention. As shown in fig. 4, the method includes:
S401, the image processing device inputs a voice signal and an image to be processed.
S402, the image processing device stores the voice signal and the image to be processed.
S403, the image processing device converts the voice signal into an image processing instruction and a target area, wherein the target area is a processing area of the image to be processed; processes the target area according to the image processing instruction to obtain a processed image; and stores the image to be processed in the storage unit.
In a possible embodiment, the converting the speech signal into image processing instructions and a target area includes:
converting the speech signal into text information according to a speech recognition technique;
converting the text information into the image processing instruction according to a natural language processing technology and a target voice instruction conversion model;
and dividing the image to be processed into regions according to the granularity of the semantic area in the image processing instruction and an image recognition technology, so as to obtain the target area.
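As an illustrative sketch only, the three conversion steps above can be laid out as a pipeline; the recognizer, parser and segmenter objects below are assumed placeholders for any concrete speech recognition, natural language processing and image recognition models, not components specified by this embodiment.

```python
# Sketch of the voice signal -> text -> instruction -> target area pipeline.
# asr_model, nlp_model and seg_model are assumed placeholder objects.

def convert(voice_signal, image, asr_model, nlp_model, seg_model):
    # Step 1: speech recognition turns the voice signal into text information.
    text = asr_model.transcribe(voice_signal)

    # Step 2: natural language processing plus the target voice instruction
    # conversion model turns the text into a structured image processing
    # instruction, e.g. {"operation": "blur", "semantic_area": "face"}.
    instruction = nlp_model.parse(text)

    # Step 3: image recognition divides the image to be processed at the
    # granularity of the semantic area named in the instruction, yielding
    # the target area (e.g. a binary mask over the image).
    target_area = seg_model.segment(image, instruction["semantic_area"])
    return instruction, target_area
```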
In a possible embodiment, the converting the speech signal into the image processing instruction and the target area according to the target voice instruction conversion model includes:
converting the speech signal into the image processing instruction through a speech recognition technology, a semantic understanding technology and the target voice instruction conversion model;
and dividing the image to be processed into regions according to the granularity of the semantic area in the image processing instruction and an image recognition technology, so as to obtain the target area.
In a possible embodiment, the converting the speech signal into image processing instructions and a target area includes:
converting the speech signal into the image processing instruction according to a speech recognition technology, a semantic understanding technology and a target speech instruction conversion model;
and dividing the image to be processed into regions according to the granularity of the semantic area in the image processing instruction and an image recognition technology, so as to obtain the target area.
In a possible embodiment, the converting the speech signal into image processing instructions and a target area includes:
converting the speech signal into text information according to a speech recognition technique;
converting the text information into the image processing instructions according to natural language processing technology;
and dividing the image to be processed into regions according to the granularity of the semantic area in the image processing instruction and an image recognition technology, so as to obtain the target area.
In a possible embodiment, the converting the speech signal into image processing instructions and a target area includes:
converting the speech signal into the image processing instructions according to a speech recognition technique and a semantic understanding technique;
and dividing the image to be processed into regions according to the granularity of the semantic area in the image processing instruction and an image recognition technology, so as to obtain the target area.
In a possible embodiment, after the converting the speech signal into the image processing instruction and the target area, the method further comprises:
storing the image processing instruction and the target area.
In a possible embodiment, the processing the target area according to the image processing instruction to obtain a processed image includes:
acquiring M image processing instructions from the neuron storage unit within a preset time window;
deleting image processing instructions with the same functions in the M image processing instructions to obtain N image processing instructions, wherein M is an integer greater than 1, and N is an integer smaller than M;
and processing the target area according to the N image processing instructions and the target image processing model to obtain a processed image.
In a possible embodiment, the processing the target area according to the image processing instruction to obtain a processed image includes:
acquiring M image processing instructions from the general data caching unit within a preset time window;
deleting image processing instructions with the same functions in the M image processing instructions to obtain N image processing instructions, wherein M is an integer greater than 1, and N is an integer smaller than M;
and processing the target area according to the N image processing instructions to obtain a processed image.
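A minimal sketch of this deduplication step follows (it applies equally to the neuron-storage-unit and general-data-caching-unit variants); representing an instruction as a dict with a hashable "function" field is an assumption made only for illustration.

```python
# Sketch: reduce the M instructions gathered in the preset time window to
# N instructions by deleting those with the same function. The dict-based
# instruction representation is an illustrative assumption.

def deduplicate(instructions):
    seen, kept = set(), []
    for instr in instructions:              # M instructions, arrival order
        if instr["function"] not in seen:
            seen.add(instr["function"])
            kept.append(instr)              # keep the first of each function
    return kept                             # N <= M instructions

ops = [{"function": "sharpen"}, {"function": "blur"}, {"function": "sharpen"}]
print(len(deduplicate(ops)))  # M = 3 in, N = 2 out
```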
S404, the image processing device outputs the processed image.
In a possible embodiment, the method further comprises:
carrying out self-adaptive training on the voice instruction conversion model to obtain the target voice instruction conversion model.
In a possible embodiment, the adaptive training of the speech instruction conversion model is performed offline or online.
In a possible embodiment, the adaptive training of the speech instruction conversion model is supervised or unsupervised.
In a possible embodiment, the adaptively training the voice instruction conversion model to obtain the target voice instruction conversion model includes:
converting the voice signal into a predicted instruction according to the voice instruction conversion model;
determining a correlation coefficient of the predicted instruction and an instruction set corresponding to the predicted instruction;
and optimizing the voice instruction conversion model according to the correlation coefficient of the predicted instruction and the corresponding instruction set, so as to obtain the target voice instruction conversion model.
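For illustration, one way to realize this adaptive training loop is sketched below; treating the correlation coefficient as a best cosine similarity over instruction embeddings, and the model's predict/update interface, are assumptions not fixed by this embodiment.

```python
import numpy as np

# Sketch of adaptive training of the voice instruction conversion model.
# The cosine-similarity "correlation coefficient" and the model's
# predict/update interface are illustrative assumptions.

def correlation(pred_vec, instruction_set):
    # Correlation of the predicted instruction with its instruction set:
    # best cosine similarity against any instruction embedding in the set.
    return max(
        float(np.dot(pred_vec, t) /
              (np.linalg.norm(pred_vec) * np.linalg.norm(t) + 1e-12))
        for t in instruction_set
    )

def adapt_voice_model(model, voice_signals, instruction_sets, steps=100):
    for _ in range(steps):
        for voice, inst_set in zip(voice_signals, instruction_sets):
            pred = model.predict(voice)          # predicted instruction
            r = correlation(pred, inst_set)      # correlation coefficient
            model.update(voice, inst_set, loss=1.0 - r)  # optimize the model
    return model                                 # target conversion model
```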
In a possible embodiment, the method further comprises:
carrying out self-adaptive training on the image processing model to obtain the target image processing model.
In a possible embodiment, the adaptive training of the image processing model is performed off-line or on-line.
In a possible embodiment, the adaptive training of the image processing model is supervised or unsupervised.
In a possible embodiment, the adaptively training the image processing model includes:
processing the image to be processed according to the image processing model to obtain a predicted image;
determining a correlation coefficient of the predicted image and a target image corresponding to the predicted image;
and optimizing the image processing model according to the correlation coefficient of the predicted image and the corresponding target image so as to obtain the target image processing model.
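Analogously, a sketch of the image-model adaptation follows; using the Pearson correlation coefficient between the predicted and target images is an assumption (the embodiment does not fix the similarity measure), and the model's process/update interface is likewise hypothetical.

```python
import numpy as np

# Sketch of adaptive training of the image processing model. The Pearson
# correlation between images and the process/update interface are
# illustrative assumptions.

def image_correlation(pred, target):
    p = pred.astype(np.float64).ravel()
    t = target.astype(np.float64).ravel()
    p -= p.mean()
    t -= t.mean()
    return float(np.dot(p, t) /
                 (np.linalg.norm(p) * np.linalg.norm(t) + 1e-12))

def adapt_image_model(model, images, target_images, steps=100):
    for _ in range(steps):
        for img, tgt in zip(images, target_images):
            pred = model.process(img)                # predicted image
            r = image_correlation(pred, tgt)         # correlation coefficient
            model.update(img, tgt, loss=1.0 - r)     # optimize the model
    return model                                     # target image model
```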
It should be noted that, for the specific implementation of each step of the method shown in fig. 4, reference may be made to the specific implementation of the image processing apparatus described above, which is not repeated here.
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present invention is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present invention. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required for the present invention.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to related descriptions of other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division of the units is merely a logical function division, and there may be other division manners in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling, direct coupling or communication connection shown or discussed between components may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical or other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units described above may be implemented in hardware.
The embodiments of the present invention have been described in detail above. The principles and implementations of the present invention are explained herein using specific examples, and the above description of the embodiments is provided only to help understand the method and core idea of the present invention. Meanwhile, those skilled in the art may make changes to the specific implementations and the application scope according to the idea of the present invention. In summary, the contents of this specification should not be construed as limiting the present invention.

Claims (28)

1. An image processing apparatus, comprising:
the input/output unit is used for inputting voice signals and images to be processed;
a storage unit for storing the voice signal and the image to be processed;
the image processing unit is used for converting the voice signal into an image processing instruction and a target area, wherein the target area is a processing area of the image to be processed; processing the target area according to the image processing instruction to obtain a processed image, and storing the image to be processed into the storage unit;
the neural network operation unit or the general operation unit in the image processing unit is used for dividing the image to be processed into regions according to the granularity of the semantic area in the image processing instruction and an image recognition technology, so as to obtain the target area;
the input and output unit is also used for outputting the processed image.
2. The image processing apparatus according to claim 1, wherein the storage unit includes a neuron storage unit and a weight caching unit, and the neural network operation unit of the image processing unit includes a neural network operation subunit;
when the neuron storage unit is used for storing the voice signal and the image to be processed and the weight caching unit is used for storing a target voice instruction conversion model and a target image processing model, the neural network operation subunit is used for converting the voice signal into the image processing instruction and the target area according to the target voice instruction conversion model;
the neural network operation subunit is further configured to process the target area according to the target image processing model and the image processing instruction, so as to obtain a processed image;
the neural network operation subunit is further configured to store the processed image into the neuron storage unit.
3. The image processing apparatus according to claim 1 or 2, wherein the storage unit includes a general data caching unit, and the neural network operation unit of the image processing unit includes a general operation subunit;
when the general data caching unit is used for storing the voice signal and the image to be processed, the general operation subunit is used for converting the voice signal into the image processing instruction and the target area;
the general operation subunit is further configured to process the target area according to the image processing instruction, so as to obtain a processed image;
the general operation subunit is further configured to store the processed image into the general data caching unit.
4. The image processing apparatus according to claim 2, wherein the neural network operation subunit is specifically configured to:
converting the speech signal into text information according to a speech recognition technique;
the text information is converted into the image processing instructions according to natural language processing techniques and the target voice instruction conversion model.
5. The image processing apparatus according to claim 2, wherein the neural network operation subunit is specifically configured to:
the speech signal is converted into the image processing instructions according to speech recognition techniques, semantic understanding techniques, and the target speech instruction conversion model.
6. The image processing apparatus according to claim 3, wherein the general operation subunit is specifically configured to:
converting the speech signal into text information according to a speech recognition technique;
the text information is converted into the image processing instructions according to natural language processing techniques.
7. The image processing apparatus according to claim 3, wherein the general operation subunit is specifically configured to:
the speech signal is converted into the image processing instructions according to speech recognition techniques and semantic understanding techniques.
8. The image processing apparatus according to any one of claims 2, 4 or 5, wherein the neuron storage unit is configured to store the target area and the image processing instruction.
9. The image processing apparatus according to claim 6 or 7, wherein the general data caching unit is configured to store the target area and the image processing instruction.
10. The image processing apparatus according to claim 8, wherein the neural network operation subunit is configured to:
acquiring M image processing instructions from the neuron storage unit within a preset time window;
deleting image processing instructions with the same functions in the M image processing instructions to obtain N image processing instructions, wherein M is an integer greater than 1, and N is an integer smaller than M;
and processing the target area according to the N image processing instructions and the target image processing model to obtain a processed image.
11. The image processing apparatus according to claim 9, wherein the general operation subunit is configured to:
acquiring M image processing instructions from the general data caching unit within a preset time window;
deleting image processing instructions with the same functions in the M image processing instructions to obtain N image processing instructions, wherein M is an integer greater than 1, and N is an integer smaller than M;
and processing the target area according to the N image processing instructions to obtain a processed image.
12. The image processing apparatus according to any one of claims 2, 4 or 5, wherein the neural network operation subunit is further configured to:
carrying out self-adaptive training on the voice instruction conversion model to obtain the target voice instruction conversion model.
13. The image processing apparatus according to claim 12, wherein the neural network operation subunit is further configured to:
converting the voice signal into a predicted instruction according to the voice instruction conversion model;
determining a correlation coefficient of the predicted instruction and an instruction set corresponding to the predicted instruction;
and optimizing the voice instruction conversion model according to the correlation coefficient of the predicted instruction and the corresponding instruction set, so as to obtain the target voice instruction conversion model.
14. The image processing apparatus according to claim 2 or 10, wherein the neural network operation subunit is further configured to:
carrying out self-adaptive training on the image processing model to obtain the target image processing model.
15. The image processing apparatus according to claim 14, wherein the neural network operation subunit is further configured to:
processing the image to be processed according to the image processing model to obtain a predicted image;
determining a correlation coefficient of the predicted image and a target image corresponding to the predicted image;
and optimizing the image processing model according to the correlation coefficient of the predicted image and the corresponding target image so as to obtain the target image processing model.
16. The image processing apparatus according to any one of claims 6, 7, 11, 13 or 15, wherein the image processing unit of the image processing apparatus further comprises:
the instruction cache unit is used for storing instructions to be executed, wherein the instructions comprise neural network operation instructions and general operation instructions;
the instruction processing unit is used for transmitting the neural network operation instruction to the neural network operation subunit and transmitting the general operation instruction to the general operation subunit.
17. An image processing method, comprising:
inputting a voice signal and an image to be processed;
storing the voice signal and the image to be processed;
converting the voice signal into an image processing instruction and a target area, wherein the target area is a processing area of the image to be processed; processing the target area according to the image processing instruction to obtain a processed image, and storing the image to be processed into a storage unit; wherein the target area is obtained by dividing the image to be processed into regions according to the granularity of the semantic area in the image processing instruction and an image recognition technology;
and outputting the processed image.
18. The method according to claim 17, wherein the converting the voice signal into the image processing instruction and the target area comprises:
converting the voice signal into text information according to a speech recognition technology;
converting the text information into the image processing instruction according to a natural language processing technology and a target voice instruction conversion model;
and dividing the image to be processed into regions according to the granularity of the semantic area in the image processing instruction and an image recognition technology, so as to obtain the target area.
19. The method according to claim 17, wherein the converting the voice signal into the image processing instruction and the target area comprises:
converting the voice signal into the image processing instruction according to a speech recognition technology, a semantic understanding technology and a target voice instruction conversion model;
and dividing the image to be processed into regions according to the granularity of the semantic area in the image processing instruction and an image recognition technology, so as to obtain the target area.
20. The method according to claim 17, wherein the converting the voice signal into the image processing instruction and the target area comprises:
converting the voice signal into text information according to a speech recognition technology;
converting the text information into the image processing instruction according to a natural language processing technology;
and dividing the image to be processed into regions according to the granularity of the semantic area in the image processing instruction and an image recognition technology, so as to obtain the target area.
21. The method according to claim 17, wherein the converting the voice signal into the image processing instruction and the target area comprises:
converting the voice signal into the image processing instruction according to a speech recognition technology and a semantic understanding technology;
and dividing the image to be processed into regions according to the granularity of the semantic area in the image processing instruction and an image recognition technology, so as to obtain the target area.
22. The method according to any one of claims 17 to 21, wherein after the converting the voice signal into the image processing instruction and the target area, the method further comprises:
storing the image processing instruction and the target area.
23. The method according to claim 22, wherein the processing the target area according to the image processing instruction to obtain a processed image comprises:
acquiring M image processing instructions from a neuron storage unit within a preset time window;
deleting image processing instructions with the same functions in the M image processing instructions to obtain N image processing instructions, wherein M is an integer greater than 1, and N is an integer smaller than M;
and processing the target area according to the N image processing instructions and the target image processing model to obtain a processed image.
24. The method according to claim 22, wherein the processing the target area according to the image processing instruction to obtain a processed image comprises:
acquiring M image processing instructions from a general data caching unit within a preset time window;
deleting image processing instructions with the same functions in the M image processing instructions to obtain N image processing instructions, wherein M is an integer greater than 1, and N is an integer smaller than M;
and processing the target area according to the N image processing instructions to obtain a processed image.
25. The method according to claim 18 or 19, characterized in that the method further comprises:
carrying out self-adaptive training on the voice instruction conversion model to obtain the target voice instruction conversion model.
26. The method according to claim 25, wherein the adaptively training the voice instruction conversion model to obtain the target voice instruction conversion model comprises:
converting the voice signal into a predicted instruction according to the voice instruction conversion model;
determining a correlation coefficient of the predicted instruction and an instruction set corresponding to the predicted instruction;
and optimizing the voice instruction conversion model according to the correlation coefficient of the predicted instruction and the corresponding instruction set, so as to obtain the target voice instruction conversion model.
27. The method of claim 23, wherein the method further comprises:
carrying out self-adaptive training on the image processing model to obtain the target image processing model.
28. The method of claim 27, wherein the adaptively training the image processing model comprises:
processing the image to be processed according to the image processing model to obtain a predicted image;
determining a correlation coefficient of the predicted image and a target image corresponding to the predicted image;
and optimizing the image processing model according to the correlation coefficient of the predicted image and the corresponding target image so as to obtain the target image processing model.
CN201711121244.5A 2017-09-29 2017-11-14 Image processing apparatus and method Active CN109785843B (en)

Priority Applications (14)

Application Number Priority Date Filing Date Title
CN201711121244.5A CN109785843B (en) 2017-11-14 2017-11-14 Image processing apparatus and method
JP2019556201A JP6810283B2 (en) 2017-09-29 2018-09-29 Image processing equipment and method
KR1020197032702A KR102379954B1 (en) 2017-09-29 2018-09-29 Image processing apparatus and method
US16/615,255 US11532307B2 (en) 2017-09-29 2018-09-29 Image processing apparatus and method
KR1020197028486A KR102317958B1 (en) 2017-09-29 2018-09-29 Image processing apparatus and method
EP19215861.6A EP3667487B1 (en) 2017-09-29 2018-09-29 Image processing apparatus and method
EP18861574.4A EP3627499B1 (en) 2017-09-29 2018-09-29 Image processing apparatus and method
EP19215862.4A EP3667488B1 (en) 2017-09-29 2018-09-29 Image processing apparatus and method
PCT/CN2018/108696 WO2019062931A1 (en) 2017-09-29 2018-09-29 Image processing apparatus and method
KR1020197032701A KR102380494B1 (en) 2017-09-29 2018-09-29 Image processing apparatus and method
JP2019211746A JP6893968B2 (en) 2017-09-29 2019-11-22 Image processing equipment and method
JP2019211745A JP6810232B2 (en) 2017-09-29 2019-11-22 Image processing equipment and method
US16/718,981 US11437032B2 (en) 2017-09-29 2019-12-18 Image processing apparatus and method
US16/719,035 US11450319B2 (en) 2017-09-29 2019-12-18 Image processing apparatus and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711121244.5A CN109785843B (en) 2017-11-14 2017-11-14 Image processing apparatus and method

Publications (2)

Publication Number Publication Date
CN109785843A CN109785843A (en) 2019-05-21
CN109785843B true CN109785843B (en) 2024-03-26

Family

ID=66493511

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711121244.5A Active CN109785843B (en) 2017-09-29 2017-11-14 Image processing apparatus and method

Country Status (1)

Country Link
CN (1) CN109785843B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111444922A (en) * 2020-03-27 2020-07-24 Oppo广东移动通信有限公司 Picture processing method and device, storage medium and electronic equipment
CN111743462B (en) * 2020-06-18 2022-06-28 北京小狗吸尘器集团股份有限公司 Sweeping method and device of sweeping robot

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6024110B2 (en) * 2012-01-26 2016-11-09 ソニー株式会社 Image processing apparatus, image processing method, program, terminal device, and image processing system
US9412366B2 (en) * 2012-09-18 2016-08-09 Adobe Systems Incorporated Natural language image spatial and tonal localization

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6133904A (en) * 1996-02-09 2000-10-17 Canon Kabushiki Kaisha Image manipulation
US6690825B1 (en) * 1999-06-16 2004-02-10 Canon Kabushiki Kaisha Image processing apparatus and method
JP2010079411A (en) * 2008-09-24 2010-04-08 Sony Corp Learning equipment, image processor, learning method, image processing method, and program
CN105912717A (en) * 2016-04-29 2016-08-31 广东小天才科技有限公司 Image-based information search method and apparatus
CN105979035A (en) * 2016-06-28 2016-09-28 广东欧珀移动通信有限公司 AR image processing method and device as well as intelligent terminal

Also Published As

Publication number Publication date
CN109785843A (en) 2019-05-21

Similar Documents

Publication Publication Date Title
EP3667487B1 (en) Image processing apparatus and method
US11437032B2 (en) Image processing apparatus and method
US11450319B2 (en) Image processing apparatus and method
US11703939B2 (en) Signal processing device and related products
CN105654952B (en) Electronic device, server and method for outputting voice
US11138903B2 (en) Method, apparatus, device and system for sign language translation
CN109584864B (en) Image processing apparatus and method
CN111383638A (en) Signal processing device, signal processing method and related product
CN110968235B (en) Signal processing device and related product
CN109785843B (en) Image processing apparatus and method
US20230316952A1 (en) System and method for bidirectional automatic sign language translation and production
CN109584862B (en) Image processing apparatus and method
CN110969246A (en) Signal processing device and related product
KR20110124568A (en) Robot system having voice and image recognition function, and recognition method thereof
CN116954364A (en) Limb action interaction method and device, electronic equipment and storage medium
CN110968285A (en) Signal processing device and related product
CN115424309A (en) Face key point generation method and device, terminal equipment and readable storage medium
CN113362816A (en) Augmented reality interaction method, device and system, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant