WO2019076120A1

WO2019076120A1 - Image processing method, device, storage medium and electronic device

Info

Publication number: WO2019076120A1
Application number: PCT/CN2018/100212
Authority: WO
Inventors: 邓童虎
Original assignee: 格力电器（武汉）有限公司; 珠海格力电器股份有限公司
Priority date: 2017-10-19
Filing date: 2018-08-13
Publication date: 2019-04-25
Also published as: CN107886947A

Abstract

The present invention relates to the technical field of image processing, and in particular to an image processing method, a device, a storage medium and an electronic device. The method comprises: receiving voice information (101); recognizing the voice information to obtain an image processing command (102); and performing image processing on the target image according to the image processing command to obtain the processed target image (103). A user therefore does not need to manually operate a mobile terminal to process an image; the image processing function can be implemented simply by receiving voice information from the user. Compared with the prior art, this process is simpler, saves the user's time, and improves operation efficiency.

Description

Method, device, storage medium and electronic device for image processing

Technical field

The embodiments of the present invention relate to the field of image processing technologies, and in particular, to a method, an apparatus, a storage medium, and an electronic device for image processing.

Background

With the development of science and technology, the functions of smart devices such as mobile terminals have become increasingly rich, and now include intelligent image processing. In the prior art, image processing on a smart device such as a mobile terminal generally proceeds as follows: the user obtains the image to be processed on the mobile terminal, then manually operates the mobile terminal to process the image and obtain the desired result.

Therefore, in the prior art, image processing on a smart device such as a mobile terminal is cumbersome: the user must process the image manually on the device, which is inconvenient. It is therefore especially necessary to provide a simple image processing method that requires no manual operation.

Summary of the invention

The technical problem to be solved by the embodiments of the present application is to provide a simple, non-manually operated image processing method, apparatus, storage medium and electronic device.

In a first aspect, to solve the above technical problem, one technical solution adopted by the embodiments of the present application is to provide a method for image processing, applied to a terminal device, including: receiving voice information; identifying the voice information to obtain an image processing command; and performing image processing on the target image according to the image processing command to obtain the processed target image.

Optionally, the step of identifying the voice information to obtain an image processing command comprises: converting the voice information into text information; extracting a processing object keyword and a processing mode keyword from the text information; and combining the processing object keyword and the processing mode keyword into an image processing command.

Optionally, the step of identifying the voice information to obtain an image processing command comprises: extracting, according to the voice information and a voice library in which keyword voices are preset, the words in the voice information whose pronunciation matches a keyword voice in the voice library, wherein the voice library includes preset processing object keyword voices and preset processing mode keyword voices; obtaining a processing object keyword and a processing mode keyword according to the extracted words having the same pronunciation; and combining the processing object keyword and the processing mode keyword into an image processing command.

Optionally, the step of performing image processing on the target image according to the image processing command includes: identifying a processing object from the target image according to the processing object keyword; and performing processing on the processing object according to the processing mode keyword.

Optionally, after the step of receiving the voice information, the method further includes: determining whether the voice information includes only one voice; if the voice information includes only one voice, extracting the first N voice words of the voice information; determining whether the voice words contain the sound of a preset command word; and if so, proceeding to the step of identifying the voice information to obtain the image processing command.

Optionally, the method further includes: if the voice information includes multiple voices, extracting the first N voice words of each voice; and acquiring the voice whose voice words contain the sound of a preset command word. In this case, the step of identifying the voice information to obtain the image processing command is specifically: identifying the acquired voice to obtain the image processing command.

In a second aspect, to solve the above technical problem, another technical solution adopted by the embodiments of the present application is to provide an apparatus for image processing, applied to a terminal device, including: a voice receiving module configured to receive voice information; a command acquiring module configured to identify the voice information to obtain an image processing command; and an image processing module configured to perform image processing on the target image according to the image processing command to obtain the processed target image.

Optionally, the command acquiring module includes: a text acquiring unit configured to convert the voice information into text information; a text extracting unit configured to extract a processing object keyword and a processing mode keyword from the text information; and a command forming unit configured to combine the processing object keyword and the processing mode keyword into an image processing command.

Optionally, the command acquiring module includes: a word obtaining unit configured to extract, according to the voice information and a voice library in which keyword voices are preset, the words in the voice information whose pronunciation matches a keyword voice in the voice library, wherein the voice library includes preset processing object keyword voices and preset processing mode keyword voices; a word extracting unit configured to obtain a processing object keyword and a processing mode keyword according to the extracted words having the same pronunciation; and a command generating unit configured to combine the processing object keyword and the processing mode keyword into an image processing command.

Optionally, the image processing module includes: an object recognition unit configured to identify a processing object from the target image according to the processing object keyword; and an execution processing unit configured to perform processing on the processing object according to the processing mode keyword.

Optionally, the apparatus further includes: a sound determining module configured to determine whether the voice information includes only one voice; a first extraction module configured to extract the first N voice words of the voice information if the voice information includes only one voice; and a voice word judging module configured to determine whether the voice words contain the sound of a preset command word and, if so, to proceed to the step of identifying the voice information to obtain the image processing command.

Optionally, the apparatus further includes: a second extraction module configured to extract the first N voice words of each voice if the voice information includes multiple voices; and a sound screening module configured to acquire the voice whose voice words contain the sound of a preset command word. In this case, identifying the voice information to obtain the image processing command is specifically: identifying the voice acquired by the sound screening module to obtain the image processing command.

In a third aspect, an embodiment of the present application provides a storage medium in which a computer program is stored, the computer program being configured to execute the method in the embodiments of the present application.

In a fourth aspect, an embodiment of the present application provides an electronic device including a memory and a processor, where the memory stores a computer program and the processor is configured to execute the method in the embodiments of the present application by running the computer program.

The beneficial effects of the solution provided in the embodiments of the present application are as follows. Unlike the prior art, the image processing method in the embodiments of the present application includes: receiving voice information; identifying the voice information to obtain an image processing command; and performing image editing processing on the target image according to the image processing command to obtain the edited target image. The user therefore does not need to manually operate the mobile terminal to process the image; image processing can be achieved simply by receiving the user's voice information. Compared with the prior art, the solution of the embodiments of the present application is simpler, saves the user's time, and improves operation efficiency.

Description of the drawings

One or more embodiments are illustrated by the figures in the accompanying drawings. Unless otherwise stated, the figures are not drawn to scale.

FIG. 1 is a schematic flowchart of a method for image processing according to Embodiment 1 of the present application;

FIG. 2 is a schematic flowchart of a method for recognizing voice information and obtaining an image processing command in image processing according to Embodiment 1 of the present application;

FIG. 3 is another schematic flowchart of a method for recognizing voice information and obtaining an image processing command in image processing according to Embodiment 1 of the present application;

FIG. 4 is a schematic flowchart of performing image processing on a target image according to an image processing command to obtain a processed target image, in the image processing method according to Embodiment 1 of the present application;

FIG. 5 is a schematic flowchart of a method for image processing according to Embodiment 2 of the present application;

FIG. 6 is a schematic structural diagram of an apparatus for image processing according to Embodiment 3 of the present application;

FIG. 7 is a schematic structural diagram of an apparatus for image processing according to Embodiment 4 of the present application;

FIG. 8 is a schematic diagram of a hardware structure of an electronic device that performs image processing according to an embodiment of the present application.

Detailed description

In order to make the objects, technical solutions, and advantages of the present application more comprehensible, the present application will be further described in detail below with reference to the accompanying drawings and embodiments. It is to be understood that the specific embodiments described herein are merely illustrative of the application and are not intended to be limiting.

Embodiment 1

Referring to FIG. 1 to FIG. 4, FIG. 1 shows a method for image processing according to Embodiment 1 of the present application. The method is applied to a terminal device and includes:

Step 101: Receive voice information.

When the user turns on the image processing function of the mobile terminal, the mobile terminal collects the user's voice information in real time; the voice information is the speech uttered by the user in real time.

Step 102: Identify voice information to obtain an image processing command.

Optionally, the step of identifying the voice information includes:

Step 1021: Convert the received voice information into text information.

The text information is consistent with the voice information and is easy for the mobile terminal to recognize and extract from. The text information includes a processing target keyword and a processing mode keyword. The processing target keyword is the name of the object to be processed in the image to be processed; for example, processing target keywords include “person”, “app”, and “house”. The processing mode keyword is the way in which the user wants to process that object; for example, processing mode keywords include “cropping”, “mosaic”, “beauty”, “highlight”, and “thin face”.

Step 1022: Extract a processing target keyword and a processing mode keyword from the text information.

Step 1023: Combine the processing target keyword and the processing mode keyword into an image processing command. For example, when the text information converted from the received voice information is “beauty processing of the person in the image”, the processing target keyword is “person”, the processing mode keyword is “beauty”, and the resulting image processing command is “beauty to the person in the picture”.
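Steps 1022 and 1023 can be illustrated with a minimal sketch. It assumes that step 1021 has already produced a text string, and it uses plain substring matching against small keyword lists. The lists, the function names, and the dictionary form of the command are illustrative assumptions and are not specified in the application.

```python
# Hypothetical keyword vocabularies; the application gives these words only as examples.
OBJECT_KEYWORDS = ("person", "house")
MODE_KEYWORDS = ("cropping", "mosaic", "beauty", "highlight")


def extract_keywords(text):
    """Step 1022: return (target keyword, mode keyword); either may be None."""
    lowered = text.lower()
    target = next((k for k in OBJECT_KEYWORDS if k in lowered), None)
    mode = next((k for k in MODE_KEYWORDS if k in lowered), None)
    return target, mode


def build_command(text):
    """Step 1023: combine both keywords into a command, or None if one is missing."""
    target, mode = extract_keywords(text)
    if target is None or mode is None:
        return None
    return {"object": target, "mode": mode}


command = build_command("beauty processing of the person in the image")
print(command)
```

Returning None when either keyword is missing is a design choice of this sketch; the application does not say what happens when no command can be formed.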

Of course, in Embodiment 1 of the present application, the voice information can also be identified in other ways to obtain an image processing command. For example, referring to FIG. 3, the following steps 1021a, 1022a, and 1023a are performed:

Step 1021a: Extract, according to the voice information and a voice library in which keyword voices are preset, the words in the voice information whose pronunciation matches a keyword voice in the voice library. The voice library includes preset processing target keyword voices and preset processing mode keyword voices; for example, it includes preset processing target keyword voices such as “person”, “female”, and “male”, and preset processing mode keyword voices such as “cropping”, “mosaic”, “beauty”, and “highlight”.

Step 1022a: Obtain a processing target keyword and a processing mode keyword according to the extracted words having the same pronunciation.

For example, if the preset processing target keyword voices include “female”, the preset processing mode keyword voices include “beauty”, and the extracted words with the same pronunciation are “female” and “beauty”, then “female” is taken as the processing target keyword and “beauty” as the processing mode keyword.

Step 1023a: The processing target keyword and the processing mode keyword are combined into an image processing command.
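Steps 1021a to 1023a can be sketched as a lookup against a preset voice library. In this sketch each pronunciation is stood in for by a plain string; a real system would compare acoustic features instead. The library contents, the pronunciation spellings, and the function names are all hypothetical.

```python
# Hypothetical voice library: category -> {pronunciation: keyword}.
# Strings stand in for stored keyword voices; this is an assumption of the sketch.
VOICE_LIBRARY = {
    "object": {"fee-mayl": "female", "mayl": "male", "pur-suhn": "person"},
    "mode": {"byoo-tee": "beauty", "krop-ing": "cropping", "moh-zay-ik": "mosaic"},
}


def match_keywords(pronunciations):
    """Steps 1021a-1022a: keep the first matching word found in each category."""
    found = {}
    for sound in pronunciations:
        for category, entries in VOICE_LIBRARY.items():
            if sound in entries and category not in found:
                found[category] = entries[sound]
    return found


def command_from_sounds(pronunciations):
    """Step 1023a: combine the matched keywords into a command, or return None."""
    found = match_keywords(pronunciations)
    if "object" not in found or "mode" not in found:
        return None
    return {"object": found["object"], "mode": found["mode"]}


utterance = ["meyk", "byoo-tee", "for", "fee-mayl"]
print(command_from_sounds(utterance))
```

Unlike the text-based route, this variant never builds a full transcript; only sounds that match a library entry contribute to the command.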

For example, if the acquired processing target keyword is “female” and the acquired processing mode keyword is “beauty”, the acquired image processing command is “beauty to the female in the picture”.

Step 103: Perform image processing on the target image according to the image processing command to obtain the processed target image.

Optionally, step 103 includes:

Step 1031: From the image processing command acquired in step 102, which contains the processing target keyword and the processing mode keyword, use an image recognition technology to identify the processing object in the target image that corresponds to the processing target keyword.

Step 1032: Perform processing on the processing object according to the manner corresponding to the processing mode keyword.

The processing object in the image to be processed is processed in the manner corresponding to the processing mode keyword, and a new, processed image is generated.

In the embodiment of the present application, the image processing method includes: receiving voice information; identifying the voice information to obtain an image processing command; and performing image processing on the target image according to the image processing command to obtain the processed target image. The mobile terminal therefore does not need to receive manual operations from the user to process the image; it achieves image processing simply by receiving the user's voice information. Compared with the prior art, the process is simpler, saves the user's time, and improves operation efficiency.
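Steps 1031 and 1032 can be sketched as a detector call followed by a dispatch on the processing mode keyword. The sketch assumes a grayscale image stored as a list of rows and a detector that returns a bounding box (top, left, bottom, right). The detector, the two handlers, and their behavior are hypothetical stand-ins, since the application does not specify the recognition or editing algorithms.

```python
def highlight(pixels, box, gain=1.5):
    """Brighten pixels inside box = (top, left, bottom, right), clamped to 255."""
    top, left, bottom, right = box
    out = [row[:] for row in pixels]
    for r in range(top, bottom):
        for c in range(left, right):
            out[r][c] = min(255, int(out[r][c] * gain))
    return out


def mosaic(pixels, box):
    """Replace the region with its average value (a single-block mosaic)."""
    top, left, bottom, right = box
    out = [row[:] for row in pixels]
    values = [pixels[r][c] for r in range(top, bottom) for c in range(left, right)]
    if not values:
        return out
    average = sum(values) // len(values)
    for r in range(top, bottom):
        for c in range(left, right):
            out[r][c] = average
    return out


HANDLERS = {"highlight": highlight, "mosaic": mosaic}


def process_image(pixels, command, detect):
    """Step 1031: locate the object. Step 1032: apply the handler for the mode keyword."""
    box = detect(pixels, command["object"])
    if box is None or command["mode"] not in HANDLERS:
        return pixels
    return HANDLERS[command["mode"]](pixels, box)


def fake_detector(pixels, keyword):
    """Stand-in for image recognition: pretends a person occupies the top row."""
    return (0, 0, 1, 2) if keyword == "person" else None


image = [[100, 200], [50, 60]]
result = process_image(image, {"object": "person", "mode": "highlight"}, fake_detector)
print(result)
```

Each handler returns a new image and leaves the input untouched, which matches the statement that a new, processed image is generated.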

Embodiment 2

Referring to FIG. 5, FIG. 5 is a schematic flowchart of an image processing method according to Embodiment 2 of the present application. The method is applied to a terminal device and includes:

Step 201: Receive voice information.

Step 202: Determine whether the voice information includes only one voice;

Optionally, the existing voice recognition technology is used to determine whether the voice information includes only one voice through voice features such as timbre and audio.

Step 203: If the voice information includes only one voice, extract the first N voice words of the voice information.

Optionally, when it is determined in step 202 that the voice information includes only one voice, the first N voice words of the voice information are extracted, where N is, for example, 3, 5, or 7. For example, when N is 5 and the received voice information is “the processing command is to make a beauty for the woman in the picture”, the extracted first 5 voice words are “processing command is”.

Step 204: Determine whether the voice words include a preset command word.

The preset command word is a command word set in advance, for example, “processing command is” or “command is”. As a specific example, when the voice words obtained in step 203 are “processing command is” and the preset command word is also “processing command is”, it is determined that the voice words contain the preset command word. When the voice words contain the preset command word, the process proceeds to step 205; otherwise, the process proceeds to step 207.
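The check in steps 203 and 204 can be sketched as a search for a preset command word within the first N voice words. This sketch assumes the voice has already been split into a list of words; how voice words are counted in practice, and the function name, are assumptions.

```python
# Preset command words taken from the examples in the text, stored as word tuples.
PRESET_COMMAND_WORDS = (("processing", "command", "is"), ("command", "is"))


def has_command_word(voice_words, n):
    """Steps 203-204: take the first n voice words and look for a preset command word."""
    head = tuple(voice_words[:n])
    for command_word in PRESET_COMMAND_WORDS:
        size = len(command_word)
        # Slide a window of the command word's length across the first n words.
        if any(head[i:i + size] == command_word for i in range(len(head) - size + 1)):
            return True
    return False


words = "processing command is to make a beauty".split()
print(has_command_word(words, 5))
```

Restricting the search to the first N words means a command word spoken later in the sentence does not trigger image processing.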

Step 205: Identify voice information to obtain an image processing command.

It should be noted that step 205 and step 102 of the embodiment of the present application are based on the same inventive concept, and the specific content of step 205 may refer to step 102, and details are not described herein.

Step 206: Perform image processing on the target image according to the image processing command, to obtain the processed target image.

Step 207: If the voice information includes multiple voices, extract the first N voice words of each voice.

When it is determined in step 202 that the voice information contains multiple voices, the first N voice words of each voice are extracted and recorded.

Step 208: Acquire the voice whose voice words contain a preset command word.

Among the voices processed in step 207, the voices whose voice words contain the preset command word are acquired. Further, among those voices, the voice with the highest volume is selected, and step 209 is performed on it.
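Steps 207 and 208 can be sketched as a filter followed by a maximum over volume. The sketch assumes each separated voice is available as a list of words plus a loudness value; the Voice class, its fields, and the single command word used here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Voice:
    words: list      # voice words recognized for one speaker
    volume: float    # hypothetical loudness measure for that speaker


def contains_command_word(words, n, command_word=("command", "is")):
    """Check whether the first n voice words contain the preset command word."""
    head = words[:n]
    size = len(command_word)
    return any(tuple(head[i:i + size]) == command_word
               for i in range(len(head) - size + 1))


def select_voice(voices, n):
    """Steps 207-208: keep voices with the command word, then pick the loudest one."""
    candidates = [v for v in voices if contains_command_word(v.words, n)]
    if not candidates:
        return None
    return max(candidates, key=lambda v: v.volume)


voices = [
    Voice(["command", "is", "crop", "the", "house"], 0.4),
    Voice(["what", "a", "nice", "day"], 0.9),
    Voice(["command", "is", "highlight", "the", "person"], 0.7),
]
chosen = select_voice(voices, 3)
print(chosen.words)
```

The loudest voice overall (volume 0.9) is ignored here because it carries no command word; volume only breaks ties among voices that do.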

Step 209: Identify the obtained sound to obtain the image processing command.

It should be noted that step 209 and step 102 of the embodiment of the present application are based on the same inventive concept, and the specific content of step 209 may refer to step 102, and details are not described herein.

After step 209 is performed, step 206 is performed.

In the embodiment of the present application, the image processing method includes: receiving voice information; determining whether the voice information includes only one voice; if so, extracting the first N voice words of the voice information and determining whether the voice words include a preset command word; and if they do, identifying the voice information to obtain an image processing command, and performing image processing on the target image according to the image processing command to obtain the processed target image. If the voice information includes multiple voices, the first N voice words of each voice are extracted, the voice whose voice words include the preset command word is acquired, the acquired voice is identified to obtain the image processing command, and image processing is performed on the target image to obtain the processed target image.

Therefore, in the embodiment of the present application, the mobile terminal does not need to receive manual operations from the user to process the image; it achieves image processing simply by receiving the user's voice information. Compared with the prior art, the process is simpler, saves the user's time, and improves operation efficiency. Further, when multiple voices are acquired, the first N voice words of each voice are extracted, and image processing is performed separately for each voice or according to the voice with the highest volume.

Embodiment 3

Referring to FIG. 6, FIG. 6 shows a device 50 for image processing according to Embodiment 3 of the present application. The device is applied to a terminal device and includes: a voice receiving module 51, a command acquiring module 52, and an image processing module 53;

The voice receiving module 51 is configured to receive voice information.

The command obtaining module 52 is configured to identify the voice information to obtain an image processing command;

The image processing module 53 is arranged to perform image processing on the target image in accordance with the image processing command to obtain the processed target image.

Optionally, the command obtaining module 52 includes: a text obtaining unit 521, a text extracting unit 522, and a command forming unit 523;

The text obtaining unit 521 is configured to convert the voice information into text information;

The text extracting unit 522 is configured to extract a processing target keyword and a processing mode keyword from the text information;

The command forming unit 523 is configured to compose the processing target keyword and the processing mode keyword into image processing commands.

Optionally, the image processing module 53 includes: an object recognition unit 531 and an execution processing unit 532;

The object recognition unit 531 is configured to identify the processing object from the target image according to the processing target keyword;

The execution processing unit 532 is configured to perform processing on the processing target according to the processing mode keyword.
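The module structure of device 50 can be sketched as three pluggable callables wired together in order. The class name, the callable signatures, and the toy “invert” operation in the usage lines are illustrative assumptions and do not come from the application.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ImageProcessingDevice:
    receive_voice: Callable[[], str]                   # voice receiving module 51
    acquire_command: Callable[[str], Optional[dict]]   # command acquiring module 52
    process_image: Callable[[list, dict], list]        # image processing module 53

    def run(self, image):
        """Receive voice, derive a command, and process the image if a command exists."""
        voice = self.receive_voice()
        command = self.acquire_command(voice)
        if command is None:
            return image
        return self.process_image(image, command)


# Toy wiring: a string stands in for recorded voice, and "invert" is a made-up operation.
device = ImageProcessingDevice(
    receive_voice=lambda: "invert",
    acquire_command=lambda voice: {"mode": voice} if voice == "invert" else None,
    process_image=lambda image, command: [255 - p for p in image],
)
print(device.run([0, 255]))
```

Keeping the three modules as separate callables mirrors the description, where units 521 to 523 and 531 to 532 refine modules 52 and 53 without changing the overall flow.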

In the embodiment of the present application, the image processing apparatus includes a voice receiving module 51, a command acquiring module 52, and an image processing module 53, which respectively perform: receiving voice information; identifying the voice information to obtain an image processing command; and performing image processing on the target image according to the image processing command to obtain the processed target image. The mobile terminal therefore does not need to receive manual operations from the user to process the image; it achieves image processing simply by receiving the user's voice information. Compared with the prior art, the process is simpler, saves the user's time, and improves operation efficiency.

Embodiment 4

Referring to FIG. 7, FIG. 7 shows a device 50 for image processing according to Embodiment 4 of the present application. The device is applied to a terminal device and includes: a voice receiving module 51, a command acquiring module 52, and an image processing module 53;

The voice receiving module 51 is configured to receive voice information.

The image processing module 53 is configured to perform image processing on the target image according to the image processing command to obtain the processed target image.

Optionally, the command obtaining module 52 includes: a word obtaining unit (not shown), a word extracting unit (not shown), and a command generating unit (not shown);

The word obtaining unit is configured to extract, according to the voice information and a voice library in which keyword voices are preset, the words in the voice information whose pronunciation matches a keyword voice in the voice library, wherein the voice library includes preset processing target keyword voices and preset processing mode keyword voices;

The word extracting unit is configured to obtain a processing target keyword and a processing mode keyword according to the extracted words having the same pronunciation;

The command generating unit is configured to combine the processing target keyword and the processing mode keyword into an image processing command.

Optionally, the device 50 further includes: a sound determining module 54 configured to determine whether the voice information includes only one voice;

The first extraction module 55 is configured to extract the first N voice words of the voice information if the voice information includes only one voice;

The voice word judging module 56 is configured to determine whether the voice words contain the sound of a preset command word and, if so, to proceed to the step of identifying the voice information to obtain the image processing command.

Optionally, the device 50 further includes:

The second extraction module 57 is configured to extract the first N voice words of each voice if the voice information includes multiple voices;

The sound screening module 58 is configured to acquire the voice whose voice words contain a preset command word;

In this case, identifying the voice information to obtain the image processing command is specifically:

Identifying the voice acquired by the sound screening module 58 to obtain the image processing command.

In the embodiment of the present application, the image processing apparatus includes a voice receiving module 51, a command acquiring module 52, and an image processing module 53, which respectively perform: receiving voice information; identifying the voice information to obtain an image processing command; and performing image processing on the target image according to the image processing command to obtain the processed target image. The mobile terminal therefore does not need to receive manual operations from the user to process the image; it achieves image processing simply by receiving the user's voice information. Compared with the prior art, the process is simpler, saves the user's time, and improves operation efficiency. Further, when multiple voices are acquired, the first N voice words of each voice are extracted, and image processing is performed separately for each voice or according to the voice with the highest volume.

Please refer to FIG. 8. FIG. 8 is a schematic diagram showing the hardware structure of an electronic device for performing image processing according to an embodiment of the present disclosure. As shown in FIG. 8, the electronic device 70 includes:

One or more processors 71 and a memory 72; one processor 71 is shown as an example in the figure.

The processor 71 and the memory 72 may be connected by a bus or by other means; a bus connection is shown as an example in the figure.

As a non-volatile computer readable storage medium, the memory 72 can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions or modules corresponding to image processing in the embodiments of the present application (for example, the voice receiving module 51, the command acquiring module 52, and the image processing module 53 shown in FIG. 6). By running the non-volatile software programs, instructions, and modules stored in the memory 72, the processor 71 executes the various functional applications and data processing of the server, that is, implements the image processing of the above method embodiments.

The memory 72 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application required for at least one function; the storage data area may store data created according to use of the item recommendation device, and the like. Moreover, memory 72 can include high speed random access memory, and can also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, memory 72 can optionally include memory remotely located relative to processor 71, which can be connected to the merchandise recommendation device over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The one or more modules are stored in the memory 72 and, when executed by the one or more processors 71, perform image processing in any of the above method embodiments, for example, performing method steps 101 to 103 in FIG. 1, method steps 1021 to 1023 in FIG. 2, method steps 1021a to 1023a in FIG. 3, method steps 1031 to 1032 in FIG. 4, and method steps 201 to 209 in FIG. 5, and implementing the functions of modules 51-53, units 521-523, and units 531-532 in FIG. 6, and of modules 51-58, units 521-523, and units 531-532 in FIG. 7.

The above products can perform the methods provided by the embodiments of the present application, and have the corresponding functional modules and beneficial effects of the execution method. For technical details that are not described in detail in this embodiment, reference may be made to the method provided by the embodiments of the present application.

The electronic device of the embodiments of the present application exists in various forms, including but not limited to a server, that is, a device that provides computing services. A server includes a processor, a hard disk, a memory, a system bus, and so on. Its architecture is similar to that of a general-purpose computer, but because it must provide highly reliable services, it has higher requirements for processing power, stability, reliability, security, scalability, and manageability. The electronic device may also be another electronic device with data interaction capabilities.

An embodiment of the present application provides a non-transitory computer readable storage medium storing computer-executable instructions that, when executed by an electronic device, perform image processing in any of the above method embodiments, for example, performing method steps 101 to 103 in FIG. 1, method steps 1021 to 1023 in FIG. 2, method steps 1021a to 1023a in FIG. 3, method steps 1031 to 1032 in FIG. 4, and method steps 201 to 209 in FIG. 5, and implementing the functions of modules 51-53, units 521-523, and units 531-532 in FIG. 6, and of modules 51-58, units 521-523, and units 531-532 in FIG. 7.

An embodiment of the present application provides a computer program product, including a computer program stored on a non-transitory computer readable storage medium. The computer program comprises program instructions that, when executed by a computer, cause the computer to perform image processing in any of the above method embodiments, for example, performing method steps 101 to 103 in FIG. 1, method steps 1021 to 1023 in FIG. 2, method steps 1021a to 1023a in FIG. 3, method steps 1031 to 1032 in FIG. 4, and method steps 201 to 209 in FIG. 5, and implementing the functions of modules 51-53, units 521-523, and units 531-532 in FIG. 6, and of modules 51-58, units 521-523, and units 531-532 in FIG. 7.

The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment.

Through the description of the above embodiments, those skilled in the art can clearly understand that the various embodiments can be implemented by means of software plus a general-purpose hardware platform or, of course, by hardware. A person skilled in the art can also understand that all or part of the processes of the above method embodiments can be completed by a computer program instructing the related hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the flows of the method embodiments described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (RAM).

The above description is only an embodiment of the present application and does not thereby limit the patent scope of the application. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present application, or any direct or indirect application in other related technical fields, is likewise included in the scope of patent protection of this application.

Industrial applicability

As described above, the image processing method, apparatus, storage medium, and electronic device provided by the embodiments of the present invention have the following beneficial effects: the user does not need to manually operate a mobile terminal to process an image, because the image processing function can be realized merely by receiving the user's voice information. The process is simpler than in the prior art, saving the user's time and improving operation efficiency.

Claims

An image processing method, applied to a terminal device, comprising:

receiving voice information;

identifying the voice information to obtain an image processing command; and

performing image processing on a target image according to the image processing command to obtain the processed target image.
The method of claim 1, wherein the step of identifying the voice information to obtain an image processing command comprises:

converting the voice information into text information;

extracting a processing target keyword and a processing mode keyword from the text information; and

combining the processing target keyword and the processing mode keyword into an image processing command.
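
By way of non-limiting illustration, the keyword extraction and command composition steps above might be sketched in Python as follows. The sketch assumes the voice information has already been converted to text by some speech recognizer, which is not shown. The keyword vocabularies and the names `ImageCommand`, `first_match`, and `build_command` are hypothetical; the application does not enumerate keywords or prescribe a data structure.

```python
from typing import NamedTuple, Optional, Tuple

# Hypothetical keyword vocabularies; the application does not enumerate any.
TARGET_KEYWORDS: Tuple[str, ...] = ("background", "face", "sky")
MODE_KEYWORDS: Tuple[str, ...] = ("blur", "brighten", "whiten")


class ImageCommand(NamedTuple):
    target: str  # processing target keyword
    mode: str    # processing mode keyword


def first_match(text: str, vocabulary: Tuple[str, ...]) -> Optional[str]:
    """Return the vocabulary keyword that occurs earliest in the text, if any."""
    hits = [(text.find(word), word) for word in vocabulary if word in text]
    return min(hits)[1] if hits else None


def build_command(text: str) -> Optional[ImageCommand]:
    """Combine one target keyword and one mode keyword into a command.

    Returns None when either kind of keyword is missing from the text.
    """
    lowered = text.lower()
    target = first_match(lowered, TARGET_KEYWORDS)
    mode = first_match(lowered, MODE_KEYWORDS)
    if target is None or mode is None:
        return None
    return ImageCommand(target, mode)
```

Under these assumptions, `build_command("Please blur the background")` yields a command with target `background` and mode `blur`. Plain substring matching is used only for brevity; a practical implementation would likely segment the text into words first.
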
The method of claim 1, wherein the step of identifying the voice information to obtain an image processing command comprises:

extracting, according to the voice information and a voice library preset with keyword voices, the words in the voice information whose pronunciation is the same as a pronunciation in the voice library, wherein the voice library preset with keyword voices includes preset processing target keyword voices and processing mode keyword voices;

obtaining a processing target keyword and a processing mode keyword according to the extracted words having the same pronunciation; and

combining the processing target keyword and the processing mode keyword into an image processing command.
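
As a rough illustration only, the pronunciation-matching variant above might be sketched as follows. The sketch assumes each spoken word has already been reduced to a pronunciation string, so that "same pronunciation" becomes a dictionary lookup; real acoustic comparison is not shown. The library entries are placeholder strings rather than authoritative transcriptions, and `VOICE_LIBRARY` and `match_command` are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical voice library: pronunciation -> (keyword kind, keyword).
# The pronunciation strings are illustrative placeholders only.
VOICE_LIBRARY: Dict[str, Tuple[str, str]] = {
    "bei4 jing3": ("target", "background"),
    "ren2 lian3": ("target", "face"),
    "mo2 hu5": ("mode", "blur"),
    "mei3 bai2": ("mode", "whiten"),
}


def match_command(pronunciations: List[str]) -> Optional[Tuple[str, str]]:
    """Return (target keyword, mode keyword) for the first matching words.

    Words whose pronunciation is not in the library are ignored. None is
    returned unless both a target and a mode keyword are found.
    """
    found: Dict[str, str] = {}
    for pronunciation in pronunciations:
        entry = VOICE_LIBRARY.get(pronunciation)
        if entry is not None:
            kind, keyword = entry
            found.setdefault(kind, keyword)  # keep the first keyword of each kind
    if "target" in found and "mode" in found:
        return (found["target"], found["mode"])
    return None
```

Because only library entries are compared, this variant does not need a full speech-to-text conversion of the utterance.
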
The method according to claim 2 or 3, wherein the step of performing image processing on the target image according to the image processing command comprises:

identifying a processing object from the target image according to the processing target keyword; and

processing the processing object according to the processing mode keyword.
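
One possible way to organize the two steps above is a pair of lookup tables, one mapping processing target keywords to object locators and one mapping processing mode keywords to pixel operations. The Python sketch below is a toy under stated assumptions: the image is a grayscale grid of integers, the "object recognition" is a stand-in that simply returns the top or bottom half of the image, and all names (`OBJECT_LOCATORS`, `PIXEL_OPERATIONS`, `process_image`) are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Image = List[List[int]]              # grayscale pixels in the range 0-255
Region = Tuple[int, int, int, int]   # top, left, bottom, right (exclusive)


def locate_top_half(image: Image) -> Region:
    # Stand-in for real object recognition: treat the top half as the object.
    return (0, 0, len(image) // 2, len(image[0]))


def locate_bottom_half(image: Image) -> Region:
    return (len(image) // 2, 0, len(image), len(image[0]))


# Hypothetical tables keyed by processing target and processing mode keywords.
OBJECT_LOCATORS: Dict[str, Callable[[Image], Region]] = {
    "sky": locate_top_half,
    "ground": locate_bottom_half,
}
PIXEL_OPERATIONS: Dict[str, Callable[[int], int]] = {
    "brighten": lambda value: min(255, value + 40),
    "darken": lambda value: max(0, value - 40),
}


def process_image(image: Image, target: str, mode: str) -> Image:
    """Apply the operation named by `mode` to the object named by `target`."""
    top, left, bottom, right = OBJECT_LOCATORS[target](image)
    operation = PIXEL_OPERATIONS[mode]
    result = [row[:] for row in image]  # leave the input image untouched
    for y in range(top, bottom):
        for x in range(left, right):
            result[y][x] = operation(result[y][x])
    return result
```

Keeping the locator and the operation in separate tables mirrors the separation between the processing target keyword and the processing mode keyword, so either vocabulary can be extended independently.
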
The method of claim 1, wherein

after the step of receiving voice information, the method further comprises:

determining whether the voice information contains only one voice;

if the voice information contains only one voice, extracting the first N voice words of the voice information;

determining whether the extracted voice words contain a preset command word; and

if so, proceeding to the step of identifying the voice information to obtain the image processing command.
The method of claim 5, wherein

the method further comprises:

if the voice information contains multiple voices, extracting the first N voice words of each voice; and

acquiring the voice whose extracted voice words contain a preset command word;

and wherein the step of identifying the voice information to obtain the image processing command is specifically:

identifying the acquired voice to obtain the image processing command.
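
The command-word screening for one voice or several voices might be sketched as follows. The sketch assumes each voice in the voice information has already been separated and segmented into voice words, which is the hard part and is not shown. The command words, the value of N, and the names `has_command_word` and `select_voice` are hypothetical, and returning the first qualifying voice when several qualify is an assumption not stated in the application.

```python
from typing import Optional, Sequence

COMMAND_WORDS = {"process", "edit"}  # hypothetical preset command words
N = 3                                # hypothetical number of leading voice words


def has_command_word(words: Sequence[str], n: int = N) -> bool:
    """Check whether any of the first n voice words is a preset command word."""
    return any(word.lower() in COMMAND_WORDS for word in words[:n])


def select_voice(voices: Sequence[Sequence[str]]) -> Optional[Sequence[str]]:
    """Return the voice that should be passed on to command recognition.

    With a single voice, it is returned only if it starts with a command
    word. With multiple voices, the first voice that does so is returned.
    None means no voice qualifies and recognition is skipped.
    """
    for words in voices:
        if has_command_word(words):
            return words
    return None
```

Checking only the first N voice words keeps the screening cheap and lets speech that merely mentions a command word later in a sentence be ignored.
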
An image processing apparatus, applied to a terminal device, comprising:

a voice receiving module, configured to receive voice information;

a command acquisition module, configured to identify the voice information to obtain an image processing command; and

an image processing module, configured to perform image processing on a target image according to the image processing command to obtain the processed target image.
The apparatus according to claim 7, wherein

the command acquisition module includes:

a text acquisition unit, configured to convert the voice information into text information;

a text extracting unit, configured to extract a processing target keyword and a processing mode keyword from the text information; and

a command forming unit, configured to combine the processing target keyword and the processing mode keyword into an image processing command.
The apparatus according to claim 7, wherein

the command acquisition module includes:

a word obtaining unit, configured to extract, according to the voice information and a voice library preset with keyword voices, the words in the voice information whose pronunciation is the same as a pronunciation in the voice library, wherein the voice library preset with keyword voices includes preset processing target keyword voices and processing mode keyword voices;

a word extracting unit, configured to obtain a processing target keyword and a processing mode keyword according to the extracted words having the same pronunciation; and

a command generating unit, configured to combine the processing target keyword and the processing mode keyword into an image processing command.
The apparatus according to claim 8 or 9, wherein

the image processing module includes:

an object recognition unit, configured to identify a processing object from the target image according to the processing target keyword; and

an execution processing unit, configured to process the processing object according to the processing mode keyword.
The apparatus of claim 7, wherein the apparatus further comprises:

a sound judging module, configured to determine whether the voice information contains only one voice;

a first extraction module, configured to extract the first N voice words of the voice information if the voice information contains only one voice; and

a speech word judging module, configured to determine whether the extracted voice words contain a preset command word and, if so, to proceed to identifying the voice information to obtain the image processing command.
The apparatus according to claim 11, wherein

the apparatus further comprises:

a second extraction module, configured to extract the first N voice words of each voice if the voice information contains multiple voices; and

a sound screening module, configured to acquire the voice whose extracted voice words contain a preset command word;

and wherein identifying the voice information to obtain the image processing command is specifically:

identifying the voice acquired by the sound screening module to obtain the image processing command.
A storage medium, wherein the storage medium stores a computer program, the computer program being configured to perform the method of any one of claims 1 to 6 when run.
An electronic device comprising a memory and a processor, wherein the memory stores a computer program, and the processor is configured to perform the method of any one of claims 1 to 6 by means of the computer program.