WO2019076120A1 - Image processing method, device, storage medium and electronic device

Info

Publication number
WO2019076120A1
WO2019076120A1 PCT/CN2018/100212 CN2018100212W WO2019076120A1 WO 2019076120 A1 WO2019076120 A1 WO 2019076120A1 CN 2018100212 W CN2018100212 W CN 2018100212W WO 2019076120 A1 WO2019076120 A1 WO 2019076120A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice
image processing
processing
keyword
command
Prior art date
Application number
PCT/CN2018/100212
Other languages
French (fr)
Chinese (zh)
Inventor
邓童虎
Original Assignee
格力电器(武汉)有限公司
珠海格力电器股份有限公司
Priority date
Filing date
Publication date
Application filed by 格力电器(武汉)有限公司, 珠海格力电器股份有限公司
Publication of WO2019076120A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 Sound input; Sound output
    • G06F3/167 Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L2015/088 Word spotting
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command

Definitions

  • the embodiments of the present invention relate to the field of image processing technologies, and in particular, to a method, an apparatus, a storage medium, and an electronic device for image processing.
  • The process of image processing by a user using a smart device such as a mobile terminal is generally as follows: the user obtains the image to be processed by using the mobile terminal, then manually operates the mobile terminal to process the image and acquire the desired image.
  • the technical problem to be solved by the embodiments of the present application is to provide a simple, non-manually operated image processing method, apparatus, storage medium and electronic device.
  • A technical solution adopted by the embodiment of the present application is to provide a method for image processing, which is applied to a terminal device, including: receiving voice information; identifying the voice information to obtain an image processing command; and performing image processing on the target image according to the image processing command to obtain the processed target image.
  • The step of identifying the voice information to obtain an image processing command comprises: converting the voice information into text information; extracting a processing target keyword and a processing mode keyword from the text information; and combining the processing target keyword and the processing mode keyword into an image processing command.
  • The step of identifying the voice information to obtain an image processing command comprises: extracting, according to the voice information and a voice library preset with keyword voices, the words in the voice information whose pronunciation is the same as that of a keyword voice in the voice library, wherein the voice library includes preset processing target keyword voices and processing mode keyword voices; obtaining a processing target keyword and a processing mode keyword according to the extracted words with the same pronunciation; and combining the processing target keyword and the processing mode keyword into an image processing command.
  • The step of performing image processing on the target image according to the image processing command includes: identifying a processing object from the target image according to the processing target keyword; and processing the processing object according to the processing mode keyword.
  • The method further includes: determining whether the voice information includes only one voice; if the voice information includes only one voice, extracting the first N voice words of the voice information; determining whether the voice words contain the sound of a preset command word; and if so, proceeding to the step of identifying the voice information to obtain the image processing command.
  • The method further includes: if the voice information includes multiple voices, extracting the first N voice words of each voice; and acquiring the voice whose voice words include the sound of a preset command word. In this case, identifying the voice information to obtain the image processing command specifically means identifying the acquired voice to obtain the image processing command.
  • Another technical solution adopted by the embodiment of the present application is to provide an apparatus for image processing, which is applied to a terminal device, including: a voice receiving module configured to receive voice information; a command acquiring module configured to identify the voice information to obtain an image processing command; and an image processing module configured to perform image processing on the target image according to the image processing command to obtain the processed target image.
  • The command obtaining module includes: a text acquiring unit configured to convert the voice information into text information; a text extracting unit configured to extract a processing target keyword and a processing mode keyword from the text information; and a command forming unit configured to combine the processing target keyword and the processing mode keyword into an image processing command.
  • The command obtaining module includes: a word obtaining unit configured to extract, according to the voice information and a voice library preset with keyword voices, the words in the voice information whose pronunciation is the same as that of a keyword voice in the voice library, wherein the voice library includes preset processing target keyword voices and processing mode keyword voices; a word extraction unit configured to obtain the processing target keyword and the processing mode keyword according to the extracted words with the same pronunciation; and a command generating unit configured to combine the processing target keyword and the processing mode keyword into an image processing command.
  • The image processing module includes: an object recognition unit configured to identify a processing object from the target image according to the processing target keyword; and an execution processing unit configured to process the processing object according to the processing mode keyword.
  • The device further includes: a sound determining module configured to determine whether the voice information includes only one voice; a first extracting module configured to extract the first N voice words of the voice information if the voice information includes only one voice; and a voice word judging module configured to determine whether the voice words include the sound of a preset command word and, if so, to proceed to the step of identifying the voice information to obtain the image processing command.
  • The device further includes: a second extraction module configured to extract the first N voice words of each voice if the voice information includes multiple voices; and a sound screening module configured to acquire the voice whose voice words include the sound of a preset command word. In this case, identifying the voice information to obtain the image processing command specifically means identifying the voice obtained by the sound screening module to obtain the image processing command.
  • An embodiment of the present application provides a storage medium in which a computer program is stored, the computer program being configured to execute the method in the embodiment of the present application.
  • An embodiment of the present application provides an electronic device, including a memory and a processor, where the memory stores a computer program and the processor is configured to execute the method in the embodiment of the present application by running the computer program.
  • The steps of the image processing method include: receiving voice information; identifying the voice information to obtain an image processing command; and performing image editing processing on the target image according to the image processing command to obtain the edited target image. Therefore, in the embodiment of the present application, the user does not need to manually operate the mobile terminal to process the image; the image processing function is realized simply by receiving the user's voice information. Compared with the prior art, the solution of the embodiment of the present application is simpler, saves user time, and improves operational efficiency.
  • FIG. 1 is a schematic flowchart of a method for image processing according to Embodiment 1 of the present application;
  • FIG. 2 is a schematic flowchart of a method for recognizing voice information and obtaining an image processing command in image processing according to Embodiment 1 of the present application;
  • FIG. 3 is another schematic flowchart of a method for recognizing voice information and obtaining an image processing command in image processing according to Embodiment 1 of the present application;
  • FIG. 4 is a schematic flowchart of performing image processing on a target image according to an image processing command to obtain a processed target image, in the image processing method according to Embodiment 1 of the present application;
  • FIG. 5 is a schematic flowchart of a method for image processing according to Embodiment 2 of the present application.
  • FIG. 6 is a schematic structural diagram of an apparatus for image processing according to Embodiment 3 of the present application.
  • FIG. 7 is a schematic structural diagram of an apparatus for image processing according to Embodiment 4 of the present application.
  • FIG. 8 is a schematic diagram of a hardware structure of an electronic device that performs image processing according to an embodiment of the present application.
  • FIG. 1 is a method for image processing according to Embodiment 1 of the present application, which is applied to a terminal device, and includes:
  • Step 101 Receive voice information.
  • the mobile terminal collects the voice information of the user in real time, and the voice information is the voice sent by the user in real time.
  • Step 102 Identify voice information to obtain an image processing command.
  • the step of identifying the voice information includes:
  • Step 1021 Convert the received voice information into text information.
  • the text information is consistent with the voice information, and the text information is convenient for the mobile terminal to recognize and extract.
  • the text information includes a processing target keyword and a processing mode keyword, and the processing target keyword is a name of an object to be processed in the to-be-processed image, for example, the processing target keyword includes “person”, “app”, and “house”.
  • the processing mode keyword is the way the user wants to process the object to be processed in the picture.
  • the processing mode keywords include “cropping”, “mosaic”, “beauty”, “highlight” and “thin face”.
  • Step 1022 Extract a processing target keyword and a processing mode keyword from the text information.
  • Step 1023 The processing object keyword and the processing mode keyword are combined into an image processing command.
  • For example, when the recognized content is “beauty processing of the person in the image”, the processing target keyword is “person” and the processing mode keyword is “beauty”, and the acquired image processing command is “beauty to the person in the picture”.
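  • Steps 1022 and 1023 can be sketched as follows. This is a minimal illustration, not the implementation described by the application: it assumes step 1021 has already produced English text, it uses the example keywords from the description as hypothetical vocabularies, and the names `ImageProcessingCommand`, `first_keyword`, and `extract_command` are invented for the example.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical vocabularies, copied from the example keywords in the description.
TARGET_KEYWORDS = ("person", "app", "house")
MODE_KEYWORDS = ("cropping", "mosaic", "beauty", "highlight", "thin face")


@dataclass(frozen=True)
class ImageProcessingCommand:
    target: str  # processing target keyword
    mode: str    # processing mode keyword


def first_keyword(text: str, vocabulary: Iterable[str]) -> Optional[str]:
    """Return the vocabulary entry that appears earliest in the text, if any."""
    hits = []
    for keyword in vocabulary:
        # Whole-word match so that "app" does not match inside "apply".
        match = re.search(r"\b" + re.escape(keyword) + r"\b", text)
        if match:
            hits.append((match.start(), keyword))
    return min(hits)[1] if hits else None


def extract_command(text: str) -> Optional[ImageProcessingCommand]:
    """Step 1022: extract both keywords. Step 1023: combine them into a command."""
    lowered = text.lower()
    target = first_keyword(lowered, TARGET_KEYWORDS)
    mode = first_keyword(lowered, MODE_KEYWORDS)
    if target is None or mode is None:
        return None  # no complete command could be formed
    return ImageProcessingCommand(target=target, mode=mode)
```

  • Returning `None` when either keyword is missing is a design choice of this sketch; the description does not say how an incomplete utterance is handled.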
  • The voice information can also be identified by other means to obtain an image processing command. For example, steps 1021a, 1022a, and 1023a are performed:
  • Step 1021a According to the voice information and a voice library preset with keyword voices, extract the words in the voice information whose pronunciation is the same as that of a keyword voice in the voice library, wherein the voice library includes preset processing target keyword voices and processing mode keyword voices. For example, the voice library includes preset processing target keyword voices such as “person”, “female” and “male”, and preset processing mode keyword voices such as “cropping”, “mosaic”, “beauty” and “highlight”.
  • Step 1022a Obtain a processing target keyword and a processing mode keyword according to the extracted words having the same pronunciation.
  • For example, when the preset processing target keyword voices include “female” and the preset processing mode keyword voices include “beauty”, the extracted words with the same pronunciation are “female”.
  • Step 1023a The processing target keyword and the processing mode keyword are combined into an image processing command.
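  • Steps 1021a to 1023a can be sketched as a lookup against a keyword voice library. The sketch below makes strong simplifying assumptions: each spoken word is represented by an opaque pronunciation identifier (a real system would compare acoustic features), and the identifiers, the `VOICE_LIBRARY` table, and `match_keywords` are all hypothetical names introduced for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical keyword voice library: pronunciation id -> (keyword, kind).
# The "p_*" identifiers are placeholders standing in for stored keyword voices.
VOICE_LIBRARY: Dict[str, Tuple[str, str]] = {
    "p_person": ("person", "target"),
    "p_female": ("female", "target"),
    "p_male": ("male", "target"),
    "p_cropping": ("cropping", "mode"),
    "p_mosaic": ("mosaic", "mode"),
    "p_beauty": ("beauty", "mode"),
    "p_highlight": ("highlight", "mode"),
}


def match_keywords(utterance: List[str]) -> Optional[Tuple[str, str]]:
    """Return (target keyword, mode keyword) found in the utterance, or None.

    Step 1021a: keep only the words whose pronunciation is in the library.
    Step 1022a: sort the matches into a target keyword and a mode keyword.
    Step 1023a: combine both into a command, here a simple tuple.
    """
    target: Optional[str] = None
    mode: Optional[str] = None
    for pronunciation in utterance:
        entry = VOICE_LIBRARY.get(pronunciation)
        if entry is None:
            continue  # word is not a preset keyword voice
        keyword, kind = entry
        if kind == "target" and target is None:
            target = keyword
        elif kind == "mode" and mode is None:
            mode = keyword
    if target is None or mode is None:
        return None
    return (target, mode)
```

  • Unlike the text-based route, this variant never builds a full transcript; only the words that match a stored keyword voice are retained.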
  • Step 103 Perform image processing on the target image according to the image processing command to obtain the processed target image.
  • step 103 includes:
  • Step 1031 According to the processing target keyword in the image processing command acquired in step 102, identify the processing object corresponding to the processing target keyword in the target image by using an image recognition technology;
  • Step 1032 Perform processing on the processing object according to the manner corresponding to the processing mode keyword.
  • the processing is performed on the processing object in the image to be processed according to the manner corresponding to the processing mode keyword, and a processed new image is generated.
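  • Steps 1031 and 1032 amount to locating the processing object and then dispatching on the processing mode keyword. The sketch below assumes a grayscale image stored as nested lists and a caller-supplied detector that returns a bounding box; the detector, the `PROCESSORS` table, and the two handlers are hypothetical stand-ins, since the application does not specify a recognition technique or how each mode is implemented.

```python
from typing import Callable, Dict, List, Optional, Tuple

Image = List[List[int]]          # grayscale pixels in the range 0..255
Box = Tuple[int, int, int, int]  # top, left, bottom, right (bottom/right exclusive)
Detector = Callable[[Image, str], Optional[Box]]


def crop(image: Image, box: Box) -> Image:
    """Return only the pixels inside the box."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def highlight(image: Image, box: Box, gain: int = 50) -> Image:
    """Return a copy of the image with the boxed region brightened."""
    top, left, bottom, right = box
    result = [row[:] for row in image]
    for y in range(top, bottom):
        for x in range(left, right):
            result[y][x] = min(255, result[y][x] + gain)
    return result


# Processing mode keyword -> handler (only two modes are sketched here).
PROCESSORS: Dict[str, Callable[[Image, Box], Image]] = {
    "cropping": crop,
    "highlight": highlight,
}


def process(image: Image, target: str, mode: str, detect: Detector) -> Image:
    """Step 1031: find the processing object. Step 1032: apply the mode."""
    box = detect(image, target)
    if box is None:
        raise LookupError(f"no {target!r} found in the image")
    handler = PROCESSORS.get(mode)
    if handler is None:
        raise ValueError(f"unsupported processing mode {mode!r}")
    return handler(image, box)
```

  • Both handlers return a new image and leave the input untouched, which matches the description's statement that a processed new image is generated.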
  • The image processing method includes: receiving voice information; identifying the voice information to obtain an image processing command; and performing image processing on the target image according to the image processing command to obtain the processed target image. Therefore, in the embodiment of the present application, the mobile terminal does not need to receive manual operations from the user to process the image; the image processing function is realized simply by receiving the user's voice information. Compared with the prior art, the process is simpler, saves user time, and improves operational efficiency.
  • FIG. 5 is a schematic diagram of an image processing method according to Embodiment 2 of the present application, which is applied to a terminal device, and includes:
  • Step 201 Receive voice information.
  • the mobile terminal collects the voice information of the user in real time, and the voice information is the voice sent by the user in real time.
  • Step 202 Determine whether the voice information includes only one voice
  • the existing voice recognition technology is used to determine whether the voice information includes only one voice through voice features such as timbre and audio.
  • Step 203 If the voice information includes only one voice, extract the first N voice words of the voice information;
  • The first N voice words of the voice information are extracted; optionally, N is 3, 5, 7, or the like. For example, when N is 5 and the received voice information is “the processing command is to make a beauty for the woman in the picture”, the first 5 voice words extracted from the voice information are “processing command is”.
  • Step 204 Determine whether the voice words include a preset command word
  • The preset command word is set in advance, for example, “processing command is” or “command is”. As a specific example, when the voice words obtained in step 203 are “processing command is” and the preset command word is also “processing command is”, it is determined that the voice words contain the preset command word. When it is determined that the voice words contain the preset command word, the process proceeds to step 205; otherwise, the process proceeds to step 207.
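  • The check in steps 203 and 204 can be sketched on a transcript as follows. This is an assumption-laden illustration: the description operates on voice words, whereas the sketch counts whitespace-separated words of an English transcript, and `has_command_prefix` and `PRESET_COMMAND_WORDS` are names invented for the example.

```python
from typing import Sequence

# Preset command words taken from the examples in the description.
PRESET_COMMAND_WORDS = ("processing command is", "command is")


def has_command_prefix(
    transcript: str,
    n: int = 5,
    command_words: Sequence[str] = PRESET_COMMAND_WORDS,
) -> bool:
    """Return True if the first n words of the transcript contain a command word."""
    # Step 203: keep only the first n words.
    prefix = " ".join(transcript.lower().split()[:n])
    # Step 204: look for any preset command word inside that prefix.
    return any(command in prefix for command in command_words)
```

  • Restricting the search to the first N words means that a command word appearing later in ordinary speech does not trigger image processing.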
  • Step 205 Identify voice information to obtain an image processing command.
  • step 205 and step 102 of the embodiment of the present application are based on the same inventive concept, and the specific content of step 205 may refer to step 102, and details are not described herein.
  • Step 206 Perform image processing on the target image according to the image processing command, to obtain the processed target image.
  • Step 207 If the voice information includes multiple voices, extract the voice words of the first N bits of each voice;
  • the voice words of the first N bits of each sound are extracted and recorded.
  • Step 208 Acquire a voice that contains a preset command word
  • Among the voices processed in step 207, the voices whose voice words contain the preset command word are acquired. Further, among the voices whose voice words include the preset command word, the voice with the highest volume is selected, and step 209 is performed on that voice.
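  • Steps 207 and 208 can be sketched as a filter followed by a maximum over volume. The sketch assumes the voices have already been separated and transcribed, represents loudness as a single hypothetical `volume` number, and counts whitespace-separated words; `Voice` and `select_command_voice` are illustrative names only.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Voice:
    transcript: str  # words spoken by one speaker
    volume: float    # hypothetical loudness measure; larger means louder


def select_command_voice(
    voices: List[Voice],
    n: int = 5,
    command_words: Sequence[str] = ("processing command is", "command is"),
) -> Optional[Voice]:
    """Return the loudest voice whose first n words contain a command word."""
    candidates = []
    for voice in voices:
        # Step 207: extract the first n words of each voice.
        prefix = " ".join(voice.transcript.lower().split()[:n])
        # Step 208: keep the voices that contain a preset command word.
        if any(command in prefix for command in command_words):
            candidates.append(voice)
    if not candidates:
        return None
    return max(candidates, key=lambda voice: voice.volume)
```

  • A louder voice without a command word is ignored, so background speech does not override the speaker who is actually issuing a command.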
  • Step 209 Identify the obtained sound to obtain the image processing command.
  • step 209 and step 102 of the embodiment of the present application are based on the same inventive concept, and the specific content of step 209 may refer to step 102, and details are not described herein.
  • After step 209, step 206 is performed.
  • The steps of the image processing method include: receiving voice information; determining whether the voice information includes only one voice; if so, extracting the first N voice words of the voice information and determining whether the voice words include a preset command word; and if they do, identifying the voice information to obtain an image processing command and performing image editing processing on the target image according to the image processing command to obtain the processed target image. When it is determined that the voice information includes multiple voices, the first N voice words of each voice are extracted, the voice whose voice words include the preset command word is acquired, the acquired voice is identified to obtain the image processing command, and image processing is performed on the target image to obtain the processed target image.
  • The mobile terminal does not need to receive manual operations from the user to process the image; the image processing function is realized simply by receiving the user's voice information. Compared with the prior art, the process is simpler, saves user time, and improves operational efficiency. Further, when multiple voices are acquired, the first N voice words of each voice are extracted, and image processing is either performed separately for each voice or performed according to the voice with the highest volume.
  • FIG. 6 is a device 50 for image processing according to Embodiment 3 of the present application, which is applied to a terminal device, including: a voice receiving module 51, a command acquiring module 52, and an image processing module 53;
  • the voice receiving module 51 is configured to receive voice information.
  • the command obtaining module 52 is configured to identify the voice information to obtain an image processing command
  • the image processing module 53 is arranged to perform image processing on the target image in accordance with the image processing command to obtain the processed target image.
  • the command obtaining module 52 includes: a text obtaining unit 521, a text extracting unit 522, and a command forming unit 523;
  • the text obtaining unit 521 is configured to convert the voice information into text information
  • the text extracting unit 522 is configured to extract a processing target keyword and a processing mode keyword from the text information
  • the command forming unit 523 is configured to compose the processing target keyword and the processing mode keyword into image processing commands.
  • the image processing module 53 includes: an object recognition unit 531 and an execution processing unit 532;
  • the object recognition unit 531 is configured to identify the processing object from the target image according to the processing target keyword
  • the execution processing unit 532 is configured to perform processing on the processing target according to the processing mode keyword.
  • The image processing apparatus includes a voice receiving module 51, a command acquiring module 52, and an image processing module 53, which respectively perform: receiving voice information; identifying the voice information to obtain an image processing command; and performing image processing on the target image according to the image processing command to obtain the processed target image. Therefore, in the embodiment of the present application, the mobile terminal does not need to receive manual operations from the user to process the image; the image processing function is realized simply by receiving the user's voice information. Compared with the prior art, the process is simpler, saves user time, and improves operational efficiency.
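  • The three-module structure of Embodiment 3 can be sketched as a small class that wires the modules together. The class name `ImageProcessingDevice` and the use of plain callables for modules 51, 52, and 53 are assumptions made for illustration; the application describes the modules functionally and does not prescribe this interface.

```python
from typing import Any, Callable, Tuple

# (processing target keyword, processing mode keyword)
Command = Tuple[str, str]


class ImageProcessingDevice:
    """Wires a voice receiving, a command acquiring, and an image processing module."""

    def __init__(
        self,
        voice_receiving_module: Callable[[], Any],
        command_acquiring_module: Callable[[Any], Command],
        image_processing_module: Callable[[Any, Command], Any],
    ) -> None:
        self.voice_receiving_module = voice_receiving_module      # module 51
        self.command_acquiring_module = command_acquiring_module  # module 52
        self.image_processing_module = image_processing_module    # module 53

    def run(self, target_image: Any) -> Any:
        """Receive voice, derive the command, and return the processed image."""
        voice_information = self.voice_receiving_module()
        command = self.command_acquiring_module(voice_information)
        return self.image_processing_module(target_image, command)
```

  • Passing the modules in as callables keeps each one replaceable, for example swapping the text-based command acquisition for the voice-library variant without touching the other two modules.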
  • FIG. 7 is a device 50 for image processing according to Embodiment 4 of the present application, which is applied to a terminal device, and includes: a voice receiving module 51, a command acquiring module 52, and an image processing module 53;
  • the voice receiving module 51 is configured to receive voice information.
  • the command obtaining module 52 is configured to identify the voice information to obtain an image processing command
  • the image processing module 53 is configured to perform image processing on the target image according to the image processing command to obtain the processed target image.
  • the command obtaining module 52 includes: a text obtaining unit 521, a text extracting unit 522, and a command forming unit 523;
  • the text obtaining unit 521 is configured to convert the voice information into text information
  • the text extracting unit 522 is configured to extract a processing target keyword and a processing mode keyword from the text information
  • the command forming unit 523 is configured to compose the processing target keyword and the processing mode keyword into image processing commands.
  • the command obtaining module 52 includes: a word obtaining unit (not shown), a word extracting unit (not shown), and a command generating unit (not shown);
  • a word obtaining unit configured to extract, according to the voice information and a voice library preset with keyword voices, the words in the voice information whose pronunciation is the same as that of a keyword voice in the voice library, wherein the voice library includes preset processing target keyword voices and processing mode keyword voices;
  • a word extracting unit configured to obtain a processing target keyword and a processing mode keyword according to the extracted words having the same pronunciation;
  • a command generating unit configured to combine the processing target keyword and the processing mode keyword into an image processing command.
  • the image processing module 53 includes: an object recognition unit 531 and an execution processing unit 532;
  • the object recognition unit 531 is configured to identify the processing object from the target image according to the processing target keyword
  • the execution processing unit 532 is configured to perform processing on the processing target according to the processing mode keyword.
  • the device 50 further includes: a sound determining module 54 configured to determine whether the voice information includes only one voice;
  • the first extraction module 55 is configured to extract a voice word of the first N bits of the voice information if only one voice is included in the voice information;
  • the voice word judging module 56 is configured to determine whether the voice words contain the sound of a preset command word and, if so, to proceed to the step of identifying the voice information to obtain the image processing command.
  • the device 50 further includes:
  • the second extraction module 57 is configured to extract a phonetic word of the first N bits of each voice if the voice information includes multiple voices;
  • the sound screening module 58 is configured to acquire a sound in which the voice word includes a preset command word
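  • in this case, identifying the voice information to obtain the image processing command is specifically: identifying the sound obtained by the sound screening module 58 to obtain the image processing command.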
  • The image processing apparatus includes a voice receiving module 51, a command acquiring module 52, and an image processing module 53, which respectively perform: receiving voice information; identifying the voice information to obtain an image processing command; and performing image processing on the target image according to the image processing command to obtain the processed target image. Therefore, in the embodiment of the present application, the mobile terminal does not need to receive manual operations from the user to process the image; the image processing function is realized simply by receiving the user's voice information. Compared with the prior art, the process is simpler, saves user time, and improves operational efficiency. Further, when multiple voices are acquired, the first N voice words of each voice are extracted, and image processing is either performed separately for each voice or performed according to the voice with the highest volume.
  • FIG. 8 is a schematic diagram showing the hardware structure of an electronic device for performing image processing according to an embodiment of the present disclosure.
  • the electronic device 70 includes:
  • one or more processors 71 and a memory 72; one processor 71 is taken as an example in the figure.
  • the processor 71 and the memory 72 may be connected by a bus or by other means; a bus connection is taken as an example in the figure.
  • the memory 72, as a non-volatile computer readable storage medium, can be used for storing non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions or modules corresponding to image processing in the embodiments of the present application (for example, the voice receiving module 51, the command acquiring module 52, and the image processing module 53 shown in FIG. 6).
  • the processor 71 executes the various functional applications and data processing of the server by running the non-volatile software programs, instructions, and modules stored in the memory 72, that is, it implements the image processing of the above-described method embodiments.
  • the memory 72 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application required for at least one function, and the data storage area may store data created according to use of the image processing device, and the like.
  • memory 72 can include high speed random access memory, and can also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device.
  • memory 72 can optionally include memory remotely located relative to processor 71, which can be connected to the image processing device over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
  • the one or more modules are stored in the memory 72 and, when executed by the one or more processors 71, perform the image processing in any of the above method embodiments, for example, performing method steps 101 to 103 in FIG. 1, method steps 1021 to 1023 in FIG. 2, method steps 1021a to 1023a in FIG. 3, method steps 1031 to 1032 in FIG. 4, and method steps 201 to 209 in FIG. 5, and implementing the functions of modules 51-53 and units 521-523 and 531-532 in FIG. 6, and of modules 51-58 and units 521-523 and 531-532 in FIG. 7.
  • The electronic device of the embodiment of the present application exists in various forms, including but not limited to a server, that is, a device that provides computing services. A server comprises a processor, a hard disk, a memory, a system bus, and so on. Its architecture is similar to that of a general-purpose computer, but because it must provide highly reliable services, it has higher requirements in terms of processing power, stability, reliability, security, scalability, and manageability. The electronic device may also be another electronic device with data interaction capabilities.
  • The embodiment of the present application provides a non-transitory computer readable storage medium storing computer-executable instructions that, when executed by an electronic device, perform the image processing in any of the above method embodiments, for example, performing method steps 101 to 103 in FIG. 1, method steps 1021 to 1023 in FIG. 2, method steps 1021a to 1023a in FIG. 3, method steps 1031 to 1032 in FIG. 4, and method steps 201 to 209 in FIG. 5, and implementing the functions of modules 51-53 and units 521-523 and 531-532 in FIG. 6, and of modules 51-58 and units 521-523 and 531-532 in FIG. 7.
  • An embodiment of the present application provides a computer program product, including a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions. When the program instructions are executed by a computer, the computer performs the image processing in any of the above method embodiments, for example, performing method steps 101 to 103 in FIG. 1, method steps 1021 to 1023 in FIG. 2, method steps 1021a to 1023a in FIG. 3, method steps 1031 to 1032 in FIG. 4, and method steps 201 to 209 in FIG. 5, and implementing the functions of modules 51-53 and units 521-523 and 531-532 in FIG. 6, and of modules 51-58 and units 521-523 and 531-532 in FIG. 7.
  • The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • the storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (RAM).
  • The image processing method, device, storage medium and electronic device provided by the embodiments of the present invention have the following beneficial effects: the user does not need to manually operate a mobile terminal to process an image, and the image processing function can be realized merely by receiving the user's voice information. Compared with the prior art, the process is simpler, saving user time and improving operating efficiency.

Abstract

The present invention relates to the technical field of image processing, in particular, to an image processing method, a device, a storage medium and an electronic device. Said method comprises: receiving voice information (101); recognizing the voice information, so as to obtain an image processing command (102); and performing image processing on the target image according to the image processing command, so as to obtain the processed target image (103). Therefore, a user does not need to manually operate a mobile terminal to process an image, and an image processing function can be implemented just by receiving voice information from the user. Compared with the prior art, this process is simpler, saving the time of the user, improving the operation efficiency.

Description

Method, device, storage medium and electronic device for image processing
Technical field
The embodiments of the present application relate to the technical field of image processing, and in particular to a method, a device, a storage medium and an electronic device for image processing.
Background art
With the development of science and technology, the functions of smart devices such as mobile terminals have become increasingly rich and complete, including intelligent image processing functions. In the prior art, when a user performs image processing with a smart device such as a mobile terminal, the user generally first acquires the image to be processed with the mobile terminal and then manually operates the mobile terminal to process the image and obtain the desired result.
It follows that, in the prior art, image processing with a smart device such as a mobile terminal is cumbersome: the user must operate the device manually before the image can be processed, which is inconvenient. It is therefore particularly necessary to provide a simple image processing method that requires no manual operation.
Summary of the invention
The main technical problem solved by the embodiments of the present application is to provide a simple method, device, storage medium and electronic device for image processing that require no manual operation.
In a first aspect, to solve the above technical problem, one technical solution adopted by the embodiments of the present application is to provide a method for image processing, applied to a terminal device, comprising: receiving voice information; recognizing the voice information to obtain an image processing command; and performing image processing on a target image according to the image processing command to obtain the processed target image.
Optionally, the step of recognizing the voice information to obtain an image processing command comprises: converting the voice information into text information; extracting a processing object keyword and a processing mode keyword from the text information; and combining the processing object keyword and the processing mode keyword into an image processing command.
Optionally, the step of recognizing the voice information to obtain an image processing command comprises: extracting, according to the voice information and a voice library preset with keyword voices, the words in the voice information whose pronunciation is the same as that of entries in the voice library, wherein the voice library contains preset processing object keyword voices and processing mode keyword voices; obtaining a processing object keyword and a processing mode keyword according to the extracted words having the same pronunciation; and combining the processing object keyword and the processing mode keyword into an image processing command.
Optionally, the step of performing image processing on the target image according to the image processing command comprises: identifying a processing object in the target image according to the processing object keyword; and performing processing on the processing object according to the processing mode keyword.
Optionally, after the step of receiving voice information, the method further comprises: determining whether the voice information contains only one voice; if the voice information contains only one voice, extracting the first N voice words of the voice information; determining whether the voice words contain a preset command word; and if so, proceeding to the step of recognizing the voice information to obtain the image processing command.
Optionally, the method further comprises: if the voice information contains multiple voices, extracting the first N voice words of each voice; and acquiring the voice whose voice words contain a preset command word. In this case, recognizing the voice information to obtain the image processing command is specifically: recognizing the acquired voice to obtain the image processing command.
In a second aspect, to solve the above technical problem, another technical solution adopted by the embodiments of the present application is to provide a device for image processing, applied to a terminal device, comprising: a voice receiving module configured to receive voice information; a command acquiring module configured to recognize the voice information to obtain an image processing command; and an image processing module configured to perform image processing on a target image according to the image processing command to obtain the processed target image.
Optionally, the command acquiring module comprises: a text acquiring unit configured to convert the voice information into text information; a text extracting unit configured to extract a processing object keyword and a processing mode keyword from the text information; and a command forming unit configured to combine the processing object keyword and the processing mode keyword into an image processing command.
Optionally, the command acquiring module comprises: a word acquiring unit configured to extract, according to the voice information and a voice library preset with keyword voices, the words in the voice information whose pronunciation is the same as that of entries in the voice library, wherein the voice library contains preset processing object keyword voices and processing mode keyword voices; a word extracting unit configured to obtain a processing object keyword and a processing mode keyword according to the extracted words having the same pronunciation; and a command generating unit configured to combine the processing object keyword and the processing mode keyword into an image processing command.
Optionally, the image processing module comprises: an object recognition unit configured to identify a processing object in the target image according to the processing object keyword; and an execution processing unit configured to perform processing on the processing object according to the processing mode keyword.
Optionally, the device further comprises: a voice determination module configured to determine whether the voice information contains only one voice; a first extraction module configured to extract the first N voice words of the voice information if the voice information contains only one voice; and a voice word determination module configured to determine whether the voice words contain a preset command word and, if so, to proceed to the step of recognizing the voice information to obtain the image processing command.
Optionally, the device further comprises: a second extraction module configured to extract the first N voice words of each voice if the voice information contains multiple voices; and a voice screening module configured to acquire the voice whose voice words contain a preset command word. In this case, recognizing the voice information to obtain the image processing command is specifically: recognizing the voice acquired by the voice screening module to obtain the image processing command.
In a third aspect, an embodiment of the present application provides a storage medium storing a computer program, the computer program being configured to execute, when run, the method in the embodiments of the present application.
In a fourth aspect, an embodiment of the present application provides an electronic device comprising a memory and a processor, wherein the memory stores a computer program and the processor is configured to execute the method in the embodiments of the present application by means of the computer program.
The beneficial effects of the solution provided in the embodiments of the present application are as follows. Unlike prior-art solutions, the image processing method in the embodiments of the present application comprises: receiving voice information; recognizing the voice information to obtain an image processing command; and performing image editing processing on the target image according to the image processing command to obtain the edited target image. Therefore, the user does not need to manually operate the mobile terminal to process an image, and the image processing function can be realized merely by receiving the user's voice information. Compared with the prior art, the solution in the embodiments of the present application is simpler, saves the user's time and improves operating efficiency.
Brief description of the drawings
One or more embodiments are illustrated by way of example with the figures in the corresponding accompanying drawings. These illustrations do not limit the embodiments. Elements with the same reference numerals in the drawings denote similar elements, and unless otherwise stated, the figures in the drawings do not constitute a limitation of scale.
FIG. 1 is a schematic flowchart of a method for image processing provided by Embodiment 1 of the present application;
FIG. 2 is a schematic flowchart of one method for recognizing voice information and obtaining an image processing command in image processing, provided by Embodiment 1 of the present application;
FIG. 3 is a schematic flowchart of another method for recognizing voice information and obtaining an image processing command in image processing, provided by Embodiment 1 of the present application;
FIG. 4 is a schematic flowchart of a method, provided by Embodiment 1 of the present application, for performing image processing on a target image according to an image processing command to obtain the processed target image;
FIG. 5 is a schematic flowchart of a method for image processing provided by Embodiment 2 of the present application;
FIG. 6 is a schematic structural diagram of a device for image processing provided by Embodiment 3 of the present application;
FIG. 7 is a schematic structural diagram of a device for image processing provided by Embodiment 4 of the present application;
FIG. 8 is a schematic diagram of the hardware structure of an electronic device that performs image processing, provided by an embodiment of the present application.
Detailed description
To make the objectives, technical solutions and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the present application and are not intended to limit it.
Embodiment 1
Please refer to FIG. 1 to FIG. 4. FIG. 1 shows a method for image processing provided by Embodiment 1 of the present application, which is applied to a terminal device and comprises:
Step 101: Receive voice information;
When the user turns on the image processing function of the mobile terminal, the mobile terminal collects the user's voice information in real time; the voice information is the speech uttered by the user in real time.
Step 102: Recognize the voice information to obtain an image processing command;
Optionally, the step of recognizing the voice information comprises:
Step 1021: Convert the received voice information into text information;
The text information is consistent with the voice information, and text is easier for the mobile terminal to recognize and extract from. The text information includes a processing object keyword and a processing mode keyword. The processing object keyword is the name of the object to be processed in the picture to be processed; for example, processing object keywords include "person", "apple" and "house". The processing mode keyword is the way in which the user wants to process that object; for example, processing mode keywords include "crop", "mosaic", "beautify", "highlight" and "slim face".
Step 1022: Extract the processing object keyword and the processing mode keyword from the text information;
Step 1023: Combine the processing object keyword and the processing mode keyword into an image processing command. For example, when the received voice information is converted into the text "beautify the person in the picture", the processing object keyword is "person" and the processing mode keyword is "beautify", so the obtained image processing command is "beautify the person in the picture".
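Steps 1022 and 1023 can be illustrated with a minimal Python sketch. It assumes that step 1021 has already produced a text transcript with some speech recognizer, which the source does not specify. The vocabularies, the `ImageCommand` type and the function names are hypothetical and only mirror the example keywords given above.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical keyword vocabularies; the source only lists a few examples.
OBJECT_KEYWORDS = ("person", "apple", "house")
MODE_KEYWORDS = ("crop", "mosaic", "beautify", "highlight", "slim face")


@dataclass(frozen=True)
class ImageCommand:
    target: str  # processing object keyword
    mode: str    # processing mode keyword


def first_match(text: str, vocabulary: tuple) -> Optional[str]:
    """Return the vocabulary entry that occurs earliest in text, or None."""
    hits = [(text.find(word), word) for word in vocabulary if word in text]
    return min(hits)[1] if hits else None


def build_command(text: str) -> Optional[ImageCommand]:
    """Steps 1022-1023: extract both keywords and combine them into a command."""
    lowered = text.lower()
    target = first_match(lowered, OBJECT_KEYWORDS)
    mode = first_match(lowered, MODE_KEYWORDS)
    if target is None or mode is None:
        return None  # the utterance does not form a complete command
    return ImageCommand(target=target, mode=mode)
```

Plain substring matching is used here only to keep the sketch short; a real system would likely need word segmentation or a grammar to avoid accidental matches.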
Of course, in Embodiment 1 of the present application, the voice information can also be recognized in other ways to obtain the image processing command. For example, referring to FIG. 3, the following steps 1021a, 1022a and 1023a are performed:
Step 1021a: According to the voice information and a voice library preset with keyword voices, extract the words in the voice information whose pronunciation is the same as that of entries in the voice library. The voice library contains preset processing object keyword voices and processing mode keyword voices; for example, it contains preset processing object keyword voices such as "person", "female" and "male", and preset processing mode keyword voices such as "crop", "mosaic", "beautify" and "highlight".
Step 1022a: Obtain the processing object keyword and the processing mode keyword according to the extracted words having the same pronunciation;
Optionally, for example, if the preset processing object keyword voices include "female", the preset processing mode keyword voices include "beautify", and the extracted words having the same pronunciation are "female" and "beautify", then "female" is taken as the processing object keyword and "beautify" as the processing mode keyword.
Step 1023a: Combine the processing object keyword and the processing mode keyword into an image processing command.
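Steps 1021a to 1023a can be sketched as a lookup of pronunciations in a preset library. The source does not say how pronunciations are represented or compared, so this sketch makes a strong assumption: the utterance has already been reduced to a sequence of pinyin syllables, and two words are treated as having the same pronunciation when their syllable sequences are equal. `VOICE_LIBRARY`, `match_keywords` and `build_command_from_syllables` are hypothetical names.

```python
# Hypothetical voice library: pronunciation (a tuple of syllables) -> (kind, keyword).
# Pinyin syllables stand in for whatever acoustic representation a real system uses.
VOICE_LIBRARY = {
    ("ren",): ("object", "person"),
    ("nv", "xing"): ("object", "female"),
    ("nan", "xing"): ("object", "male"),
    ("cai", "jian"): ("mode", "crop"),
    ("mei", "yan"): ("mode", "beautify"),
    ("gao", "guang"): ("mode", "highlight"),
}


def match_keywords(syllables):
    """Steps 1021a-1022a: find library entries whose pronunciation occurs in the utterance."""
    found = {}
    longest = max(len(key) for key in VOICE_LIBRARY)
    for start in range(len(syllables)):
        for length in range(longest, 0, -1):  # prefer longer matches
            window = tuple(syllables[start:start + length])
            if window in VOICE_LIBRARY:
                kind, keyword = VOICE_LIBRARY[window]
                found.setdefault(kind, keyword)  # keep the first hit of each kind
                break
    return found


def build_command_from_syllables(syllables):
    """Step 1023a: combine the two keywords, or return None if either is missing."""
    found = match_keywords(syllables)
    if "object" in found and "mode" in found:
        return (found["object"], found["mode"])
    return None
```

Because the keywords are matched directly against the library, this path does not need a full transcript of the utterance, which appears to be the point of the alternative flow in FIG. 3.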
Optionally, for example, if the obtained processing object keyword is "female" and the obtained processing mode keyword is "beautify", the obtained image processing command is "beautify the female in the picture".
Step 103: Perform image processing on the target image according to the image processing command to obtain the processed target image.
Optionally, step 103 comprises:
Step 1031: According to the processing object keyword and the processing mode keyword in the image processing command obtained in step 102, use image recognition technology to identify the object in the picture that corresponds to the processing object keyword;
Step 1032: Perform processing on the processing object in the manner corresponding to the processing mode keyword.
The processing object in the image to be processed is processed in the manner corresponding to the processing mode keyword, and a new, processed image is generated.
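Steps 1031 and 1032 can be sketched as a lookup from the processing mode keyword to an operation that is applied to the region found for the processing object keyword. The source does not describe the image recognition technology, so the detector is passed in as a hypothetical callable that returns a bounding box. The image is modeled as a nested list of grayscale values to keep the sketch free of external libraries, and the two operations shown are simple stand-ins rather than the processing actually used.

```python
from typing import Callable, Dict, List, Tuple

Image = List[List[int]]          # grayscale pixels, 0-255
Box = Tuple[int, int, int, int]  # top, left, bottom, right (bottom/right exclusive)


def mosaic(image: Image, box: Box, block: int = 2) -> None:
    """Replace each block-by-block tile inside box with its average value."""
    top, left, bottom, right = box
    for y0 in range(top, bottom, block):
        for x0 in range(left, right, block):
            ys = range(y0, min(y0 + block, bottom))
            xs = range(x0, min(x0 + block, right))
            mean = sum(image[y][x] for y in ys for x in xs) // (len(ys) * len(xs))
            for y in ys:
                for x in xs:
                    image[y][x] = mean


def highlight(image: Image, box: Box, gain: int = 40) -> None:
    """Brighten every pixel inside box, clamping at 255."""
    top, left, bottom, right = box
    for y in range(top, bottom):
        for x in range(left, right):
            image[y][x] = min(255, image[y][x] + gain)


# Processing mode keyword -> operation (an illustrative subset).
MODES: Dict[str, Callable[[Image, Box], None]] = {"mosaic": mosaic, "highlight": highlight}


def process(image: Image, target: str, mode: str,
            detect: Callable[[Image, str], Box]) -> Image:
    """Step 1031 locates the object; step 1032 applies the requested processing."""
    result = [row[:] for row in image]  # work on a copy so a new image is produced
    box = detect(result, target)        # step 1031 (detector supplied by the caller)
    MODES[mode](result, box)            # step 1032
    return result
```

Working on a copy reflects the statement above that a new, processed image is generated while the image to be processed is left unchanged.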
In this embodiment of the present application, the image processing method comprises: receiving voice information; recognizing the voice information to obtain an image processing command; and performing image processing on the target image according to the image processing command to obtain the processed target image. Therefore, the mobile terminal does not need to receive manual operations from the user to process an image, and the image processing function can be realized merely by receiving the user's voice information. Compared with the prior art, this process is simpler, saves the user's time and improves operating efficiency.
Embodiment 2
Please refer to FIG. 5. FIG. 5 shows a method for image processing provided by Embodiment 2 of the present application, which is applied to a terminal device and comprises:
Step 201: Receive voice information;
When the user turns on the image processing function of the mobile terminal, the mobile terminal collects the user's voice information in real time; the voice information is the speech uttered by the user in real time.
Step 202: Determine whether the voice information contains only one voice;
Optionally, existing speech recognition technology is used to determine, from voice features such as timbre and audio frequency, whether the voice information contains only one voice.
Step 203: If the voice information contains only one voice, extract the first N voice words of the voice information;
Optionally, when step 202 confirms that the voice information contains only one voice, the first N voice words of the voice information are extracted, where N is optionally 3, 5, 7 or the like. For example, when N is 5 and the received voice information is "处理命令为对图片中的女性进行美颜" ("the processing command is: beautify the female in the picture"), the first 5 voice words, counted as characters of the original Chinese, are "处理命令为" ("the processing command is").
Step 204: Determine whether the voice words contain a preset command word;
A preset command word is a command word set in advance, for example "the processing command is" or "the command is". As a specific example, when the voice words obtained in step 203 are "the processing command is" and a preset command word is also "the processing command is", it is determined that the voice words contain a preset command word. If the voice words contain a preset command word, the method proceeds to step 205; otherwise, it proceeds to step 207.
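Steps 203 and 204 amount to a prefix check on the utterance. The sketch below assumes the check is done on a text transcript and that the "first N voice words" are the first N characters of the Chinese utterance, as in the N = 5 example; the source itself works on the audio and does not fix either detail. `PRESET_COMMAND_WORDS` and the function names are hypothetical.

```python
# Preset command words from the example: "the processing command is", "the command is".
PRESET_COMMAND_WORDS = ("处理命令为", "命令为")


def leading_words(transcript: str, n: int) -> str:
    """Step 203: take the first n characters of the recognized utterance."""
    return transcript[:n]


def has_command_word(transcript: str, n: int = 5) -> bool:
    """Step 204: check whether the leading words contain a preset command word."""
    head = leading_words(transcript, n)
    return any(word in head for word in PRESET_COMMAND_WORDS)
```

With this check, speech that does not begin with a command word is never passed on to command recognition, so ordinary conversation near the terminal is less likely to trigger image processing.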
Step 205: Recognize the voice information to obtain an image processing command;
It should be noted that step 205 of this embodiment is based on the same inventive concept as step 102. For the specific content of step 205, refer to step 102; it is not repeated here.
Step 206: Perform image processing on the target image according to the image processing command to obtain the processed target image;
Step 207: If the voice information contains multiple voices, extract the first N voice words of each voice;
When step 202 determines that the voice information contains multiple voices, the first N voice words of each voice are extracted and recorded.
Step 208: Acquire the voice whose voice words contain a preset command word;
Among the voices from step 207, acquire the voice whose voice words contain a preset command word. Further optionally, among the acquired voices whose voice words contain a preset command word, the voice with the highest volume is selected, and step 209 is performed on that voice.
Step 209: Recognize the acquired voice to obtain the image processing command.
It should be noted that step 209 of this embodiment is based on the same inventive concept as step 102. For the specific content of step 209, refer to step 102; it is not repeated here.
After step 209 is performed, step 206 is performed.
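The multi-voice branch (steps 207 and 208, including the optional selection by volume) can be sketched as a filter followed by a maximum. The sketch assumes that the voices have already been separated per speaker and transcribed, and that some loudness measure is available for each; none of this is specified in the source. The `Voice` type and `select_command_voice` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Voice:
    transcript: str  # recognized text for one speaker (assumed already separated)
    volume: float    # any loudness measure, for example RMS amplitude


def select_command_voice(voices: List[Voice],
                         command_words: Tuple[str, ...] = ("处理命令为", "命令为"),
                         n: int = 5) -> Optional[Voice]:
    """Steps 207-208: keep voices whose first n characters contain a command word,
    then pick the loudest of them (the optional refinement of step 208)."""
    candidates = [v for v in voices
                  if any(word in v.transcript[:n] for word in command_words)]
    if not candidates:
        return None  # no voice issued a command
    return max(candidates, key=lambda v: v.volume)
```

The selected voice would then go through step 209 (command recognition) and step 206 (image processing).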
In this embodiment of the present application, the image processing method comprises: receiving voice information; determining whether the voice information contains only one voice; if so, extracting the first N voice words of the voice information and determining whether they contain a preset command word; if so, recognizing the voice information to obtain an image processing command, and then performing image editing processing on the target image according to the image processing command to obtain the processed target image. If the voice information is determined to contain multiple voices, the first N voice words of each voice are extracted, the voice whose voice words contain a preset command word is acquired, the acquired voice is recognized to obtain the image processing command, and image processing is performed on the target image to obtain the processed target image.
Therefore, in this embodiment of the present application, the mobile terminal does not need to receive manual operations from the user to process an image, and the image processing function can be realized merely by receiving the user's voice information. Compared with the prior art, this process is simpler, saves the user's time and improves operating efficiency. In addition, when multiple voices are acquired, the first N voice words of each voice are extracted, and image processing is performed either for each voice separately or according to the voice with the highest volume.
Embodiment 3
Please refer to FIG. 6. FIG. 6 shows a device 50 for image processing provided by Embodiment 3 of the present application, which is applied to a terminal device and comprises: a voice receiving module 51, a command acquiring module 52 and an image processing module 53;
The voice receiving module 51 is configured to receive voice information;
The command acquiring module 52 is configured to recognize the voice information to obtain an image processing command;
The image processing module 53 is configured to perform image processing on the target image according to the image processing command to obtain the processed target image.
Optionally, the command acquiring module 52 comprises: a text acquiring unit 521, a text extracting unit 522 and a command forming unit 523;
The text acquiring unit 521 is configured to convert the voice information into text information;
The text extracting unit 522 is configured to extract a processing object keyword and a processing mode keyword from the text information;
The command forming unit 523 is configured to combine the processing object keyword and the processing mode keyword into an image processing command.
Optionally, the image processing module 53 comprises: an object recognition unit 531 and an execution processing unit 532;
The object recognition unit 531 is configured to identify a processing object in the target image according to the processing object keyword;
The execution processing unit 532 is configured to perform processing on the processing object according to the processing mode keyword.
In this embodiment of the present application, the image processing device comprises a voice receiving module 51, a command acquiring module 52 and an image processing module 53, which respectively perform: receiving voice information; recognizing the voice information to obtain an image processing command; and performing image processing on the target image according to the image processing command to obtain the processed target image. Therefore, the mobile terminal does not need to receive manual operations from the user to process an image, and the image processing function can be realized merely by receiving the user's voice information. Compared with the prior art, this process is simpler, saves the user's time and improves operating efficiency.
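The module structure of the device 50 can be sketched as three pluggable components chained in order. This is only an illustration of how modules 51 to 53 pass data to one another; the class name, field names and signatures are assumptions and are not taken from the source.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

Command = Tuple[str, str]  # (processing object keyword, processing mode keyword)


@dataclass
class ImageProcessingDevice:
    receive_voice: Callable[[], bytes]                  # voice receiving module 51
    acquire_command: Callable[[bytes], Command]         # command acquiring module 52
    process_image: Callable[[object, Command], object]  # image processing module 53

    def run(self, target_image):
        """Chain the three modules: voice -> command -> processed image."""
        voice = self.receive_voice()
        command = self.acquire_command(voice)
        return self.process_image(target_image, command)
```

Keeping each module behind a callable makes it possible to swap the text-based units 521 to 523 for the voice-library units of Embodiment 4 without changing the rest of the device.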
实施方式四Embodiment 4
请参阅图7,图7是本申请实施方式四提供的一种图像处理的装置50,应用于终端设备,包括:语音接收模块51、命令获取模块52和图像处理模块53;Referring to FIG. 7, FIG. 7 is a device 50 for image processing according to Embodiment 4 of the present application, which is applied to a terminal device, and includes: a voice receiving module 51, a command acquiring module 52, and an image processing module 53;
其中,语音接收模块51设置为接收语音信息;The voice receiving module 51 is configured to receive voice information.
命令获取模块52设置为对所述语音信息进行识别,得到图像处理命令;The command obtaining module 52 is configured to identify the voice information to obtain an image processing command;
图像处理模块53设置为根据所述图像处理命令,对目标图像进行图像处理,得到处理后的所述目标图像。The image processing module 53 is configured to perform image processing on the target image according to the image processing command to obtain the processed target image.
Optionally, the command acquisition module 52 includes a text acquisition unit 521, a text extraction unit 522, and a command forming unit 523;
The text acquisition unit 521 is configured to convert the voice information into text information;
The text extraction unit 522 is configured to extract a processing-object keyword and a processing-mode keyword from the text information;
The command forming unit 523 is configured to compose the processing-object keyword and the processing-mode keyword into an image processing command.
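The text-based path through units 521 to 523 can be illustrated with a minimal Python sketch. It assumes the speech has already been converted to text, and the keyword vocabularies, the `ImageProcessingCommand` type, and the function names are hypothetical; the application does not enumerate its keywords or prescribe a data structure.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical vocabularies; the application does not list its keywords.
OBJECT_KEYWORDS: Tuple[str, ...] = ("face", "sky", "background")
MODE_KEYWORDS: Tuple[str, ...] = ("brighten", "blur", "crop")


@dataclass(frozen=True)
class ImageProcessingCommand:
    target: str  # processing-object keyword
    mode: str    # processing-mode keyword


def extract_keyword(text: str, vocabulary: Tuple[str, ...]) -> Optional[str]:
    """Return the first vocabulary entry that appears as a word in the text."""
    words = text.lower().split()
    for keyword in vocabulary:
        if keyword in words:
            return keyword
    return None


def form_command(text: str) -> Optional[ImageProcessingCommand]:
    """Combine one object keyword and one mode keyword into a command."""
    target = extract_keyword(text, OBJECT_KEYWORDS)
    mode = extract_keyword(text, MODE_KEYWORDS)
    if target is None or mode is None:
        return None  # no complete command can be formed
    return ImageProcessingCommand(target=target, mode=mode)
```

For example, `form_command("please brighten the face")` yields a command with target `face` and mode `brighten`, while text lacking either kind of keyword yields `None`.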
Optionally, the command acquisition module 52 includes a word acquisition unit (not shown), a word extraction unit (not shown), and a command generation unit (not shown);
The word acquisition unit is configured to extract, according to the voice information and a voice library preset with keyword voices, the words in the voice information whose pronunciation matches a pronunciation in the voice library, where the voice library contains preset processing-object keyword voices and preset processing-mode keyword voices;
The word extraction unit is configured to obtain a processing-object keyword and a processing-mode keyword from the extracted words with matching pronunciation;
The command generation unit is configured to compose the processing-object keyword and the processing-mode keyword into an image processing command.
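The voice-library alternative can be sketched in the same spirit. This sketch assumes the voice information has already been segmented into pronunciation strings; the phoneme spellings, the library contents, and the `match_command` name are all hypothetical, since the application does not specify how pronunciations are represented or compared.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical voice library: pronunciation -> (keyword, keyword kind).
VOICE_LIBRARY: Dict[str, Tuple[str, str]] = {
    "f ey s": ("face", "object"),
    "s k ay": ("sky", "object"),
    "b l er": ("blur", "mode"),
    "k r aa p": ("crop", "mode"),
}


def match_command(pronunciations: List[str]) -> Optional[Dict[str, str]]:
    """Keep the first object keyword and first mode keyword found in the library."""
    target = None
    mode = None
    for pronunciation in pronunciations:
        entry = VOICE_LIBRARY.get(pronunciation)
        if entry is None:
            continue  # this word is not a preset keyword
        keyword, kind = entry
        if kind == "object" and target is None:
            target = keyword
        elif kind == "mode" and mode is None:
            mode = keyword
    if target is None or mode is None:
        return None
    return {"target": target, "mode": mode}
```

Exact dictionary lookup stands in for "same pronunciation" here; a real system would more likely compare acoustic features with some tolerance.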
Optionally, the image processing module 53 includes an object recognition unit 531 and an execution processing unit 532;
The object recognition unit 531 is configured to identify a processing object in the target image according to the processing-object keyword;
The execution processing unit 532 is configured to process the processing object according to the processing-mode keyword.
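The division of work between units 531 and 532 can be sketched as a lookup of a region followed by a dispatch on the processing mode. Everything concrete here is an assumption for illustration: the image is a small grayscale grid, `recognize_object` is a stand-in that returns fixed regions where a real system would run a detector, and the two operations are hypothetical examples of processing modes.

```python
from typing import Callable, Dict, List, Tuple

Image = List[List[int]]              # grayscale pixels in 0..255
Region = Tuple[int, int, int, int]   # top, left, bottom, right (exclusive)


def recognize_object(image: Image, target: str) -> Region:
    """Stand-in for unit 531: map an object keyword to a region of the image."""
    height, width = len(image), len(image[0])
    regions: Dict[str, Region] = {
        "sky": (0, 0, height // 2, width),          # upper half, by assumption
        "ground": (height // 2, 0, height, width),  # lower half, by assumption
    }
    return regions[target]


# Hypothetical per-pixel operations keyed by processing-mode keyword.
OPERATIONS: Dict[str, Callable[[int], int]] = {
    "brighten": lambda value: min(255, value + 40),
    "invert": lambda value: 255 - value,
}


def process(image: Image, target: str, mode: str) -> Image:
    """Stand-in for unit 532: apply the operation only inside the recognized region."""
    top, left, bottom, right = recognize_object(image, target)
    operation = OPERATIONS[mode]
    result = [row[:] for row in image]  # leave the input image untouched
    for y in range(top, bottom):
        for x in range(left, right):
            result[y][x] = operation(result[y][x])
    return result
```

Keeping recognition and execution separate mirrors the unit structure: a new processing mode only needs a new entry in `OPERATIONS`.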
Optionally, the apparatus 50 further includes a sound determination module 54, configured to determine whether the voice information contains only one voice;
The first extraction module 55 is configured to extract the first N speech words of the voice information if the voice information contains only one voice;
The speech-word determination module 56 is configured to determine whether the speech words contain the sound of a preset command word and, if so, to proceed to the step of recognizing the voice information to obtain the image processing command.
Optionally, the apparatus 50 further includes:
The second extraction module 57, configured to extract the first N speech words of each voice if the voice information contains multiple voices;
The sound screening module 58, configured to obtain the voice whose speech words contain a preset command word;
In this case, recognizing the voice information to obtain the image processing command specifically means:
Recognizing the voice obtained by the sound screening module 58 to obtain the image processing command.
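The gating performed by modules 54 to 58 can be sketched as a filter over voices. This sketch assumes each voice has already been separated and transcribed into a list of words; the preset command words, the value of N, and the function names are hypothetical.

```python
from typing import List

PRESET_COMMAND_WORDS = {"hello", "camera"}  # hypothetical wake words
N = 2                                       # hypothetical window size


def has_command_word(words: List[str], n: int = N) -> bool:
    """Check whether the first n words of one voice contain a preset command word."""
    return any(word in PRESET_COMMAND_WORDS for word in words[:n])


def select_voices(voices: List[List[str]], n: int = N) -> List[List[str]]:
    """Keep only the voices whose first n words contain a preset command word.

    A single-voice input is simply a list of length one, so the same filter
    covers both the one-voice and the multi-voice case.
    """
    return [words for words in voices if has_command_word(words, n)]
```

Only the voices returned by `select_voices` would go on to command recognition; an empty result means no image processing is triggered.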
In this embodiment of the present application, the image processing apparatus includes a voice receiving module 51, a command acquisition module 52, and an image processing module 53, which respectively receive voice information, recognize the voice information to obtain an image processing command, and perform image processing on a target image according to the image processing command to obtain the processed target image. The mobile terminal therefore does not need manual operation by the user to process an image; receiving the user's voice information is sufficient to perform image processing. Compared with the prior art, this process is simpler, saves the user's time, and improves operating efficiency. In addition, when multiple voices are acquired, the first N speech words of each voice are extracted, and image processing is performed either for each voice separately or according to the loudest voice.
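Choosing the loudest voice can be sketched as follows. The application does not say how volume is measured, so this sketch assumes root-mean-square amplitude over each voice's samples; the function names and the dictionary layout are hypothetical.

```python
import math
from typing import Dict, List


def rms(samples: List[float]) -> float:
    """Root-mean-square amplitude, used here as an assumed measure of volume."""
    if not samples:
        return 0.0
    return math.sqrt(sum(sample * sample for sample in samples) / len(samples))


def loudest_voice(voices: Dict[str, List[float]]) -> str:
    """Return the label of the voice with the highest RMS amplitude."""
    return max(voices, key=lambda label: rms(voices[label]))
```

`loudest_voice` raises `ValueError` on an empty dictionary, so a caller would check that at least one voice passed the command-word filter first.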
Referring to FIG. 8, FIG. 8 is a schematic diagram of the hardware structure of an electronic device that performs image processing according to an embodiment of the present application. As shown in FIG. 8, the electronic device 70 includes:
One or more processors 71 and a memory 72; FIG. 8 takes one processor 71 as an example.
The processor 71 and the memory 72 may be connected by a bus or by other means; FIG. 8 takes a bus connection as an example.
As a non-volatile computer-readable storage medium, the memory 72 can store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions/modules corresponding to the image processing in the embodiments of the present application (for example, the voice receiving module 51, the command acquisition module 52, and the image processing module 53 shown in FIG. 6). By running the non-volatile software programs, instructions, and modules stored in the memory 72, the processor 71 executes the various functional applications and data processing of the server, that is, implements the image processing of the foregoing method embodiments.
The memory 72 may include a program storage area and a data storage area, where the program storage area can store an operating system and an application required by at least one function, and the data storage area can store data created through use of the image processing apparatus. In addition, the memory 72 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory 72 optionally includes memory located remotely from the processor 71, and this remote memory can be connected to the image processing apparatus over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The one or more modules are stored in the memory 72 and, when executed by the one or more processors 71, perform the image processing of any of the foregoing method embodiments, for example, performing steps 101 to 103 of the method in FIG. 1, steps 1021 to 1023 in FIG. 2, steps 1021a to 1023a in FIG. 3, steps 1031 to 1032 in FIG. 4, and steps 201 to 209 in FIG. 5, and implementing the functions of modules 51-53, units 521-523, and units 531-532 in FIG. 6 and of modules 51-58, units 521-523, and units 531-532 in FIG. 7.
The above product can perform the methods provided by the embodiments of the present application and has the functional modules and beneficial effects corresponding to those methods. For technical details not described in detail in this embodiment, refer to the methods provided by the embodiments of the present application.
The electronic device of the embodiments of the present application exists in various forms, including but not limited to: a server, which is a device that provides computing services and comprises a processor, a hard disk, memory, a system bus, and the like; a server's architecture is similar to that of a general-purpose computer, but because it must provide highly reliable services, it has higher requirements for processing capability, stability, reliability, security, scalability, and manageability; or another electronic apparatus with data interaction functions.
An embodiment of the present application provides a non-volatile computer-readable storage medium storing computer-executable instructions that, when executed by an electronic device, perform the image processing of any of the foregoing method embodiments, for example, performing steps 101 to 103 of the method in FIG. 1, steps 1021 to 1023 in FIG. 2, steps 1021a to 1023a in FIG. 3, steps 1031 to 1032 in FIG. 4, and steps 201 to 209 in FIG. 5, and implementing the functions of modules 51-53, units 521-523, and units 531-532 in FIG. 6 and of modules 51-58, units 521-523, and units 531-532 in FIG. 7.
An embodiment of the present application provides a computer program product, including a computer program stored on a non-volatile computer-readable storage medium. The computer program includes program instructions that, when executed by a computer, cause the computer to perform the image processing of any of the foregoing method embodiments, for example, performing steps 101 to 103 of the method in FIG. 1, steps 1021 to 1023 in FIG. 2, steps 1021a to 1023a in FIG. 3, steps 1031 to 1032 in FIG. 4, and steps 201 to 209 in FIG. 5, and implementing the functions of modules 51-53, units 521-523, and units 531-532 in FIG. 6 and of modules 51-58, units 521-523, and units 531-532 in FIG. 7.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
From the description of the above embodiments, a person of ordinary skill in the art can clearly understand that each embodiment can be implemented by software plus a general-purpose hardware platform, or alternatively by hardware. A person of ordinary skill in the art can also understand that all or part of the flows of the above method embodiments can be completed by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the flows of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
The above are only embodiments of the present application and do not thereby limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present application, or any direct or indirect application in other related technical fields, is likewise included in the scope of patent protection of the present application.
Industrial Applicability
As described above, the image processing method, apparatus, storage medium, and electronic apparatus provided by the embodiments of the present invention have the following beneficial effects: the user does not need to operate the mobile terminal manually to process an image; the terminal performs image processing simply by receiving the user's voice information. Compared with the prior art, this process is simpler, saves the user's time, and improves operating efficiency.

Claims (14)

  1. An image processing method, applied to a terminal device, comprising:
    receiving voice information;
    recognizing the voice information to obtain an image processing command;
    performing image processing on a target image according to the image processing command to obtain the processed target image.
  2. The method according to claim 1, wherein the step of recognizing the voice information to obtain an image processing command comprises:
    converting the voice information into text information;
    extracting a processing-object keyword and a processing-mode keyword from the text information;
    composing the processing-object keyword and the processing-mode keyword into the image processing command.
  3. The method according to claim 1, wherein the step of recognizing the voice information to obtain an image processing command comprises:
    extracting, according to the voice information and a voice library preset with keyword voices, the words in the voice information whose pronunciation matches a pronunciation in the voice library, wherein the voice library contains preset processing-object keyword voices and preset processing-mode keyword voices;
    obtaining a processing-object keyword and a processing-mode keyword from the extracted words with matching pronunciation;
    composing the processing-object keyword and the processing-mode keyword into the image processing command.
  4. The method according to claim 2 or 3, wherein the step of performing image processing on the target image according to the image processing command comprises:
    identifying a processing object in the target image according to the processing-object keyword;
    processing the processing object according to the processing-mode keyword.
  5. The method according to claim 1, wherein,
    after the step of receiving voice information, the method further comprises:
    determining whether the voice information contains only one voice;
    if the voice information contains only one voice, extracting the first N speech words of the voice information;
    determining whether the speech words contain a preset command word;
    if so, proceeding to the step of recognizing the voice information to obtain the image processing command.
  6. The method according to claim 5, wherein
    the method further comprises:
    if the voice information contains multiple voices, extracting the first N speech words of each voice;
    obtaining the voice whose speech words contain a preset command word;
    wherein recognizing the voice information to obtain the image processing command is specifically:
    recognizing the obtained voice to obtain the image processing command.
  7. An image processing apparatus, applied to a terminal device, comprising:
    a voice receiving module, configured to receive voice information;
    a command acquisition module, configured to recognize the voice information to obtain an image processing command;
    an image processing module, configured to perform image processing on a target image according to the image processing command to obtain the processed target image.
  8. The apparatus according to claim 7, wherein
    the command acquisition module comprises:
    a text acquisition unit, configured to convert the voice information into text information;
    a text extraction unit, configured to extract a processing-object keyword and a processing-mode keyword from the text information;
    a command forming unit, configured to compose the processing-object keyword and the processing-mode keyword into the image processing command.
  9. The apparatus according to claim 7, wherein
    the command acquisition module comprises:
    a word acquisition unit, configured to extract, according to the voice information and a voice library preset with keyword voices, the words in the voice information whose pronunciation matches a pronunciation in the voice library, wherein the voice library contains preset processing-object keyword voices and preset processing-mode keyword voices;
    a word extraction unit, configured to obtain a processing-object keyword and a processing-mode keyword from the extracted words with matching pronunciation;
    a command generation unit, configured to compose the processing-object keyword and the processing-mode keyword into the image processing command.
  10. The apparatus according to claim 8 or 9, wherein
    the image processing module comprises:
    an object recognition unit, configured to identify a processing object in the target image according to the processing-object keyword;
    an execution processing unit, configured to process the processing object according to the processing-mode keyword.
  11. The apparatus according to claim 7, wherein the apparatus further comprises:
    a sound determination module, configured to determine whether the voice information contains only one voice;
    a first extraction module, configured to extract the first N speech words of the voice information if the voice information contains only one voice;
    a speech-word determination module, configured to determine whether the speech words contain a preset command word and, if so, to proceed to the step of recognizing the voice information to obtain the image processing command.
  12. The apparatus according to claim 11, wherein
    the apparatus further comprises:
    a second extraction module, configured to extract the first N speech words of each voice if the voice information contains multiple voices;
    a sound screening module, configured to obtain the voice whose speech words contain a preset command word;
    wherein recognizing the voice information to obtain the image processing command is specifically:
    recognizing the voice obtained by the sound screening module to obtain the image processing command.
  13. A storage medium, wherein the storage medium stores a computer program, and the computer program is configured to perform, when run, the method according to any one of claims 1 to 6.
  14. An electronic apparatus, comprising a memory and a processor, wherein the memory stores a computer program, and the processor is configured to perform, by means of the computer program, the method according to any one of claims 1 to 6.
PCT/CN2018/100212 2017-10-19 2018-08-13 Image processing method, device, storage medium and electronic device WO2019076120A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710980039.8A CN107886947A (en) 2017-10-19 2017-10-19 The method and device of a kind of image procossing
CN201710980039.8 2017-10-19

Publications (1)

Publication Number Publication Date
WO2019076120A1 true WO2019076120A1 (en) 2019-04-25

Family

ID=61781978

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/100212 WO2019076120A1 (en) 2017-10-19 2018-08-13 Image processing method, device, storage medium and electronic device

Country Status (2)

Country Link
CN (1) CN107886947A (en)
WO (1) WO2019076120A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110784523A (en) * 2019-10-11 2020-02-11 北京地平线机器人技术研发有限公司 Target object information pushing method and device

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107886947A (en) * 2017-10-19 2018-04-06 珠海格力电器股份有限公司 The method and device of a kind of image procossing
CN111383638A (en) * 2018-12-28 2020-07-07 上海寒武纪信息科技有限公司 Signal processing device, signal processing method and related product
CN109977254A (en) * 2019-04-03 2019-07-05 百度在线网络技术(北京)有限公司 For obtaining the method and device of image
CN112801083B (en) * 2021-01-29 2023-08-08 百度在线网络技术(北京)有限公司 Image recognition method, device, equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100238323A1 (en) * 2009-03-23 2010-09-23 Sony Ericsson Mobile Communications Ab Voice-controlled image editing
US20110071829A1 (en) * 2009-09-18 2011-03-24 Konica Minolta Business Technologies, Inc. Image processing apparatus, speech recognition processing apparatus, control method for speech recognition processing apparatus, and computer-readable storage medium for computer program
CN102930867A (en) * 2011-08-08 2013-02-13 三星电子株式会社 Voice recognition apparatus, voice recognition server, voice recognition system and voice recognition method
CN104584527A (en) * 2012-08-05 2015-04-29 诚研科技股份有限公司 Image capture device and method for image processing by voice recognition
CN105912717A (en) * 2016-04-29 2016-08-31 广东小天才科技有限公司 Image-based information search method and apparatus
CN106156310A (en) * 2016-06-30 2016-11-23 努比亚技术有限公司 A kind of picture processing apparatus and method
CN107886947A (en) * 2017-10-19 2018-04-06 珠海格力电器股份有限公司 The method and device of a kind of image procossing

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6295391B1 (en) * 1998-02-19 2001-09-25 Hewlett-Packard Company Automatic data routing via voice command annotation
TW200733059A (en) * 2006-02-17 2007-09-01 Inventec Appliances Corp Method of using voice recognition measure to input characters and its hand-held apparatus
CN102945671A (en) * 2012-10-31 2013-02-27 四川长虹电器股份有限公司 Voice recognition method
CN103714815A (en) * 2013-12-09 2014-04-09 何永 Voice control method and device thereof
KR101713770B1 (en) * 2015-09-18 2017-03-08 주식회사 베이리스 Voice recognition system and voice recognition method therefor
CN105446146B (en) * 2015-11-19 2019-05-28 深圳创想未来机器人有限公司 Intelligent terminal control method, system and intelligent terminal based on semantic analysis
CN106250747B (en) * 2016-08-01 2021-01-15 联想(北京)有限公司 Information processing method and electronic equipment
CN106157950A (en) * 2016-09-29 2016-11-23 合肥华凌股份有限公司 Speech control system and awakening method, Rouser and household electrical appliances, coprocessor
CN106782563B (en) * 2016-12-28 2020-06-02 上海百芝龙网络科技有限公司 Smart home voice interaction system

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110784523A (en) * 2019-10-11 2020-02-11 北京地平线机器人技术研发有限公司 Target object information pushing method and device
CN110784523B (en) * 2019-10-11 2022-08-02 北京地平线机器人技术研发有限公司 Target object information pushing method and device

Also Published As

Publication number Publication date
CN107886947A (en) 2018-04-06

Similar Documents

Publication Publication Date Title
WO2019076120A1 (en) Image processing method, device, storage medium and electronic device
JP6647351B2 (en) Method and apparatus for generating candidate response information
US9424836B2 (en) Privacy-sensitive speech model creation via aggregation of multiple user models
US10270736B2 (en) Account adding method, terminal, server, and computer storage medium
TWI711967B (en) Method, device and equipment for determining broadcast voice
CN103165131A (en) Voice processing system and voice processing method
JP6783339B2 (en) Methods and devices for processing audio
WO2015109971A1 (en) Voice processing method and processing system for smart television, and smart television
WO2021237923A1 (en) Smart dubbing method and apparatus, computer device, and storage medium
WO2020135756A1 (en) Video segment extraction method, apparatus and device, and computer-readable storage medium
WO2023116122A1 (en) Subtitle generation method, electronic device, and computer-readable storage medium
WO2019101099A1 (en) Video program identification method and device, terminal, system, and storage medium
US20160105620A1 (en) Methods, apparatus, and terminal devices of image processing
KR102194194B1 (en) Method, apparatus for blind signal seperating and electronic device
JP6208631B2 (en) Voice document search device, voice document search method and program
CN104932665A (en) Information processing method and electronic device
CN115691503A (en) Voice recognition method and device, electronic equipment and storage medium
JP7113000B2 (en) Method and apparatus for generating images
CN114218428A (en) Audio data clustering method, device, equipment and storage medium
CN113741864A (en) Automatic design method and system of semantic service interface based on natural language processing
CN112578965A (en) Processing method and device and electronic equipment
CN111147905A (en) Media resource searching method, television, storage medium and device
CN114501112B (en) Method, apparatus, device, medium, and article for generating video notes
CN115440198B (en) Method, apparatus, computer device and storage medium for converting mixed audio signal
US10860627B2 (en) Server and method for classifying entities of a query

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18867977

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18867977

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 30.09.2020)