CN113873165A - Photographing method and device and electronic equipment - Google Patents

Photographing method and device and electronic equipment

Info

Publication number
CN113873165A
CN113873165A CN202111240072.XA
Authority
CN
China
Prior art keywords
target
shooting
audio data
preview interface
text information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111240072.XA
Other languages
Chinese (zh)
Inventor
吴兆君
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Vivo Mobile Communication Co Ltd
Original Assignee
Vivo Mobile Communication Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Vivo Mobile Communication Co Ltd filed Critical Vivo Mobile Communication Co Ltd
Priority to CN202111240072.XA
Publication of CN113873165A
Pending legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/63 Control of cameras or camera modules by using electronic viewfinders
    • H04N23/633 Control of cameras or camera modules by using electronic viewfinders for displaying additional information relating to control or operation of the camera
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • User Interface Of Digital Computer (AREA)
  • Studio Devices (AREA)

Abstract

The application discloses a photographing method, a photographing device and electronic equipment, wherein the method comprises the following steps: receiving a first input; acquiring target audio data of a photographic subject in response to a first input; and displaying target text information corresponding to the target audio data at a target position of a shooting preview interface.

Description

Photographing method and device and electronic equipment
Technical Field
The application belongs to the technical field of communication, and particularly relates to a photographing method and device and electronic equipment.
Background
As mobile terminals are used for taking pictures more and more frequently, users have become accustomed to expressing their mood or recording their lives through pictures. Personalized text editing allows the mood the user wants to express to be displayed, making such mood display or life recording more intuitive.
However, in existing shooting modes, only fixed text can be displayed in the captured image, or the text has to be edited manually by the user, which is not convenient.
Disclosure of Invention
Embodiments of the present application aim to provide a photographing method, a photographing apparatus and an electronic device, which can solve the problem that the text displayed in a captured image is fixed or has to be edited manually by the user.
In a first aspect, an embodiment of the present application provides a photographing method, including:
receiving a first input;
acquiring target audio data of a photographic subject in response to a first input;
and displaying target text information corresponding to the target audio data at a target position of a shooting preview interface.
In a second aspect, an embodiment of the present application provides a photographing apparatus, including:
the first receiving module is used for receiving a first input;
the first response module is used for responding to the first input and acquiring target audio data of a shooting object;
and the first display module is used for displaying the target text information corresponding to the target audio data at the target position of the shooting preview interface.
In a third aspect, an embodiment of the present application provides an electronic device, which includes a processor, a memory, and a program or instructions stored on the memory and executable on the processor, and when executed by the processor, the program or instructions implement the steps of the method according to the first aspect.
In a fourth aspect, embodiments of the present application provide a readable storage medium, on which a program or instructions are stored, which when executed by a processor implement the steps of the method according to the first aspect.
In a fifth aspect, an embodiment of the present application provides a chip, where the chip includes a processor and a communication interface, where the communication interface is coupled to the processor, and the processor is configured to execute a program or instructions to implement the method according to the first aspect.
In an embodiment of the present application, a first input is received; target audio data of a photographic subject is acquired in response to the first input; and target text information corresponding to the target audio data is displayed at a target position of a shooting preview interface. In this way, the displayed text information can be updated in real time according to the words spoken by the photographic subject rather than being fixed, and the user does not need to edit the corresponding text manually, so the operation is more convenient.
Drawings
Fig. 1 is a schematic flowchart of a photographing method according to an embodiment of the present application;
Fig. 2 is a first schematic diagram of a shooting preview interface in an embodiment of the present application;
Fig. 3 is a second schematic diagram of a shooting preview interface in an embodiment of the present application;
Fig. 4 is a third schematic diagram of a shooting preview interface in an embodiment of the present application;
Fig. 5 is a block diagram of a photographing apparatus according to an embodiment of the present application;
Fig. 6 is a first block diagram of an electronic device according to an embodiment of the present application;
Fig. 7 is a second block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments that can be derived by one of ordinary skill in the art from the embodiments given herein are intended to be within the scope of the present disclosure.
The terms "first", "second" and the like in the description and claims of the present application are used to distinguish between similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that the data so used may be interchanged where appropriate, so that the embodiments of the application can be practiced in orders other than those illustrated or described herein. Objects distinguished by "first", "second" and the like are usually of one kind, and the number of objects is not limited; for example, the first object may be one object or a plurality of objects. In addition, "and/or" in the description and claims denotes at least one of the connected objects, and the character "/" generally indicates that the objects before and after it are in an "or" relationship.
The photographing method provided by the embodiment of the present application is described in detail below with reference to the accompanying drawings through specific embodiments and application scenarios thereof.
As shown in fig. 1, an embodiment of the present application provides a photographing method, including:
step 101: a first input is received.
In this embodiment of the application, the first input may be an input to the shooting preview interface. As shown in fig. 2, a "bubble" control is displayed in the shooting preview interface; the first input may specifically be an input to the bubble control, for example a click input.
For example, the user turns on the camera, enters a photographing mode, and selects a bubble mode for photographing, that is, a photographing mode in which bubbles are added to the captured image and text is added to the bubbles.
Step 102: in response to the first input, target audio data of a photographic subject is acquired.
In this step, the speech spoken by the user can be analyzed through a speech recognition algorithm to obtain target audio data of the photographic subject.
For example, if a user wants to take a selfie, the camera is turned on, the bubble function is enabled, and the user speaks a sentence, such as "the weather is really good today"; the electronic device then recognizes the speech spoken by the user through a speech recognition algorithm.
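Step 102 amounts to ordinary speech-to-text conversion. Below is a minimal sketch of how an electronic device might capture and transcribe the subject's speech; the speech_recognition package and its Google recognizer are illustrative assumptions, since the patent does not name any particular library or engine.

```python
# Hypothetical sketch of step 102: capture the subject's speech and transcribe it.
import speech_recognition as sr

def acquire_target_text(timeout_s: float = 5.0) -> str | None:
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:                 # record the target audio data
        recognizer.adjust_for_ambient_noise(source)
        audio = recognizer.listen(source, timeout=timeout_s)
    try:
        # Convert the target audio data into target text information.
        return recognizer.recognize_google(audio, language="zh-CN")
    except sr.UnknownValueError:                    # nothing intelligible was spoken
        return None
```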
Step 103: and displaying target text information corresponding to the target audio data at a target position of a shooting preview interface.
Here, the target audio data is converted into text information, and the text information is displayed at a target position.
Alternatively, the target position may be a position within a certain distance from the head of the subject.
Optionally, as shown in fig. 3, target text information corresponding to the target audio data is displayed in a bubble form at a target position of the shooting preview interface.
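The "position within a certain distance from the head of the subject" can be derived from a detected face bounding box. The following is a minimal sketch under the assumption that the face box is already available from whatever face detector the device uses; the Rect structure and the margin value are illustrative assumptions, not something the patent specifies.

```python
from dataclasses import dataclass

@dataclass
class Rect:
    x: int
    y: int
    w: int
    h: int   # face bounding box in preview-frame pixel coordinates

def bubble_anchor(face: Rect, frame_w: int, frame_h: int,
                  margin: int = 20) -> tuple[int, int]:
    """Return the target position: a point a fixed distance above the subject's
    head, clamped so the bubble stays inside the shooting preview interface."""
    cx = face.x + face.w // 2          # horizontal centre of the face
    cy = face.y - margin               # a little above the top of the head
    cx = max(0, min(cx, frame_w - 1))
    cy = max(0, min(cy, frame_h - 1))
    return cx, cy
```

For a 1080x1920 frame with a face box at (400, 600, 200, 250), for instance, this sketch would place the bubble anchor at (500, 580).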
According to the photographing method of the embodiment of the application, a first input is received; target audio data of a photographic subject is acquired in response to the first input; and target text information corresponding to the target audio data is displayed at a target position of the shooting preview interface. In this way, the displayed text information can be updated in real time according to the words spoken by the photographic subject rather than being fixed, and the user does not need to edit the corresponding text manually, so the operation is more convenient.
Optionally, after target text information corresponding to the target audio data is displayed at a target position of the shooting preview interface, the method includes:
receiving a shooting input;
and generating a shooting image according to the shooting input, wherein the shooting image comprises an image corresponding to a shooting object and the target text information.
In the embodiment of the present application, the shooting input may be a voice input, a sliding input or a click input by the user on the screen of the electronic device, or a contactless (air-gesture) operation on the electronic device.
For example, the user clicks a shooting button or triggers the shot by voice, or the picture is taken when a gesture of the user (such as a scissor-hand, an open palm or a fist) is detected, or when the user's overall pose is detected to be ready. Upon receiving the shooting input, the bubble is combined with the image of the photographic subject to form the captured image.
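When the shooting input is received, the bubble only needs to be drawn into the full-resolution capture at the same target position. The following is a minimal compositing sketch using Pillow; the library choice, the rectangular bubble shape and the padding are illustrative assumptions rather than anything prescribed by the patent.

```python
from PIL import Image, ImageDraw

def composite_bubble(photo: Image.Image, text: str,
                     anchor: tuple[int, int]) -> Image.Image:
    """Draw the target text information as a simple speech bubble onto the
    captured image, so the bubble is saved as part of the final photograph."""
    out = photo.copy()
    draw = ImageDraw.Draw(out)
    left, top, right, bottom = draw.textbbox(anchor, text)       # text extent
    pad = 12
    draw.rectangle((left - pad, top - pad, right + pad, bottom + pad),
                   fill="white", outline="black", width=2)       # bubble body
    draw.text(anchor, text, fill="black")                        # bubble text
    return out

# Example (hypothetical file names):
# composite_bubble(Image.open("shot.jpg"), "the weather is really good today",
#                  (500, 580)).save("shot_with_bubble.jpg")
```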
Optionally, before displaying the target text information corresponding to the target audio data at the target position of the shooting preview interface, the method further includes:
obtaining sound source position information corresponding to the target audio data;
and determining the target position according to the sound source position information and the face recognition information of the shooting object.
In the embodiment of the present application, the sound source position information specifically refers to the position information of the photographic subject who is currently speaking. Here, the correspondence between the text information and the photographic subject is determined according to the sound source position information and the face recognition information of the photographic subject. When the position of the photographic subject changes, the text information moves accordingly; if the photographic subject leaves the preview picture, the text information can be hidden, and after the photographic subject re-enters the preview picture, the text information can be redisplayed near the corresponding photographic subject through face recognition.
Optionally, determining the target position according to the sound source position information and the face recognition information of the shooting object includes:
carrying out face recognition on a shooting object in a target shooting area corresponding to the sound source position information to obtain target face recognition information;
in a shooting preview interface, determining a target shooting object corresponding to the target face recognition information; and determining the position, which is located in the preset area range of the target shooting object, in the shooting preview interface as the target position.
For example, as shown in fig. 4, when the user photographs photographic subject 1 and photographic subject 2, and photographic subject 1 says "the weather is really good today", the position of photographic subject 1 is identified through audio tracking, the face recognition information of that subject is obtained based on the position information, and the text corresponding to the audio data of photographic subject 1 is displayed at the position corresponding to photographic subject 1. Likewise, the "yes" spoken by photographic subject 2 is displayed at the position corresponding to photographic subject 2. When the positions of photographic subject 1 and photographic subject 2 change, for example the two are interchanged, their corresponding text information is interchanged accordingly.
Here, speech recognition and audio tracking are combined: before the photograph is taken, the information the user wants to express is recognized through speech recognition, while the speaking user is located through audio tracking, and the spoken information is displayed in the form of a bubble near that user. When the photograph is generated, the bubble information is also incorporated into the photograph. Therefore, in a photographing scene with multiple users, the information each user wants to express can be displayed in the form of bubbles, which improves the creativity and convenience of photographing.
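One way to realise the "target shooting area corresponding to the sound source position information" is to map the microphone-array direction of arrival to an image-plane direction and attach the bubble to the detected face closest to it. A minimal matching sketch follows, under the assumption that the sound-source azimuth and a per-face azimuth (derived from each face's horizontal position and the camera's field of view) are already available; the Face structure and the tolerance are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Face:
    face_id: str        # identity returned by face recognition
    azimuth_deg: float  # horizontal angle of this face relative to the camera axis

def match_speaker(faces: list[Face], source_azimuth_deg: float,
                  max_error_deg: float = 15.0) -> Face | None:
    """Return the face whose direction best matches the sound source, i.e. the
    target photographic subject the bubble should be attached to."""
    if not faces:
        return None
    best = min(faces, key=lambda f: abs(f.azimuth_deg - source_azimuth_deg))
    if abs(best.azimuth_deg - source_azimuth_deg) > max_error_deg:
        return None     # the speaker is probably outside the preview frame
    return best
```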
Optionally, acquiring target audio data of the photographic subject includes:
and if the audio data re-identification input is received, re-acquiring the audio data of the shooting object, and determining the target audio data of the shooting object according to the re-acquired audio data of the shooting object.
In an embodiment of the application, if the user wants to regenerate the text information and re-inputs the audio data, the electronic device regenerates the text information according to the audio data re-input by the user and replaces the previous text information. Specifically, after the electronic device identifies the target audio data of the user, a prompt interface may be displayed, where the prompt interface is used to prompt the user whether to re-input the audio data. In this way, the user can conveniently update the text information in the shot image.
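A minimal sketch of this re-input flow is given below, reusing the hypothetical acquire_target_text helper from the earlier sketch; the prompt wording and the console-style interaction merely stand in for the prompt interface described above.

```python
def maybe_reacquire(current_text: str) -> str:
    """Prompt the user; if they choose to re-record, replace the displayed text
    with a transcription of the newly captured audio data."""
    answer = input("Re-record the audio for this bubble? [y/N] ").strip().lower()
    if answer != "y":
        return current_text
    new_text = acquire_target_text()     # re-acquire the subject's audio data
    return new_text if new_text else current_text
```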
Optionally, in the embodiment of the application, the target text information can also be updated in real time according to audio data newly input by the user. For example, in fig. 4, the text information corresponding to photographic subject 2 is "yes"; when the audio data "where are you going to play today" of photographic subject 2 is then recognized, "yes" is replaced with "where are you going to play today".
According to the embodiment of the application, speech recognition and audio tracking are combined: before the photograph is taken, the information the user wants to express is recognized through speech recognition, while the speaking user is located through audio tracking, and the spoken information is displayed in the form of a bubble near that user. When the photograph is generated, the bubble information is also incorporated into the photograph. Therefore, in a photographing scene with multiple users, the information each user wants to express can be displayed in the form of bubbles, which improves the convenience of photographing.
Optionally, after the target text information corresponding to the target audio data is displayed at the target position of the shooting preview interface, the method further includes:
hiding the target text information in a case where it is detected that the target shooting object has disappeared from the shooting preview interface;
and in the case that the target shooting object is detected to reappear in the shooting preview interface, redisplaying the target text information at the target position.
Here, when the position of the target shooting object changes, the position of the target text information changes accordingly. If the target shooting object leaves the shooting preview picture, the target text information is hidden; after the target shooting object re-enters the shooting preview picture, the target text information is redisplayed near the corresponding target shooting object through face recognition.
Optionally, after determining a position in the shooting preview interface within the preset area range of the target shooting object as the target position, the method further includes:
and updating the target position according to the change of the position of the target shooting object in the shooting preview interface.
That is, when the position of the target shooting object changes, the position of the corresponding target text information changes as well, so that the text information corresponding to the words spoken by the target shooting object is always displayed at the position corresponding to that subject.
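An update of this kind would typically run once per preview frame: look up the target subject among the faces currently detected, hide the bubble if the subject is absent, and otherwise recompute the anchor. Below is a minimal per-frame sketch reusing the hypothetical Rect and bubble_anchor helpers above; the Bubble structure and the detection input are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Bubble:
    face_id: str                            # which recognized subject the text belongs to
    text: str
    anchor: tuple[int, int] | None = None   # None while the subject is out of frame

def update_bubble(bubble: Bubble, detected: dict[str, Rect],
                  frame_w: int, frame_h: int) -> None:
    """Hide the bubble when its subject leaves the preview, and redisplay it
    at the tracked position when the subject is visible."""
    face = detected.get(bubble.face_id)     # faces recognized in the current frame
    if face is None:
        bubble.anchor = None                # subject left the preview: hide the text
    else:
        bubble.anchor = bubble_anchor(face, frame_w, frame_h)   # redisplay / follow
```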
It should be noted that, in the photographing method provided in the embodiment of the present application, the execution body may be a photographing apparatus, or a control module in the photographing apparatus for executing the photographing method. In the embodiment of the present application, a photographing apparatus executing the photographing method is taken as an example to describe the photographing apparatus provided in the embodiment of the present application.
As shown in fig. 5, an embodiment of the present application further provides a photographing apparatus 500, including:
a first receiving module 501, configured to receive a first input;
a first response module 502 for acquiring target audio data of a photographic subject in response to a first input;
and a first display module 503, configured to display target text information corresponding to the target audio data at a target position of the shooting preview interface.
Optionally, the apparatus according to the embodiment of the present application further includes:
the second receiving module is used for receiving shooting input after the first display module displays target text information corresponding to the target audio data at the target position of the shooting preview interface;
and the generating module is used for generating a shooting image according to the shooting input, wherein the shooting image comprises an image corresponding to a shooting object and the target text information.
Optionally, the apparatus according to the embodiment of the present application further includes:
the first acquisition module is used for acquiring sound source position information corresponding to the target audio data before the first display module displays the target text information corresponding to the target audio data at the target position of the shooting preview interface;
and the first determining module is used for determining the target position according to the sound source position information and the face recognition information of the shooting object.
Optionally, the first determining module includes:
the first determining submodule is used for carrying out face recognition on a shooting object in a target shooting area corresponding to the sound source position information to obtain target face recognition information;
the second determining submodule is used for determining a target shooting object corresponding to the target face recognition information in a shooting preview interface; and the third determining submodule is used for determining the position, which is located in the preset area range of the target shooting object, in the shooting preview interface as the target position.
Optionally, the first response module is configured to, if an audio data re-recognition input is received, re-acquire audio data of the photographic subject, and determine target audio data of the photographic subject according to the re-acquired audio data of the photographic subject.
Optionally, the apparatus according to the embodiment of the present application further includes:
the first processing module is used for hiding the target text information when it is detected that the target shooting object has disappeared from the shooting preview interface, after the first display module displays the target text information corresponding to the target audio data at the target position of the shooting preview interface;
and the second processing module is used for redisplaying the target text information at the target position under the condition that the target shooting object is detected to reappear in the shooting preview interface.
Optionally, the apparatus according to the embodiment of the present application further includes:
and the first updating module is used for updating the target position according to the change of the position of the target shooting object in the shooting preview interface, after the first display module determines, as the target position, the position in the shooting preview interface within the preset area range of the target shooting object.
According to the device of the embodiment of the application, a first input is received; target audio data of a photographic subject is acquired in response to the first input; and target text information corresponding to the target audio data is displayed at a target position of the shooting preview interface. In this way, the displayed text information can be updated in real time according to the words spoken by the photographic subject rather than being fixed, and the user does not need to edit the corresponding text manually, so the operation is more convenient.
The photographing device in the embodiment of the present application may be a device, or may be a component, an integrated circuit, or a chip in a terminal. The device can be mobile electronic equipment or non-mobile electronic equipment. By way of example, the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a palm top computer, a vehicle-mounted electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook or a Personal Digital Assistant (PDA), and the like, and the non-mobile electronic device may be a server, a Network Attached Storage (NAS), a Personal Computer (PC), a Television (TV), a teller machine or a self-service machine, and the like, and the embodiments of the present application are not particularly limited.
The photographing device in the embodiment of the present application may be a device having an operating system. The operating system may be an Android operating system (Android), an iOS operating system, or other possible operating systems, which is not specifically limited in the embodiments of the present application.
The photographing device provided in the embodiment of the present application can implement each process implemented by the method embodiments in fig. 1 to 4, and, to avoid repetition, the details are not repeated here.
Optionally, as shown in fig. 6, an electronic device 600 is further provided in this embodiment of the present application, and includes a processor 601, a memory 602, and a program or an instruction stored in the memory 602 and executable on the processor 601, where the program or the instruction is executed by the processor 601 to implement each process of the foregoing photographing method embodiment, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here.
It should be noted that the electronic devices in the embodiments of the present application include the mobile electronic devices and the non-mobile electronic devices described above.
Fig. 7 is a schematic diagram of a hardware structure of an electronic device implementing an embodiment of the present application.
The electronic device 700 includes, but is not limited to: a radio frequency unit 701, a network module 702, an audio output unit 703, an input unit 704, a sensor 705, a display unit 706, a user input unit 707, an interface unit 708, a memory 709, and a processor 710.
Those skilled in the art will appreciate that the electronic device 700 may also include a power supply (e.g., a battery) for powering the various components, and the power supply may be logically coupled to the processor 710 via a power management system, such that the functions of managing charging, discharging, and power consumption may be performed via the power management system. The electronic device structure shown in fig. 7 does not constitute a limitation of the electronic device, and the electronic device may include more or less components than those shown, or combine some components, or arrange different components, and thus, the description is omitted here.
The user input unit 707 is configured to receive a first input; the processor 710 is configured to acquire target audio data of a photographic subject in response to the first input; and the processor 710 is configured to display, through the display unit 706, target text information corresponding to the target audio data at a target position of the shooting preview interface.
Optionally, the user input unit 707 is further configured to receive a shooting input; and the processor 710 is configured to generate a captured image according to the shooting input, where the captured image includes an image corresponding to a shooting object and the target text information.
Optionally, the processor 710 is further configured to obtain sound source position information corresponding to the target audio data; and determining the target position according to the sound source position information and the face recognition information of the shooting object.
Optionally, the processor 710 is further configured to perform face recognition on a shooting object in a target shooting area corresponding to the sound source position information to obtain target face recognition information; determine, in the shooting preview interface, a target shooting object corresponding to the target face recognition information; and determine, as the target position, a position in the shooting preview interface within the preset area range of the target shooting object.
Optionally, the processor 710 is further configured to, if an audio data re-recognition input is received, re-acquire the audio data of the photographic subject, and determine target audio data of the photographic subject according to the re-acquired audio data of the photographic subject.
Optionally, after the target text information corresponding to the target audio data is displayed at the target position of the shooting preview interface through the display unit 706, the processor 710 is further configured to hide the target text information when it is detected that the target shooting object has disappeared from the shooting preview interface;
in a case where it is detected that the target photographic subject reappears in the photographic preview interface, the target text information is redisplayed at the target position through the display unit 706.
Optionally, after determining, as the target position, a position in the shooting preview interface within the preset area of the target shooting object, the processor 710 is further configured to: and updating the target position according to the change of the position of the target shooting object in the shooting preview interface.
The electronic device receives a first input; acquires target audio data of a photographic subject in response to the first input; and displays target text information corresponding to the target audio data at a target position of the shooting preview interface. In this way, the displayed text information can be updated in real time according to the words spoken by the photographic subject rather than being fixed, and the user does not need to edit the corresponding text manually, so the operation is more convenient.
It should be understood that in the embodiment of the present application, the input Unit 704 may include a Graphics Processing Unit (GPU) 7041 and a microphone 7042, and the Graphics Processing Unit 7041 processes image data of still pictures or videos obtained by an image capturing device (e.g., a camera) in a video capturing mode or an image capturing mode. The display unit 706 may include a display panel 7061, and the display panel 7061 may be configured in the form of a liquid crystal display, an organic light emitting diode, or the like. The user input unit 707 includes a touch panel 7071 and other input devices 7072. The touch panel 7071 is also referred to as a touch screen. The touch panel 7071 may include two parts of a touch detection device and a touch controller. Other input devices 7072 may include, but are not limited to, a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, and a joystick, which are not described in detail herein. Memory 709 may be used to store software programs as well as various data, including but not limited to applications and operating systems. Processor 710 may integrate an application processor, which primarily handles operating systems, user interfaces, applications, etc., and a modem processor, which primarily handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into processor 710.
The embodiment of the present application further provides a readable storage medium, where a program or an instruction is stored on the readable storage medium, and when the program or the instruction is executed by a processor, the program or the instruction implements each process of the foregoing photographing method embodiment, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here.
The processor is the processor in the electronic device described in the above embodiment. The readable storage medium includes a computer readable storage medium, such as a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and so on.
The embodiment of the present application further provides a chip, where the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to run a program or an instruction to implement each process of the foregoing photographing method embodiment, and can achieve the same technical effect, and the details are not repeated here to avoid repetition.
It should be understood that the chips mentioned in the embodiments of the present application may also be referred to as system-on-chip, system-on-chip or system-on-chip, etc.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element. Further, it should be noted that the scope of the methods and apparatus of the embodiments of the present application is not limited to performing the functions in the order illustrated or discussed, but may include performing the functions in a substantially simultaneous manner or in a reverse order based on the functions involved, e.g., the methods described may be performed in an order different than that described, and various steps may be added, omitted, or combined. In addition, features described with reference to certain examples may be combined in other examples.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a computer software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present application.
While the present embodiments have been described with reference to the accompanying drawings, it is to be understood that the invention is not limited to the precise embodiments described above, which are meant to be illustrative and not restrictive, and that various changes may be made therein by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (12)

1. A method of taking a picture, comprising:
receiving a first input;
acquiring target audio data of a photographic subject in response to a first input;
and displaying target text information corresponding to the target audio data at a target position of a shooting preview interface.
2. The method of claim 1, wherein after displaying the target text information corresponding to the target audio data at the target position of the shooting preview interface, the method comprises:
receiving a shooting input;
and generating a shooting image according to the shooting input, wherein the shooting image comprises an image corresponding to a shooting object and the target text information.
3. The method of claim 1, further comprising, before displaying target text information corresponding to the target audio data at a target position of the shooting preview interface:
obtaining sound source position information corresponding to the target audio data;
and determining the target position according to the sound source position information and the face recognition information of the shooting object.
4. The method of claim 3, wherein determining the target position according to the sound source position information and the face recognition information of the photographic subject comprises:
carrying out face recognition on a shooting object in a target shooting area corresponding to the sound source position information to obtain target face recognition information;
in a shooting preview interface, determining a target shooting object corresponding to the target face recognition information;
and determining the position, which is located in the preset area range of the target shooting object, in the shooting preview interface as the target position.
5. The method according to claim 1, wherein acquiring target audio data of a photographic subject comprises:
and if the audio data re-identification input is received, re-acquiring the audio data of the shooting object, and determining the target audio data of the shooting object according to the re-acquired audio data of the shooting object.
6. The method of claim 4, further comprising, after displaying the target text information corresponding to the target audio data at the target position of the shooting preview interface:
hiding the target text information in a case where it is detected that the target shooting object has disappeared from the shooting preview interface;
and in the case that the target shooting object is detected to reappear in the shooting preview interface, redisplaying the target text information at the target position.
7. The method of claim 4, wherein after determining a position in the shooting preview interface within a preset area of the target shooting object as the target position, the method further comprises:
and updating the target position according to the change of the position of the target shooting object in the shooting preview interface.
8. A photographing apparatus, comprising:
the first receiving module is used for receiving a first input;
the first response module is used for responding to the first input and acquiring target audio data of a shooting object;
and the first display module is used for displaying the target text information corresponding to the target audio data at the target position of the shooting preview interface.
9. The apparatus of claim 8, comprising:
the second receiving module is used for receiving shooting input after the first display module displays target text information corresponding to the target audio data at the target position of the shooting preview interface;
and the generating module is used for generating a shooting image according to the shooting input, wherein the shooting image comprises an image corresponding to a shooting object and the target text information.
10. The apparatus of claim 8, further comprising:
the first acquisition module is used for acquiring sound source position information corresponding to the target audio data before the first display module displays the target text information corresponding to the target audio data at the target position of the shooting preview interface;
and the first determining module is used for determining the target position according to the sound source position information and the face recognition information of the shooting object.
11. An electronic device comprising a processor, a memory, and a program or instructions stored on the memory and executable on the processor, the program or instructions when executed by the processor implementing the steps of the photographing method according to any one of claims 1 to 7.
12. A readable storage medium, on which a program or instructions are stored, which when executed by a processor implement the steps of the photographing method according to any one of claims 1 to 7.
CN202111240072.XA 2021-10-25 2021-10-25 Photographing method and device and electronic equipment Pending CN113873165A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111240072.XA CN113873165A (en) 2021-10-25 2021-10-25 Photographing method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111240072.XA CN113873165A (en) 2021-10-25 2021-10-25 Photographing method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN113873165A true CN113873165A (en) 2021-12-31

Family

ID=78997648

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111240072.XA Pending CN113873165A (en) 2021-10-25 2021-10-25 Photographing method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN113873165A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103327270A (en) * 2013-06-28 2013-09-25 腾讯科技(深圳)有限公司 Image processing method, device and terminal
CN105654532A (en) * 2015-12-24 2016-06-08 Tcl集团股份有限公司 Photo photographing and processing method and system
CN107864353A (en) * 2017-11-14 2018-03-30 维沃移动通信有限公司 A kind of video recording method and mobile terminal
CN108320318A (en) * 2018-01-15 2018-07-24 腾讯科技(深圳)有限公司 Image processing method, device, computer equipment and storage medium
CN112204942A (en) * 2018-04-28 2021-01-08 华为技术有限公司 Photographing method and terminal equipment
CN112309385A (en) * 2019-08-30 2021-02-02 北京字节跳动网络技术有限公司 Voice recognition method, device, electronic equipment and medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114629869A (en) * 2022-03-18 2022-06-14 维沃移动通信有限公司 Information generation method and device, electronic equipment and storage medium
CN114629869B (en) * 2022-03-18 2024-04-16 维沃移动通信有限公司 Information generation method, device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
US9031847B2 (en) Voice-controlled camera operations
KR102036054B1 (en) Method for recoding a video in the terminal having a dual camera and device thereof
US20150242118A1 (en) Method and device for inputting
KR102657519B1 (en) Electronic device for providing graphic data based on voice and operating method thereof
CN103529934A (en) Method and apparatus for processing multiple inputs
CN110572716B (en) Multimedia data playing method, device and storage medium
EP3933570A1 (en) Method and apparatus for controlling a voice assistant, and computer-readable storage medium
CN112954199B (en) Video recording method and device
CN112068762A (en) Interface display method, device, equipment and medium of application program
CN113727021B (en) Shooting method and device and electronic equipment
CN112492201B (en) Photographing method and device and electronic equipment
CN112948704B (en) Model training method and device for information recommendation, electronic equipment and medium
WO2022111458A1 (en) Image capture method and apparatus, electronic device, and storage medium
CN105095170A (en) Text deleting method and device
CN112711368B (en) Operation guidance method and device and electronic equipment
CN113936697B (en) Voice processing method and device for voice processing
CN113873165A (en) Photographing method and device and electronic equipment
CN108073291B (en) Input method and device and input device
CN114374663B (en) Message processing method and message processing device
CN117234405A (en) Information input method and device, electronic equipment and storage medium
CN113794833A (en) Shooting method and device and electronic equipment
CN113268961A (en) Travel note generation method and device
CN113035189A (en) Document demonstration control method, device and equipment
CN108241438B (en) Input method, input device and input device
CN114527919B (en) Information display method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination