CN108710697B - Method and apparatus for generating information - Google Patents

Method and apparatus for generating information

Info

Publication number
CN108710697B
Authority
CN
China
Prior art keywords
image
current
input information
historical
user
Prior art date
Legal status
Active
Application number
CN201810501246.5A
Other languages
Chinese (zh)
Other versions
CN108710697A (en)
Inventor
李财瑜
曲海龙
张现伟
颜滔
翟宇宏
孙雅杰
金良雨
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201810501246.5A
Publication of CN108710697A
Application granted
Publication of CN108710697B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 - Classification, e.g. identification
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 - Speaker identification or verification techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • General Physics & Mathematics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The embodiments of the present application disclose a method and apparatus for generating information. One embodiment of the method comprises: acquiring current input information and a current image corresponding to the current input information; determining whether the current image and a pre-acquired target history image indicate the same user; and, in response to the current image and the target history image indicating the same user, generating reply information for the current input information based on the current input information and history input information, wherein the history input information is pre-acquired information corresponding to the target history image. This embodiment enriches the ways in which information can be generated.

Description

Method and apparatus for generating information
Technical Field
The embodiment of the application relates to the technical field of computers, in particular to a method and a device for generating information.
Background
During a human-machine conversation, a user may ask the machine to answer certain questions. At present, machines with answering and guidance functions are attracting increasing attention because of their ability to answer questions. To a certain extent, the use of such machines keeps services and work running efficiently and in an orderly manner. For example, a robot deployed in an airport can answer questions raised by users in that setting, which is convenient for users and greatly reduces labor costs.
In general, during such a human-computer conversation, the user's current utterance may be an independent sentence or a sentence related to the previous conversation. Moreover, the sessions of different users are usually unrelated to each other.
Disclosure of Invention
The embodiment of the application provides a method and a device for generating information.
In a first aspect, an embodiment of the present application provides a method for generating information, where the method includes: acquiring current input information and a current image corresponding to the current input information; determining whether the current image and a pre-acquired target history image indicate the same user; and, in response to the current image and the target history image indicating the same user, generating reply information for the current input information based on the current input information and history input information, wherein the history input information is pre-acquired information corresponding to the target history image.
In some embodiments, the current image comprises a current facial image, and the target history image comprises a history facial image; and determining whether the current image and the pre-acquired target history image indicate the same user, including: it is determined whether the current facial image and the historical facial image indicate the same user.
In some embodiments, determining whether the current facial image and the historical facial image indicate the same user comprises: determining similarity between image features of the current face image and image features of the historical face image; in response to the similarity being greater than or equal to a preset similarity threshold, determining that the current facial image and the historical facial image indicate the same user; in response to the similarity being less than the similarity threshold, it is determined that the current facial image and the historical facial image are indicative of different users.
In some embodiments, determining whether the current facial image and the historical facial image indicate the same user comprises: extracting image features of the historical face image, and performing face tracking on an image acquired after the acquisition time of the historical face image based on the image features; in response to a result of the face tracking indicating that the image acquired after the acquisition time includes a face image of a historical user, determining that the current face image and the historical face image indicate the same user, wherein the historical user is the user indicated by the historical face image; in response to a result of the face tracking indicating that the image acquired after the acquisition time does not include a face image of the historical user, determining that the current face image and the historical face image indicate a different user.
In some embodiments, in response to the result of the face tracking indicating that the image acquired after the acquisition time does not include a facial image of the historical user, determining that the current facial image and the historical facial image indicate a different user includes: determining a similarity between image features of the current face image and image features of the historical face image in response to a result of the face tracking indicating that an image acquired after the acquisition time does not include a face image of the historical user; in response to the similarity being less than the similarity threshold, it is determined that the current facial image and the historical facial image are indicative of different users.
In some embodiments, the above method further comprises: in response to the current image indicating a different user than the target history image, generating reply information to the current input information based on the current input information.
In some embodiments, obtaining the current input information and the current image corresponding to the current input information includes: acquiring the current input information; and, in response to determining that the current input information includes wake-up information, acquiring the current image corresponding to the current input information.
In some embodiments, obtaining the current input information and the current image corresponding to the current input information includes: in response to the distance between the user and a target distance sensor being less than a preset distance threshold, acquiring the current input information and the current image corresponding to the current input information.
In a second aspect, an embodiment of the present application provides an apparatus for generating information, where the apparatus includes: an acquisition unit configured to acquire current input information and a current image corresponding to the current input information; a determination unit configured to determine whether the current image and a pre-acquired target history image indicate the same user; and a first generation unit configured to generate reply information for the current input information based on the current input information and history input information in response to the current image and the target history image indicating the same user, wherein the history input information is pre-acquired information corresponding to the target history image.
In some embodiments, the current image comprises a current facial image, and the target history image comprises a history facial image; and a determination unit further configured to determine whether the current face image and the history face image indicate the same user.
In some embodiments, the determination unit is further configured to determine a similarity between an image feature of the current face image and an image feature of the history face image; in response to the similarity being greater than or equal to a preset similarity threshold, determining that the current facial image and the historical facial image indicate the same user; in response to the similarity being less than the similarity threshold, it is determined that the current facial image and the historical facial image are indicative of different users.
In some embodiments, the determination unit is further configured to extract image features of the historical face image, and perform face tracking on an image acquired after an acquisition time of the historical face image based on the image features; in response to a result of the face tracking indicating that the image acquired after the acquisition time includes a face image of a historical user, determining that the current face image and the historical face image indicate the same user, wherein the historical user is the user indicated by the historical face image; in response to a result of the face tracking indicating that the image acquired after the acquisition time does not include a face image of the historical user, determining that the current face image and the historical face image indicate a different user.
In some embodiments, the determination unit is further configured to determine a similarity between an image feature of the current face image and an image feature of the historical face image in response to a result of the face tracking indicating that the image acquired after the acquisition time does not include a face image of the historical user; in response to the similarity being less than the similarity threshold, it is determined that the current facial image and the historical facial image are indicative of different users.
In some embodiments, the above apparatus further comprises: a second generating unit configured to generate reply information to the current input information based on the current input information in response to the current image indicating a different user from the target history image.
In some embodiments, the obtaining unit is further configured to obtain the current input information; and, in response to determining that the current input information includes wake-up information, obtain the current image corresponding to the current input information.
In some embodiments, the obtaining unit is further configured to obtain the current input information and a current image corresponding to the current input information in response to a distance between the user and the target distance sensor being less than a preset distance threshold.
In a third aspect, an embodiment of the present application provides an electronic device for generating information, including: one or more processors; a storage device having one or more programs stored thereon, which when executed by the one or more processors, cause the one or more processors to implement the method of any of the embodiments of the method for generating information as described above.
In a fourth aspect, an embodiment of the present application provides a robot, including: an information acquisition device; an image acquisition device; one or more processors; and a storage device having one or more programs stored thereon which, when executed by the one or more processors, cause the one or more processors to acquire current input information via the information acquisition device and a current image corresponding to the current input information via the image acquisition device, so as to implement the method of any one of the embodiments of the method for generating information.
In a fifth aspect, the present application provides a computer-readable medium for generating information, on which a computer program is stored which, when executed by a processor, implements the method of any one of the embodiments of the method for generating information.
According to the method and apparatus for generating information provided by the embodiments of the present application, current input information and a current image corresponding to the current input information are acquired, it is determined whether the current image and a pre-acquired target history image indicate the same user, and finally, in response to the current image and the target history image indicating the same user, reply information for the current input information is generated based on the current input information and the history input information, thereby enriching the ways in which information can be generated.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a method for generating information according to the present application;
FIG. 3 is a schematic illustration of an application scenario of a method for generating information according to the present application;
FIG. 4 is a schematic illustration of yet another application scenario of a method for generating information according to the present application;
FIG. 5 is a flow diagram of yet another embodiment of a method for generating information according to the present application;
FIG. 6 is a schematic block diagram illustrating one embodiment of an apparatus for generating information according to the present application;
FIG. 7 is a schematic block diagram of a computer system suitable for use in implementing an electronic device of an embodiment of the present application;
fig. 8 is an exemplary structural schematic diagram of a robot according to an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 illustrates an exemplary system architecture 100 to which embodiments of a method for generating information or an apparatus for generating information of embodiments of the present application may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
A user may use terminal devices 101, 102, 103 to interact with server 105 over network 104 to receive or transmit data (e.g., images, user-entered information), etc. In some use cases, the terminal devices 101, 102, 103 may have various communication client applications installed thereon, such as an instant messaging tool, social platform software, and the like; the terminal devices 101, 102, 103 may also be equipped with an image capturing device, an audio capturing device, a distance sensor, etc.
The terminal apparatuses 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices that have an image capturing function and/or an audio capturing function and support data (e.g., images, audio, etc.) transmission, including but not limited to, a conversation terminal device (e.g., an airport conversation robot), a smart phone, a tablet computer, a laptop portable computer, a desktop computer, and the like. When the terminal apparatuses 101, 102, 103 are software, they can be installed in the electronic apparatuses listed above. It may be implemented as multiple pieces of software or software modules (e.g., software or software modules used to provide distributed services) or as a single piece of software or software module. And is not particularly limited herein.
The server 105 may be a server that provides various services, such as a voice processing server that processes input information transmitted by the terminal apparatuses 101, 102, 103. The voice processing server may perform processing such as voice recognition on the received voice, and feed back a processing result (e.g., reply information to the input information) to the terminal device.
It should be noted that, the method for generating information provided in the embodiment of the present application may be executed by the server 105, and accordingly, the apparatus for generating information may be disposed in the server 105; in addition, the method for generating information provided by the embodiment of the present application may also be executed by the terminal devices 101, 102, and 103, and accordingly, the apparatus for generating information may also be disposed in the terminal devices 101, 102, and 103.
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or may be implemented as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (e.g., software or software modules used to provide distributed services), or as a single piece of software or software module. And is not particularly limited herein.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. The system architecture may not include a network when the electronic device on which the information processing method operates does not need to perform data transmission with other devices.
It should be understood that other additional devices may be added based on the exemplary system architecture of fig. 1, as desired. For example, an image capture device for providing an image acquisition function, an audio capture device for providing an audio acquisition function, a distance sensor for detecting distance, and the like. The additional devices may exist independently of each other or may exist as an integrated device. For example, the additional device may be integrated with the terminal device.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method for generating information in accordance with the present application is shown. The method for generating information comprises the following steps:
step 201, obtaining the current input information and the current image corresponding to the current input information.
In this embodiment, an execution main body (for example, a server or a terminal device shown in fig. 1) of the method for generating information may obtain current input information from other electronic devices (for example, an information acquisition device communicatively connected to the execution main body) or locally through a wired connection manner or a wireless connection manner. The execution main body may obtain the current image corresponding to the current input information from other electronic devices (for example, an image capturing device communicatively connected to the execution main body) or locally through a wired connection manner or a wireless connection manner.
Here, the current input information may include, but is not limited to, at least one of: audio information, text information, behavior information (e.g., gesture information), operation trajectory information, and so on. The current input information may be information input by the user, or information otherwise acquired by the execution body. The current image may be a photograph or a video, and may be a whole-body image of the user or a partial image (e.g., an eye image, a face image, etc.). The current image corresponding to the current input information may be an image acquired within a preset time length (for example, within 5 minutes, within 3 minutes, etc.) of the acquisition time of the current input information; it may also be the image whose acquisition time is closest to the acquisition time of the current input information among all acquired images.
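As a minimal sketch of the second option (the timestamps, frame list, and 5-minute cut-off are illustrative assumptions, not structures defined by this application), the corresponding image can be chosen as the captured frame whose acquisition time is closest to the acquisition time of the input:

```python
# A minimal sketch: pick the frame whose acquisition time is closest to the
# acquisition time of the current input information. Timestamps, the frame
# list, and the 5-minute cut-off are illustrative assumptions.
from typing import List, Optional, Tuple


def pick_corresponding_image(input_time: float,
                             frames: List[Tuple[float, bytes]],
                             max_gap_seconds: float = 300.0) -> Optional[bytes]:
    """frames is a list of (acquisition_time, frame). Returns the frame closest
    in time to input_time, or None if none was captured within max_gap_seconds."""
    if not frames:
        return None
    capture_time, frame = min(frames, key=lambda f: abs(f[0] - input_time))
    if abs(capture_time - input_time) > max_gap_seconds:
        return None
    return frame
```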
In practice, the execution body may continuously acquire images, and when it detects that the acquired images do not include an image of the user (for example, the user has left), the execution body may take the images of the user acquired within a target time period as history images and the information input by the user within the target time period as history input information. The history input information may be information input by the user or information otherwise acquired by the execution body. The target time period is the period between the previous detection that the acquired images did not include an image of the user and the current such detection.
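A rough sketch of this bookkeeping, under assumed data structures, is shown below; whether a user is present in a frame is left to a hypothetical caller, and only the last frame of a session is kept as the history image for simplicity:

```python
# A rough sketch of the session bookkeeping described above: when the camera
# no longer sees the user, everything collected since the previous "no user"
# detection becomes the history image / history input information.
class SessionBuffer:
    def __init__(self):
        self.images = []            # images of the user in the ongoing session
        self.inputs = []            # input information in the ongoing session
        self.history_image = None   # target history image from the last session
        self.history_inputs = []    # history input information from the last session

    def on_frame(self, frame, user_present: bool):
        if user_present:
            self.images.append(frame)
        elif self.images:
            # The user has left: archive the just-finished session as history.
            self.history_image = self.images[-1]   # simplification: keep the last frame
            self.history_inputs = list(self.inputs)
            self.images.clear()
            self.inputs.clear()

    def on_input(self, info):
        self.inputs.append(info)
```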
Optionally, the execution body may also take the piece of input information immediately preceding the current input information as the history input information, take the image of the user acquired between the acquisition time of the current input information and the acquisition time of the history input information as the current image, and take the image of the user acquired between the acquisition time of the history input information and the acquisition time of the piece of input information preceding it as the history image.
It should be noted that, the execution main body may obtain the current input information first, and then obtain the current image; or the current image can be acquired first and then the current input information can be acquired.
At step 202, it is determined whether the current image and the pre-acquired target history image indicate the same user.
In the present embodiment, based on the current image obtained in step 201, the execution body may determine whether the current image and a pre-acquired target history image indicate the same user. The target history image may be an image acquired before the current image. The target history image may be a photograph or a video, and may be a whole-body image of the user, a partial image (e.g., an eye image, a face image, etc.), or an image containing a biometric feature of the user, which may include but is not limited to: hand shape, fingerprint, retina, pulse, pinna, gait, etc.
Here, it is assumed that the execution subject is an electronic device for replying to a question posed by a user. The above current image and the target history image are exemplarily explained as follows:
In the first case, after a first user leaves, a second user converses with the electronic device (the second user is currently in a conversation with the electronic device, and no other user conversed with the electronic device between the first user leaving and the second user arriving). Then the first user is the historical user; the input information entered by the first user, or information about the first user acquired by the execution body, is the history input information; the image of the first user acquired by the execution body is the target history image; the second user is the current user; the input information entered by the second user, or information about the second user acquired by the execution body, is the current input information; and the image of the second user acquired by the execution body is the current image.
In the second case, a first user leaves and later returns (the returning first user is currently in a conversation with the electronic device, and no other user conversed with the electronic device between the first user leaving and returning). Then the first user of the previous session is the historical user; the input information entered during that previous session, or information about the first user acquired by the execution body at that time, is the history input information; the image of the first user acquired during the previous session is the target history image; the first user after returning is the current user; the input information entered after returning, or information about the first user acquired by the execution body after returning, is the current input information; and the image of the first user acquired after returning is the current image.
In the third case, the electronic device may determine through face recognition whether an account associated with the user exists, so as to store information related to the user (e.g., the user's input information, images of the user, etc.). When the user interacts with the execution body for the first time, the execution body may instruct the user to log into an account by means of face recognition; when the user interacts with the execution body again, the user does not need to log in, and the execution body can determine the account associated with the user by face recognition, thereby interacting with the user based on the stored information related to that user. Here, the user in the current interaction is the current user; the information entered by the user in the current interaction, or information about the user acquired by the execution body, is the current input information; the image of the user acquired by the execution body during the current interaction is the current image; the user in an interaction before the current one is the historical user; the information entered in that earlier interaction, or information about the user acquired by the execution body at that time, is the history input information; and the image of the user acquired during that earlier interaction is the history image.
As an example, when the current image and the history image are both eye images of the user, the electronic device may determine whether the current image and the history image indicate the same user by using an iris recognition technique.
As yet another example, when the current image and the historical image are both fingerprint images of a user, the electronic device may employ a fingerprint identification technique to determine whether the current image and the historical image indicate the same user.
In some optional implementations of the present embodiment, the current image comprises a current face image, the target history image comprises a history face image; and determining whether the current image and the pre-acquired target history image indicate the same user, including: it is determined whether the current facial image and the historical facial image indicate the same user. Wherein the current face image is a face image included in the current image, and the history face image is a face image included in the target history image.
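As an illustration only (the detector and the largest-face heuristic are assumptions, not part of this application), the face image contained in a captured image could be extracted as follows:

```python
# A minimal sketch of extracting the face image contained in a captured frame,
# here using OpenCV's bundled Haar cascade. The cascade choice and the
# "largest detected face" heuristic are illustrative assumptions.
import cv2

_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")


def extract_face_image(frame_bgr):
    """Return the largest detected face region of frame_bgr, or None."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = _detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = max(faces, key=lambda box: box[2] * box[3])  # largest box
    return frame_bgr[y:y + h, x:x + w]
```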
Here, the execution subject described above may determine whether the current face image and the history face image indicate the same user in various ways.
As an example, the execution subject described above may determine whether the current face image and the history face image indicate the same user in the following manner:
first, the execution subject described above may determine the similarity between the image features of the current face image and the image features of the history face image. The image features may include, but are not limited to: texture features, skin tone features, contour features, spatial relationship features, and the like. In practice, the electronic device may extract image Features of the current facial image and the historical facial image through algorithms such as a convolutional neural network and speedup Robust Features (SURF). The above similarity calculation methods include, but are not limited to: scale-invariant feature transform (SIFT) algorithm, cosine similarity algorithm, pearson correlation coefficient algorithm, and so on.
Then, if the similarity is greater than or equal to a preset similarity threshold (e.g., 76%, 80%, etc.), the execution subject may determine that the current face image and the historical face image indicate the same user. The execution subject may determine that the current face image and the historical face image indicate different users if the similarity is less than the similarity threshold.
It can be appreciated that through face recognition techniques, the accuracy of determining whether the current facial image and the historical facial image indicate the same user can be improved to some extent.
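A minimal sketch of this similarity check is given below; the feature extractor is assumed to be any face-embedding model (for example a convolutional neural network), and the 0.8 threshold is an illustrative value rather than one prescribed by this application:

```python
# A minimal sketch of the similarity-based check: compare feature vectors of
# the current and historical face images with cosine similarity and apply a
# preset threshold. The feature vectors and the threshold are assumptions.
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def same_user_by_similarity(current_features: np.ndarray,
                            history_features: np.ndarray,
                            threshold: float = 0.8) -> bool:
    """True if the two face-image features are similar enough to be treated
    as indicating the same user."""
    return cosine_similarity(current_features, history_features) >= threshold
```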
As another example, the execution subject described above may determine whether the current face image and the historical face image indicate the same user in the following manner:
first, the execution subject described above may extract image features of a history face image. The image features may include, but are not limited to: texture features, skin tone features, contour features, spatial relationship features, and the like. In general, the image features may be features determined by a skilled person to distinguish different persons.
Then, the execution subject may perform face tracking on an image acquired after the acquisition time of the history face image based on the image feature to determine whether the face image of the history user is included in the image acquired after the acquisition time of the history face image. Wherein the historical user is the user indicated by the historical face image.
Finally, if the result of the face tracking indicates that the image acquired after the acquisition time includes a face image of a historical user, the execution subject may determine that the current face image and the historical face image indicate the same user; the execution subject may determine that the current face image and the historical face image indicate different users if the result of the face tracking indicates that the image acquired after the acquisition time does not include the face image of the historical user. The face tracking is a technology for continuously capturing information such as the position and size of a face in a subsequent image on the premise that the face is detected.
It can be understood that, compared with the above scheme of determining whether the current face image and the historical face image indicate the same user based on image features, the face tracking technique can reduce the amount of computation of the execution body to some extent.
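A rough sketch of the face-tracking branch is shown below. Real systems would typically use a dedicated tracker; here the face is simply re-detected in each later frame and required to overlap its previous position, which is an illustrative simplification rather than this application's implementation:

```python
# A rough sketch of face tracking by repeated detection: starting from the
# region of the historical face, re-detect a face in every later frame and
# require it to overlap the previous position. Losing the face in any frame
# means the historical user has left. Detector and IoU threshold are assumptions.
import cv2

_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")


def _iou(a, b):
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    x1, y1 = max(ax, bx), max(ay, by)
    x2, y2 = min(ax + aw, bx + bw), min(ay + ah, by + bh)
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    return inter / float(aw * ah + bw * bh - inter)


def face_still_present(initial_box, later_frames, min_iou=0.3) -> bool:
    """initial_box is (x, y, w, h) of the historical face; later_frames are
    frames acquired after the historical face image."""
    prev = initial_box
    for frame in later_frames:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = _detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        matches = [f for f in faces if _iou(prev, tuple(f)) >= min_iou]
        if not matches:
            return False                  # tracking lost: the user has left
        prev = max(matches, key=lambda f: _iou(prev, tuple(f)))
    return True
```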
As a third example, the execution subject described above may also determine whether the current face image and the history face image indicate the same user in the following manner:
first, the execution subject may determine whether or not the face image of the historical user is included in the image acquired after the acquisition time of the historical face image, using the face tracking technique described above. When the result of the face tracking indicates that the image acquired after the acquisition time does not include the face image of the historical user, the execution subject may further determine the similarity between the image feature of the current face image and the image feature of the historical face image in such a manner that the similarity between the image feature of the historical face image and the image feature of the current face image is calculated.
Thereafter, if the similarity is less than the similarity threshold, the executing body may determine that the current face image and the history face image indicate different users.
It can be understood that the similarity between the image features of the current face image and those of the historical face image is only calculated when it is determined that the images acquired after the acquisition time do not include the face image of the historical user. This reduces the amount of computation of the execution body while still ensuring a certain accuracy of the result (i.e., whether the current face image and the historical face image indicate the same user).
Step 203, in response to the current image and the target history image indicating the same user, generating reply information of the current input information based on the current input information and the history input information.
In this embodiment, in the case where it is determined that the current image and the target history image indicate the same user, the execution body described above may generate reply information for the current input information based on the current input information and the history input information, where the history input information is pre-acquired information corresponding to the target history image.
As one implementation, when the historical input information and the current input information are text information or audio information, the execution body may generate reply information for the current input information by performing speech recognition on the current input information and the historical input information through a model for recognizing speech. The model may include, but is not limited to: an Acoustic Model (AM), a Language Model (LM), and so on.
As another implementation manner, the execution main body may further perform voice recognition and semantic analysis on the current input information and the historical input information, so as to generate reply information of the current input information.
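To make the benefit of the historical input concrete, the following toy, self-contained sketch uses the airport scenario of fig. 3 below; the flight table and the simple string matching are illustrative stand-ins for the speech recognition and semantic analysis described above, not this application's actual models:

```python
# A toy sketch: a bare flight number only becomes answerable when combined
# with the earlier question of the same user. CHECKIN_COUNTERS and the string
# matching are hypothetical stand-ins for semantic analysis.
CHECKIN_COUNTERS = {"CA12345": "No. 14"}     # hypothetical lookup data


def generate_reply(current_text: str, history_text: str, same_user: bool) -> str:
    flight = current_text.replace(" ", "").upper()
    if same_user and history_text and "check" in history_text.lower():
        counter = CHECKIN_COUNTERS.get(flight)
        if counter:
            return f"The check-in counter is {counter}."
    # Different user (or no usable context): answer from the current input only.
    if flight in CHECKIN_COUNTERS:
        return f"{flight} is a scheduled flight; what would you like to know about it?"
    return "Sorry, could you rephrase your question?"


# e.g. generate_reply("CA 12345", "at which counter does my flight check in", True)
# returns "The check-in counter is No. 14."
```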
As an example, with continued reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of the method for generating information according to the present embodiment. In the application scenario of fig. 3, the user has entered the current input information "CA 12345" to the airport robot. The airport robot then acquires the current input information "CA 12345" and the current image corresponding to the current input information, where the current image is a face image of the user. After that, the airport robot determines that the current image and the history image indicate the same user. Finally, the airport robot semantically analyzes the current input information "CA 12345" together with the historical input information "at which counter does my flight check in", and generates the reply information "the check-in counter is No. 14" for the current input information.
According to the method provided by the above embodiment of the present application, the current input information and the current image corresponding to the current input information are acquired, it is determined whether the current image and the pre-acquired target history image indicate the same user, and if so, reply information for the current input information is generated based on the current input information and the history input information. This enriches the ways in which information can be generated, improves the accuracy of determining whether the current image and the history image indicate the same user, and reduces the amount of computation on the device.
In some optional implementations of the embodiment, in a case where the execution subject determines that the current image and the target history image do not indicate the same user, the execution subject may further generate reply information of the current input information based on the current input information.
As an example, please continue to refer to fig. 4; fig. 4 is another schematic diagram of an application scenario of the method for generating information according to the present embodiment. In the application scenario of fig. 4, the user has entered the current input information "CA 12345" to the airport robot. The airport robot then acquires the current input information "CA 12345" and the current image corresponding to the current input information, where the current image is a face image of the user. After that, the airport robot determines that the current image and the history image indicate different users. Finally, the airport robot semantically analyzes the current input information "CA 12345", thereby generating the reply information "CA 12345 is a flight from city A to city B with a departure time of …" for the current input information.
Optionally, the execution body may generate the reply information for the current input information by performing speech recognition on the current input information through a model for recognizing speech.
It will be appreciated that, in general, when the current image and the target historical image indicate different users, the reply information may be generated more quickly based on the current input information (without the need for historical input information) than otherwise (e.g., based on the current input information and the historical input information, generating the reply information for the current input information).
In some optional implementations of this embodiment, acquiring the current input information and the current image corresponding to the current input information includes: acquiring the current input information; and, in response to determining that the current input information includes wake-up information, acquiring the current image corresponding to the current input information. The wake-up information may be a word or phrase from a set of words or phrases predetermined by a technician, or any word, sentence, symbol, character string, or the like. The wake-up information may be information used to instruct the electronic device to acquire the current image. After acquiring the wake-up information, the execution body starts to acquire the current image corresponding to the current input information.
It can be understood that, under the condition that the wake-up information is determined to be included in the current input information, the electronic device acquires the current image corresponding to the current input information, so that frequent acquisition of the image can be avoided to a certain extent, and electric energy is saved.
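A minimal sketch of this wake-up gate is given below; the wake-up word set and the capture_image callback are hypothetical and only illustrate when the camera is queried:

```python
# A minimal sketch of the wake-up gate: the camera is queried only when the
# current input contains a wake-up word. WAKE_WORDS and capture_image are
# illustrative assumptions.
WAKE_WORDS = {"hello robot", "hi robot"}     # hypothetical wake-up information


def maybe_acquire_image(current_input: str, capture_image):
    """capture_image is a callable that grabs a frame from the camera."""
    if any(word in current_input.lower() for word in WAKE_WORDS):
        return capture_image()               # acquire the current image
    return None                              # stay idle and save electric energy
```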
In some optional implementations of this embodiment, acquiring the current input information and the current image corresponding to the current input information includes: in response to the distance between the user and a target distance sensor being less than a preset distance threshold, acquiring the current input information and the current image corresponding to the current input information. The distance threshold may be a distance value (e.g., 30 cm, 40 cm) predetermined by a technician. The target distance sensor may be a sensor communicatively connected to the execution body. In practice, the specific location of the target distance sensor can be determined by a technician according to actual needs. For example, the target distance sensor may be installed around the execution body, or may be mounted above the execution body. The target distance sensor may be integrated with the execution body, or may exist independently as a separate device. It can be understood that acquiring the current image corresponding to the current input information only when the distance between the user and the target distance sensor is determined to be less than the preset distance threshold can avoid frequent image acquisition to a certain extent and save electric energy.
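Analogously, a minimal sketch of the distance-sensor gate might look as follows; read_distance_cm, capture_audio, and capture_image are hypothetical device callbacks:

```python
# A minimal sketch of the distance-sensor gate: input and image are collected
# only when a user is closer than a preset threshold. The 40 cm value and the
# device callbacks are illustrative assumptions.
DISTANCE_THRESHOLD_CM = 40.0


def maybe_start_interaction(read_distance_cm, capture_audio, capture_image):
    if read_distance_cm() < DISTANCE_THRESHOLD_CM:
        return capture_audio(), capture_image()  # current input + current image
    return None, None                            # nobody close enough: stay idle
```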
With further reference to fig. 5, a flow 500 of yet another embodiment of a method for generating information is shown. The flow 500 of the method for generating information includes the steps of:
step 501, history input information and history face image corresponding to the history input information are acquired, and then step 502 is executed.
In this embodiment, an execution subject (for example, a server or a terminal device shown in fig. 1) of the method for generating information may acquire the history input information from other electronic devices (for example, an information acquisition device communicatively connected to the execution subject) or locally through a wired connection manner or a wireless connection manner. The execution main body can acquire the historical face image corresponding to the historical input information from other electronic equipment (such as an image acquisition device in communication connection with the execution main body) or locally through a wired connection mode or a wireless connection mode. Wherein the history face image is a face image included in the history image.
Here, the above history input information may include, but is not limited to, at least one of: audio information, text information, behavior information (e.g., gesture information), operation trajectory information, and so on. The historical face image may be a photograph or a video. The historical face image corresponding to the history input information may be an image acquired within a preset time period (for example, within 5 minutes, within 3 minutes, etc.) of the acquisition time of the history input information; it may also be the image whose acquisition time is closest to the acquisition time of the history input information among all acquired images.
It should be noted that, the execution main body may acquire the history input information first, and then acquire the history face image; or the historical face image may be acquired first and then the historical input information may be acquired.
Step 502, the current input information and the current face image corresponding to the current input information are acquired, and then step 503 is executed.
In this embodiment, the execution main body may obtain the current input information from other electronic devices (for example, the information acquisition device communicatively connected to the execution main body) or locally through a wired connection manner or a wireless connection manner. The execution main body may obtain the current face image corresponding to the current input information from other electronic devices (for example, the image capturing device communicatively connected to the execution main body) or locally through a wired connection manner or a wireless connection manner. Wherein the current face image is a face image included in the current image.
Here, the current input information may include, but is not limited to, at least one of: audio information, text information, behavior information (e.g., gesture information), operation trajectory information, and so on. The current face image may be a photograph or a video. The current image corresponding to the current input information may be an image acquired within a preset time length (for example, within 5 minutes, within 3 minutes, etc.) of the acquisition time of the current input information; it may also be the image whose acquisition time is closest to the acquisition time of the current input information among all acquired images.
It should be noted that, the executing body may first obtain the current input information, and then obtain the current face image; or the current face image may be acquired first and then the current input information may be acquired.
Step 503 is to extract image features of the historical face image, perform face tracking on an image acquired after the acquisition time of the historical face image based on the image features, and then execute step 504.
In the present embodiment, the execution subject may extract an image feature of the history face image. Then, based on the extracted image features, face tracking is performed on an image acquired after the acquisition time of the history face image. The image features may include, but are not limited to: texture features, skin tone features, contour features, spatial relationship features, and the like. In general, the image features may be features determined by a skilled person to distinguish different persons. The face tracking is a technology for continuously capturing information such as the position and size of a face in a subsequent image on the premise that the face is detected.
Step 504, determine whether the result of the face tracking indicates that the image acquired after the acquisition time includes the face image of the historical user, then if yes, go to step 507, if no, go to step 505.
In the present embodiment, the execution subject described above may determine whether the result of face tracking indicates that the image acquired after the acquisition time includes a face image of a historical user. If the result of the face tracking indicates that the image acquired after the acquisition time includes the face image of the historical user, the execution subject may proceed to step 507; if the result of the face tracking indicates that the image acquired after the acquisition time does not include the face image of the historical user, the execution subject may proceed to step 505.
In step 507, it is determined that the current face image and the history face image indicate the same user, and thereafter, step 508 is performed.
In the present embodiment, the execution subject described above may determine that the current face image and the history face image indicate the same user.
And step 508, generating reply information of the current input information based on the current input information and the historical input information.
In this embodiment, the execution subject may generate reply information of the current input information based on the current input information and the history input information.
As one implementation, when the current input information and the historical input information are voice information, the execution subject may generate reply information of the current input information by performing voice recognition on the current input information and the historical input information through a model for recognizing voice. Wherein, the model may include but is not limited to: acoustic Models (AM), Language Models (LM), and so on.
As another implementation manner, the execution main body may further perform voice recognition and semantic analysis on the current input information and the historical input information, so as to generate reply information of the current input information.
In step 505, the similarity between the image features of the current face image and the image features of the historical face image is determined, and then step 506 is executed.
In the present embodiment, the execution body described above may determine the similarity between the image features of the current face image and the image features of the historical face image. The image features may include, but are not limited to: texture features, skin tone features, contour features, spatial relationship features, and the like. In practice, the electronic device may extract image features of the current face image and the historical face image through algorithms such as a convolutional neural network or Speeded-Up Robust Features (SURF). Similarity calculation methods include, but are not limited to: the scale-invariant feature transform (SIFT) algorithm, the cosine similarity algorithm, the Pearson correlation coefficient algorithm, and the like.
In step 506, it is determined whether the similarity is greater than or equal to a preset similarity threshold; if yes, step 507 is executed, and if not, step 509 is executed.
In this embodiment, the execution body may determine whether the similarity obtained in step 505 is greater than or equal to the preset similarity threshold. If the similarity is greater than or equal to the preset similarity threshold, the execution body may execute step 507; if the similarity is less than the preset similarity threshold, the execution body may execute step 509.
In step 509, it is determined that the current face image and the historical face image indicate different users, and then step 510 is performed.
In this embodiment, the execution subject described above may determine that the current face image and the history face image indicate different users.
Step 510, generating reply information of the current input information based on the current input information.
In this embodiment, the execution subject may generate reply information of the current input information based on the current input information.
As one implementation, when the current input information is voice information, the execution body may generate the reply information of the current input information by performing speech recognition on the current input information through a model for recognizing speech. The model may include, but is not limited to: an Acoustic Model (AM), a Language Model (LM), and so on.
As another implementation manner, the execution main body may further perform voice recognition and semantic analysis on the current input information, so as to generate reply information of the current input information.
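A condensed sketch of steps 503 to 510 is shown below; the three callables stand in for the face tracking, feature similarity, and reply generation steps sketched earlier (or hypothetical equivalents), which keeps the example self-contained without claiming to be this application's concrete implementation:

```python
# A condensed sketch of flow 500. tracked_continuously() covers steps 503-504,
# feature_similarity() covers step 505, and make_reply() covers steps 508/510;
# all three are assumed callables, not APIs defined by this application.
def run_flow_500(current_input, history_input,
                 tracked_continuously, feature_similarity, make_reply,
                 similarity_threshold=0.8):
    # Steps 503-504: face tracking on images acquired after the history image.
    if tracked_continuously():
        same_user = True                                   # step 507
    else:
        # Steps 505-506: fall back to comparing image features.
        same_user = feature_similarity() >= similarity_threshold
    if same_user:
        # Step 508: reply based on the current AND the historical input.
        return make_reply(current_input, history_input)
    # Steps 509-510: different user, reply based on the current input only.
    return make_reply(current_input, None)
```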
As can be seen from fig. 5, compared with the embodiment corresponding to fig. 2, the flow 500 of the method for generating information in this embodiment highlights the step of determining the similarity between the image features of the current face image and those of the historical face image in the case where the result of the face tracking indicates that the subsequently acquired images do not include the face image of the historical user. Therefore, in the scheme described in this embodiment, the similarity between the image features of the current face image and those of the historical face image is calculated only when the face tracking performed by the execution body indicates that the images acquired after the acquisition time do not include the face image of the historical user. This reduces the amount of computation of the execution body while ensuring a certain accuracy of the result (i.e., whether the current face image and the historical face image indicate the same user), helps the execution body generate more accurate reply information for the input information, and improves the user experience.
With further reference to fig. 6, as an implementation of the method shown in the above figures, the present application provides an embodiment of an apparatus for generating information, which corresponds to the method embodiment shown in fig. 2, and which may include the same or corresponding features as the method embodiment shown in fig. 2, in addition to the features described below. The device can be applied to various electronic equipment.
As shown in fig. 6, the apparatus 600 for generating information of the present embodiment includes: an acquisition unit 601, a determination unit 602, and a first generation unit 603. Here, the acquisition unit 601 is configured to acquire the current input information and a current image corresponding to the current input information; the determination unit 602 is configured to determine whether the current image and a pre-acquired target history image indicate the same user; and the first generation unit 603 is configured to, in response to the current image and the target history image indicating the same user, generate reply information for the current input information based on the current input information and history input information, where the history input information is pre-acquired information corresponding to the target history image.
In this embodiment, the obtaining unit 601 of the apparatus 600 for generating information may obtain the current input information from other electronic devices (for example, an information collecting apparatus communicatively connected to the apparatus 600) or locally through a wired connection manner or a wireless connection manner, and then the obtaining unit 601 may obtain the current image corresponding to the current input information from other electronic devices (for example, an image collecting apparatus communicatively connected to the apparatus 600) or locally through a wired connection manner or a wireless connection manner.
In the present embodiment, the above-described determination unit 602 may determine whether the current image and the target history image acquired in advance indicate the same user, based on the current input information and the current image obtained by the acquisition unit 601. The target history image may be an image acquired before the current image. The target history image may be a photograph or a video, a whole body image of the user, or a local image (e.g., an eye image, a face image, or the like).
In this embodiment, in the case where the above determination unit 602 determines that the current image and the history image indicate the same user, the above first generation unit 603 may generate reply information for the current input information based on the current input information and the history input information, where the history input information is pre-acquired information corresponding to the target history image.
As an implementation manner, when the current input information and the historical input information are voice information, the apparatus 600 may generate the reply information of the current input information by performing voice recognition on the current input information and the historical input information through a model for recognizing voice. The model may include, but is not limited to, an Acoustic Model (AM), a Language Model (LM), and so on.
As another implementation manner, the apparatus 600 may further perform voice recognition and semantic analysis on the current input information and the historical input information, so as to generate reply information of the current input information.
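As a hedged illustration of these implementation manners, the sketch below recognizes both the historical and the current voice input and carries the historical text into the reply step. The functions recognize_speech and generate_voice_reply are hypothetical stubs, not an actual acoustic or language model:

```python
def recognize_speech(audio: bytes) -> str:
    """Placeholder ASR stub; a real system would apply an acoustic model (AM)
    and a language model (LM) here."""
    return audio.decode("utf-8", errors="ignore")  # pretend the audio is text


def generate_voice_reply(current_audio: bytes, history_audio: bytes) -> str:
    """Sketch: recognize both utterances, then reply with the history as context."""
    history_text = recognize_speech(history_audio)
    current_text = recognize_speech(current_audio)
    # A real implementation would feed both texts into a dialogue / semantic
    # analysis model; this stub only shows that the historical input is kept.
    return f"(context: {history_text}) reply to: {current_text}"
```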
In some optional implementations of the present embodiment, the current image comprises a current face image and the target history image comprises a historical face image; and the determination unit 602 is further configured to determine whether the current face image and the historical face image indicate the same user. The current face image is the face image included in the current image, and the historical face image is the face image included in the target history image.
In some optional implementations of the present embodiment, the determination unit 602 is further configured to: determine a similarity between the image feature of the current face image and the image feature of the historical face image; in response to the similarity being greater than or equal to a preset similarity threshold, determine that the current face image and the historical face image indicate the same user; and in response to the similarity being less than the similarity threshold, determine that the current face image and the historical face image indicate different users. The image features may include, but are not limited to, texture features, skin tone features, contour features, spatial relationship features, and the like. In practice, the electronic device may extract the image features of the current face image and the historical face image through algorithms such as a convolutional neural network or Speeded Up Robust Features (SURF). The similarity may be calculated by methods including, but not limited to, the scale-invariant feature transform (SIFT) algorithm, the cosine similarity algorithm, the Pearson correlation coefficient algorithm, and so on.
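A minimal sketch of this similarity branch, assuming the image features have already been extracted as fixed-length vectors (for example by a convolutional neural network) and using cosine similarity with an arbitrary example threshold of 0.8:

```python
import numpy as np


def same_user_by_similarity(cur_feat: np.ndarray,
                            hist_feat: np.ndarray,
                            threshold: float = 0.8) -> bool:
    """Return True when the cosine similarity of the two face feature vectors
    reaches the preset threshold (0.8 is only an example value)."""
    cos = float(np.dot(cur_feat, hist_feat) /
                (np.linalg.norm(cur_feat) * np.linalg.norm(hist_feat) + 1e-12))
    return cos >= threshold  # >= threshold: same user; otherwise different users
```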
It can be understood that, by the face recognition technology, the accuracy of determining whether the current face image and the historical face image indicate the same user can be ensured to some extent.
In some optional implementations of the present embodiment, the determination unit 602 is further configured to: extract the image features of the historical face image and, based on the image features, perform face tracking on the images acquired after the acquisition time of the historical face image; in response to the result of the face tracking indicating that an image acquired after the acquisition time includes the face image of the historical user, determine that the current face image and the historical face image indicate the same user, where the historical user is the user indicated by the historical face image; and in response to the result of the face tracking indicating that the images acquired after the acquisition time do not include the face image of the historical user, determine that the current face image and the historical face image indicate different users. The image features may include, but are not limited to, texture features, skin tone features, contour features, spatial relationship features, and the like. In general, the image features may be features predetermined by a technician for distinguishing different persons.
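The tracking branch could be reduced to the following sketch, where contains_history_face stands in for a real face tracker applied to each image acquired after the historical face image; the function names and the any-frame criterion are simplifying assumptions, not the patent's algorithm:

```python
from typing import Any, Callable, Iterable


def same_user_by_tracking(frames_after_history: Iterable[Any],
                          contains_history_face: Callable[[Any], bool]) -> bool:
    """Sketch of the tracking branch: if the historical user's face is still
    found in the images acquired after the historical face image, treat the
    current and historical face images as indicating the same user."""
    return any(contains_history_face(frame) for frame in frames_after_history)
```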
It can be understood that, compared with the above-described scheme of determining whether the current face image and the historical face image indicate the same user based on the image features, the face tracking technology can reduce the amount of computation of the apparatus 600 to some extent.
In some optional implementations of the present embodiment, the determination unit 602 is further configured to: in response to the result of the face tracking indicating that the image acquired after the acquisition time does not include the face image of the historical user, determine a similarity between the image feature of the current face image and the image feature of the historical face image; in response to the similarity being greater than or equal to a preset similarity threshold, determine that the current face image and the historical face image indicate the same user; and in response to the similarity being less than the similarity threshold, determine that the current face image and the historical face image indicate different users.
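Composing the two sketches above, this fallback might look as follows; it reuses same_user_by_tracking and same_user_by_similarity from the earlier illustrative examples:

```python
def same_user_with_fallback(frames_after_history, contains_history_face,
                            cur_feat, hist_feat, threshold: float = 0.8) -> bool:
    """Try face tracking first; if the face was lost after the acquisition
    time, fall back to comparing image-feature similarity."""
    if same_user_by_tracking(frames_after_history, contains_history_face):
        return True
    return same_user_by_similarity(cur_feat, hist_feat, threshold)
```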
In some optional implementations of the embodiment, the apparatus further includes a second generating unit (not shown in the figure) configured to generate reply information of the current input information based on the current input information in response to the current image indicating a different user from the target history image.
In some optional implementations of this embodiment, the acquisition unit is further configured to acquire the current input information and, in response to determining that the current input information includes wake-up information, acquire the current image corresponding to the current input information. The wake-up information may be a word or phrase in a set of words or phrases predetermined by a technician, or any word, sentence, symbol, character string, or the like. The wake-up information may be information for instructing the electronic device to acquire the current image. After the apparatus 600 obtains the wake-up information, it starts to acquire the current image corresponding to the current input information.
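For example, wake-up-gated image acquisition could be sketched as below; the wake words and the capture_image callback are illustrative assumptions, not values from the patent:

```python
from typing import Any, Callable, Optional

# Hypothetical wake words; in practice a technician would predefine this set.
WAKE_WORDS = {"hello robot", "hi assistant"}


def maybe_capture_image(current_input: str,
                        capture_image: Callable[[], Any]) -> Optional[Any]:
    """Trigger the camera only when the input contains wake-up information."""
    if any(word in current_input.lower() for word in WAKE_WORDS):
        return capture_image()  # acquire the current image
    return None  # no wake-up information: do not acquire an image
```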
In some optional implementations of the embodiment, the acquisition unit is further configured to acquire the current input information and the current image corresponding to the current input information in response to the distance between the user and a target distance sensor being less than a preset distance threshold. The distance threshold may be a distance value (e.g., 30 cm or 40 cm) predetermined by a technician. The target distance sensor may be a sensor communicatively connected to the apparatus 600. In practice, the specific location of the target distance sensor can be determined by a technician according to actual needs; for example, target distance sensors may be installed around the apparatus 600, or a sensor may be mounted above the apparatus 600. The target distance sensor may be integrated into the apparatus 600 or may exist independently as a separate device. It can be understood that acquiring the current image corresponding to the current input information only when the distance between the user and the target distance sensor is determined to be smaller than the preset distance threshold can, to a certain extent, avoid frequently acquiring images and thus save electric energy.
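A comparable sketch for the distance-triggered variant, with an example 40 cm threshold and injected sensor/camera callbacks (all names are assumptions):

```python
from typing import Any, Callable, Optional, Tuple

DISTANCE_THRESHOLD_CM = 40.0  # example value; the patent only says "preset"


def maybe_acquire(read_distance_cm: Callable[[], float],
                  read_input: Callable[[], str],
                  capture_image: Callable[[], Any]) -> Optional[Tuple[str, Any]]:
    """Acquire input information and an image only when the user is closer to
    the target distance sensor than the preset threshold."""
    if read_distance_cm() < DISTANCE_THRESHOLD_CM:
        return read_input(), capture_image()
    return None  # user too far away: skip acquisition and save power
```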
The apparatus according to the above embodiment of the present application acquires the current input information and the current image corresponding to the current input information through the acquisition unit 601, then the determination unit 602 determines whether the current image and the target history image acquired in advance indicate the same user, and finally the first generation unit 603 generates the reply information of the current input information based on the current input information and the history input information in response to the current image and the target history image indicating the same user, thereby enriching the generation manner of information.
Referring now to FIG. 7, shown is a block diagram of a computer system 700 suitable for use in implementing a server according to embodiments of the present application. The server shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 7, the computer system 700 includes a Central Processing Unit (CPU) 701, which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data necessary for the operation of the system 700 are also stored. The CPU 701, the ROM 702, and the RAM 703 are connected to each other via a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
The following components are connected to the I/O interface 705: an input portion 706 including a keyboard, a mouse, and the like; an output section 707 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card, a modem, or the like. The communication section 709 performs communication processing via a network such as the internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 710 as necessary, so that a computer program read out therefrom is mounted into the storage section 708 as necessary.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 709, and/or installed from the removable medium 711. The computer program, when executed by a Central Processing Unit (CPU)701, performs the above-described functions defined in the method of the present application. It should be noted that the computer readable medium described herein can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor includes an acquisition unit, a determination unit, and a generation unit. The names of these units do not in some cases constitute a limitation on the unit itself, and for example, the acquisition unit may also be described as a "unit that acquires current input information and a current image corresponding to the current input information".
As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring current input information and a current image corresponding to the current input information; determining whether the current image and a pre-acquired target historical image indicate the same user; in response to the current image indicating the same user as the target history image, generating reply information of the current input information based on the current input information and the history input information, wherein the history input information is information of the corresponding target history image acquired in advance.
As an example, the electronic device may be a robot. Referring to fig. 8, an exemplary structural diagram of a robot according to an embodiment of the present application is shown. The robot may include: an information acquisition device 801 configured to acquire current input information; an image acquisition device 802 configured to acquire a current image corresponding to the current input information; one or more processors 803; and a storage device 804 having stored thereon one or more programs that, when executed by the one or more processors, cause the robot to: acquire current input information and a current image corresponding to the current input information; determine whether the current image and a pre-acquired target history image indicate the same user; and, in response to the current image and the target history image indicating the same user, generate reply information of the current input information based on the current input information and history input information, where the history input information is pre-acquired information corresponding to the target history image.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (17)

1. A method for generating information, comprising:
acquiring current input information and a current image corresponding to the current input information;
determining whether the current image and a pre-acquired target historical image indicate the same user;
responding to the fact that the current image and the target historical image indicate the same user, and generating reply information of the current input information based on the current input information and historical input information, wherein the historical input information is information which is obtained in advance and corresponds to the target historical image;
the current image is an image of a current user, and the target historical image is an image of a previous user of the current user; and
the current input information is information input by a current user, and the historical input information is information input by a previous user of the current user;
wherein the method further comprises:
in response to the current image and the target historical image indicating different users, generating reply information to the current input information based on the current input information.
2. The method of claim 1, wherein the current image comprises a current facial image, the target history image comprises a history facial image; and
the determining whether the current image and the pre-acquired target history image indicate the same user includes:
determining whether the current facial image and the historical facial image indicate the same user.
3. The method of claim 2, wherein the determining whether the current facial image and the historical facial image indicate a same user comprises:
determining a similarity between image features of the current face image and image features of the historical face image;
in response to the similarity being greater than or equal to a preset similarity threshold, determining that the current facial image and the historical facial image indicate the same user;
in response to the similarity being less than the similarity threshold, determining that the current facial image and the historical facial image are indicative of different users.
4. The method of claim 2, wherein the determining whether the current facial image and the historical facial image indicate a same user comprises:
extracting image features of the historical face image, and performing face tracking on an image acquired after the acquisition time of the historical face image based on the image features;
in response to a result of the face tracking indicating that an image acquired after the acquisition time includes a historical user's facial image, determining that the current facial image and the historical facial image indicate the same user, wherein the historical user is the user indicated by the historical facial image;
in response to a result of the face tracking indicating that an image acquired after the acquisition time does not include a face image of a historical user, determining that the current face image and the historical face image indicate a different user.
5. The method of claim 4, wherein the determining that the current facial image and the historical facial image indicate different users in response to the result of the face tracking indicating that the images acquired after the acquisition time do not include facial images of historical users comprises:
in response to a result of the face tracking indicating that an image acquired after the acquisition time does not include a face image of a historical user, determining a similarity between image features of the current face image and image features of the historical face image;
in response to the similarity being less than a similarity threshold, determining that the current facial image and the historical facial image are indicative of different users.
6. The method of one of claims 1-5, wherein said obtaining current input information, and a current image corresponding to said current input information, comprises:
acquiring current input information;
and responding to the fact that the current input information comprises awakening information, and acquiring a current image corresponding to the current input information.
7. The method of one of claims 1-5, wherein said obtaining current input information, and a current image corresponding to said current input information, comprises:
and responding to the situation that the distance between the user and the target distance sensor is smaller than a preset distance threshold value, and acquiring current input information and a current image corresponding to the current input information.
8. An apparatus for generating information, comprising:
an acquisition unit configured to acquire current input information and a current image corresponding to the current input information;
a determination unit configured to determine whether the current image and a target history image acquired in advance indicate the same user;
a first generation unit configured to generate reply information of the current input information based on the current input information and history input information in response to the current image and the target history image indicating the same user, wherein the history input information is information acquired in advance corresponding to the target history image;
the current image is an image of a current user, and the target historical image is an image of a previous user of the current user; and
the current input information is information input by a current user, and the historical input information is information input by a previous user of the current user;
wherein the apparatus further comprises:
a second generating unit configured to generate reply information to the current input information based on the current input information in response to the current image indicating a different user from the target history image.
9. The apparatus of claim 8, wherein the current image comprises a current facial image, the target history image comprises a history facial image; and
the determination unit is further configured to determine whether the current face image and the history face image indicate the same user.
10. The apparatus of claim 9, wherein the determining unit is further configured to:
determining a similarity between image features of the current face image and image features of the historical face image;
in response to the similarity being greater than or equal to a preset similarity threshold, determining that the current facial image and the historical facial image indicate the same user;
in response to the similarity being less than the similarity threshold, determining that the current facial image and the historical facial image are indicative of different users.
11. The apparatus of claim 9, wherein the determining unit is further configured to:
extracting image features of the historical face image, and performing face tracking on an image acquired after the acquisition time of the historical face image based on the image features;
in response to a result of the face tracking indicating that an image acquired after the acquisition time includes a historical user's facial image, determining that the current facial image and the historical facial image indicate the same user, wherein the historical user is the user indicated by the historical facial image;
in response to a result of the face tracking indicating that an image acquired after the acquisition time does not include a face image of a historical user, determining that the current face image and the historical face image indicate a different user.
12. The apparatus of claim 11, wherein the determining unit is further configured to:
in response to a result of the face tracking indicating that an image acquired after the acquisition time does not include a face image of a historical user, determining a similarity between image features of the current face image and image features of the historical face image;
in response to the similarity being less than a similarity threshold, determining that the current facial image and the historical facial image are indicative of different users.
13. The apparatus according to one of claims 8-12, wherein the obtaining unit is further configured to:
acquiring current input information;
and responding to the fact that the current input information comprises awakening information, and acquiring a current image corresponding to the current input information.
14. The apparatus according to one of claims 8-12, wherein the obtaining unit is further configured to:
and responding to the situation that the distance between the user and the target distance sensor is smaller than a preset distance threshold value, and acquiring current input information and a current image corresponding to the current input information.
15. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.
16. A robot, comprising:
an information acquisition device configured to acquire current input information;
an image acquisition device configured to acquire a current image corresponding to the current input information;
one or more processors;
a storage device having one or more programs stored thereon,
when executed by the one or more processors, cause the one or more processors to obtain current input information via the information obtaining device and obtain a current image corresponding to the current input information via the image acquisition device to implement the method of any one of claims 1-7.
17. A computer-readable medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any one of claims 1-7.
CN201810501246.5A 2018-05-23 2018-05-23 Method and apparatus for generating information Active CN108710697B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810501246.5A CN108710697B (en) 2018-05-23 2018-05-23 Method and apparatus for generating information


Publications (2)

Publication Number Publication Date
CN108710697A CN108710697A (en) 2018-10-26
CN108710697B true CN108710697B (en) 2020-01-03

Family

ID=63868482

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810501246.5A Active CN108710697B (en) 2018-05-23 2018-05-23 Method and apparatus for generating information

Country Status (1)

Country Link
CN (1) CN108710697B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5488692A (en) * 1991-01-15 1996-01-30 International Business Machines Corporation System and method for representing and manipulating three-dimensional objects on massively parallel architectures
CN107329986A (en) * 2017-06-01 2017-11-07 竹间智能科技(上海)有限公司 The interactive method and device recognized based on language performance

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105930374B (en) * 2016-04-12 2019-07-19 华南师范大学 Based on emotional robot dialogue method, system and the robot fed back recently


Also Published As

Publication number Publication date
CN108710697A (en) 2018-10-26

Similar Documents

Publication Publication Date Title
EP3477519B1 (en) Identity authentication method, terminal device, and computer-readable storage medium
CN108830235B (en) Method and apparatus for generating information
US20230395069A1 (en) Speaker diarization using speaker embedding(s) and trained generative model
CN109034069B (en) Method and apparatus for generating information
EP3665676B1 (en) Speaking classification using audio-visual data
CN107393541B (en) Information verification method and device
US11436863B2 (en) Method and apparatus for outputting data
CN109993150B (en) Method and device for identifying age
CN109101919B (en) Method and apparatus for generating information
CN111428010B (en) Man-machine intelligent question-answering method and device
EP3617946A1 (en) Context acquisition method and device based on voice interaction
CN108197592B (en) Information acquisition method and device
CN108549848B (en) Method and apparatus for outputting information
CN107832720B (en) Information processing method and device based on artificial intelligence
CN113505848B (en) Model training method and device
CN110209658B (en) Data cleaning method and device
CN110046571B (en) Method and device for identifying age
CN111061877A (en) Text theme extraction method and device
CN108399401B (en) Method and device for detecting face image
CN110008926B (en) Method and device for identifying age
CN109829431B (en) Method and apparatus for generating information
CN109710939B (en) Method and device for determining theme
CN109101956B (en) Method and apparatus for processing image
CN111292333B (en) Method and apparatus for segmenting an image
CN117593608A (en) Training method, device, equipment and storage medium for graphic recognition large model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant