CN112307816A - In-vehicle image acquisition method and device, electronic equipment and storage medium

Info

Publication number: CN112307816A
Application number: CN201910687586.6A
Authority: CN
Inventors: 孙浚凯
Original assignee: Beijing Horizon Robotics Technology Research and Development Co Ltd
Current assignee: Beijing Horizon Robotics Technology Research and Development Co Ltd
Priority date: 2019-07-29
Filing date: 2019-07-29
Publication date: 2021-02-02

Abstract

Embodiments of the present disclosure provide an in-vehicle image acquisition method and apparatus, an electronic device, and a storage medium, and relate to the technical field of vehicles. The method comprises: acquiring at least one frame of in-vehicle image captured by an image acquisition device and in-vehicle sound information captured by a sound pickup device; and judging, based on the at least one frame of in-vehicle image and the in-vehicle sound information, whether a preset condition is met, and if so, controlling the image acquisition device to obtain a current frame of in-vehicle image. The preset condition includes: the emotion of the persons in the vehicle is a preset target emotion. The method, apparatus, electronic device, and storage medium can recognize the emotion of the persons in the vehicle across multiple dimensions and features, improve the accuracy and reliability of emotion detection, capture images based on the emotion of the persons in the vehicle, and improve the enjoyment and experience of driving.

Description

In-vehicle image acquisition method and device, electronic equipment and storage medium

Technical Field

The present disclosure relates to the field of vehicle technologies, and in particular, to an in-vehicle image obtaining method and apparatus, an electronic device, and a storage medium.

Background

With the development of automotive technology, vehicles are fitted with a photographing device that shoots the scene inside the vehicle, records in-vehicle image data, and stores it on media such as a hard disk, so that the in-vehicle scene can be monitored and traced from the stored image data. To improve the enjoyment and experience of driving, it is desirable to photograph the in-vehicle scene and keep the image when the persons in the vehicle are in a mood such as happiness. However, current photographing equipment can only start shooting on a start command input by the user and stop shooting on a stop command input by the user; it cannot shoot based on the emotion of the persons in the vehicle.

Disclosure of Invention

The present disclosure is proposed to solve the above technical problems. The embodiment of the disclosure provides an in-vehicle image acquisition method and device, electronic equipment and a storage medium.

According to an aspect of the embodiments of the present disclosure, there is provided an in-vehicle image acquisition method including: acquiring at least one frame of in-vehicle image captured by an image acquisition device and in-vehicle sound information captured by a sound pickup device; and judging whether a preset condition is met based on the at least one frame of in-vehicle image and the in-vehicle sound information, and if so, controlling the image acquisition device to obtain the current frame of in-vehicle image; wherein the preset condition includes: the emotion of the persons in the vehicle is a preset target emotion.

According to another aspect of the embodiments of the present disclosure, there is provided an in-vehicle image acquisition apparatus including: an information acquisition module for acquiring at least one frame of in-vehicle image captured by an image acquisition device and in-vehicle sound information captured by a sound pickup device; and an image acquisition module for judging whether a preset condition is met based on the at least one frame of in-vehicle image and the in-vehicle sound information, and if so, controlling the image acquisition device to obtain the current frame of in-vehicle image; wherein the preset condition includes: the emotion of the persons in the vehicle is a preset target emotion.

According to another aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium storing a computer program for executing the above-mentioned method.

According to another aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including: a processor; a memory for storing the processor-executable instructions; the processor is used for executing the method.

Based on the in-vehicle image acquisition method and device, the electronic equipment and the storage medium provided by the embodiment of the disclosure, the emotion of people in the vehicle can be identified in a multi-dimensional manner, images are acquired based on the emotion of the people in the vehicle, and the interest and experience of driving are improved.

The technical solution of the present disclosure is further described in detail by the accompanying drawings and examples.

Drawings

The above and other objects, features and advantages of the present disclosure will become more apparent by describing in more detail embodiments of the present disclosure with reference to the attached drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the disclosure, and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the principles of the disclosure and not to limit the disclosure. In the drawings, like reference numbers generally represent like parts or steps.

FIG. 1 is a flow chart of one embodiment of an in-vehicle image acquisition method of the present disclosure;

FIG. 2 is a flow chart of one embodiment of the present disclosure for determining whether a preset condition is met based on images and sounds;

FIG. 3 is a flow diagram of one embodiment of the present disclosure for identifying emotions of people in a vehicle based on images;

FIG. 4 is a flow diagram of one embodiment of the present disclosure for recognizing emotions of a person in a vehicle based on voice;

FIG. 5 is a flow diagram of one embodiment of determining whether to make a conditional determination based on a number of people in accordance with the present disclosure;

FIG. 6 is a schematic structural diagram illustrating one embodiment of an in-vehicle image capture device according to the present disclosure;

FIG. 7 is a schematic structural diagram of one embodiment of an image acquisition module of the present disclosure;

FIG. 8 is a block diagram of one embodiment of an electronic device of the present disclosure.

Detailed Description

Example embodiments according to the present disclosure will be described in detail below with reference to the accompanying drawings. It is to be understood that the described embodiments are merely a subset of the embodiments of the present disclosure and not all embodiments of the present disclosure, with the understanding that the present disclosure is not limited to the example embodiments described herein.

It should be noted that: the relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.

It will be understood by those of skill in the art that the terms "first," "second," and the like in the embodiments of the present disclosure are used merely to distinguish one element from another, and are not intended to imply any particular technical meaning or any necessary logical order between them.

It is also understood that in embodiments of the present disclosure, "a plurality" may refer to two or more than two and "at least one" may refer to one, two or more than two.

It is also to be understood that any reference to any component, data, or structure in the embodiments of the disclosure, may be generally understood as one or more, unless explicitly defined otherwise or stated otherwise.

In addition, the term "and/or" in the present disclosure merely describes an association between associated objects and indicates that three relationships are possible; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" in the present disclosure generally indicates that the associated objects before and after it are in an "or" relationship.

It should also be understood that the description of the various embodiments of the present disclosure emphasizes the differences between the various embodiments, and the same or similar parts may be referred to each other, so that the descriptions thereof are omitted for brevity.

Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.

The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.

Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.

It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.

Embodiments of the present disclosure may be implemented in electronic devices such as terminal devices, computer systems, servers, etc., which are operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known terminal devices, computing systems, environments, and/or configurations that may be suitable for use with an electronic device, such as a terminal device, computer system, or server, include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputer systems, mainframe computer systems, distributed cloud computing environments that include any of the above, and the like.

Electronic devices such as terminal devices, computer systems, servers, etc. may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc. that perform particular tasks or implement particular abstract data types. The computer system/server may be implemented in a distributed cloud computing environment. In a distributed cloud computing environment, tasks may be performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.

Summary of the application

In the course of implementing the present disclosure, the inventor found that, during driving, current photographing equipment can only start shooting on a start instruction input by the user and stop shooting on a stop instruction input by the user. It cannot shoot based on the emotion of the persons in the vehicle, and therefore cannot photograph the in-vehicle scene and keep the image when the persons in the vehicle are in a mood such as happiness.

The in-vehicle image acquisition method provided by the present disclosure judges whether a preset condition is met based on the in-vehicle image and the in-vehicle sound information and, if so, controls the image acquisition device to obtain the current frame of in-vehicle image. The preset condition includes: the emotion of the persons in the vehicle is a preset target emotion. The method can recognize the emotion of the persons in the vehicle across multiple dimensions and features, improving the accuracy and reliability of emotion detection.

Exemplary method

FIG. 1 is a flowchart of an embodiment of the in-vehicle image acquisition method of the present disclosure. The method shown in FIG. 1 includes steps S101 and S102, each of which is described below.

S101, acquiring at least one frame of in-vehicle image acquired by the image acquisition device and in-vehicle sound acquired by the sound pickup device.

If the navigation destination is determined to be a preset target and the distance to the navigation destination is greater than a preset distance threshold, the user is prompted whether to enable the driving shooting function. The preset target may be a park, a tourist area, a restaurant, a shopping center, or the like, and the distance threshold may be 5 kilometers, 10 kilometers, or the like. The user may be asked by voice whether to enable the driving shooting function, or a prompt asking whether to enable snapshot pre-authorization may be displayed on the central control display screen. In one example, the user is not prompted; instead, a driving shooting function button is provided in the vehicle, and the user presses the button to enable the driving shooting function directly.
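The prompting rule in the preceding paragraph can be sketched as a single predicate. This is a minimal illustration, not the disclosed implementation: the `Destination` type, its field names, and the category labels are hypothetical, and the 5 km default is just one of the example thresholds given in the text.

```python
from dataclasses import dataclass

# Hypothetical category labels; the text names parks, tourist areas,
# restaurants and shopping centers as example preset targets.
PRESET_TARGETS = {"park", "tourist_area", "restaurant", "shopping_center"}


@dataclass
class Destination:
    category: str       # hypothetical field, e.g. "park"
    distance_km: float  # remaining distance reported by navigation


def should_prompt(dest: Destination, distance_threshold_km: float = 5.0) -> bool:
    """Prompt only for a preset target farther away than the threshold."""
    return (
        dest.category in PRESET_TARGETS
        and dest.distance_km > distance_threshold_km
    )
```

With these assumptions, a park 12 km away triggers the prompt, while a park 3 km away or an unlisted destination type does not.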

The image acquisition device may be a camera arranged in the vehicle, and the sound pickup device may be a microphone array of an in-vehicle sound system. It is judged whether the driving shooting function is enabled; if so, the in-vehicle image captured by the image acquisition device and the in-vehicle sound information captured by the sound pickup device are obtained; if not, neither the image nor the sound is obtained.

S102, judging whether a preset condition is met based on the at least one frame of in-vehicle image and the in-vehicle sound information, and if so, controlling the image acquisition device to obtain the current frame of in-vehicle image. The preset condition includes that the emotion of the persons in the vehicle is a preset target emotion; the target emotion may be happiness or the like.

Face recognition may be performed on the in-vehicle image and the number of persons in the vehicle determined from the face recognition result; whether the preset condition is met is then judged based on the at least one frame of in-vehicle image and the in-vehicle sound information when the number of persons in the vehicle is greater than 1. If there is only one person in the vehicle, that is, only the driver, the driver needs to concentrate and stay emotionally stable while driving and has no one to talk with, so the driver's emotion usually does not need to be captured. When there are several persons in the vehicle, they can talk with one another while the vehicle is moving. When they discuss a topic of interest, pass beautiful scenery, play music, exchange comments, and the like, they may be in a happy mood; an image of the persons in the vehicle in that mood can then be captured to record the moment, improving the enjoyment and experience of driving.
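The gating described above (evaluate the preset condition only when more than one person is detected, and capture only when the condition holds) can be sketched as follows. The function name and the callable parameters are illustrative assumptions; the emotion check and the camera call are passed in as placeholders rather than implemented.

```python
from typing import Callable, Optional, TypeVar

Frame = TypeVar("Frame")


def acquire_if_condition_met(
    face_count: int,
    condition_met: Callable[[], bool],
    capture_frame: Callable[[], Frame],
) -> Optional[Frame]:
    """Return a captured frame, or None when no capture should happen.

    face_count:    number of faces found in the in-vehicle image.
    condition_met: placeholder for the image-and-sound emotion judgment.
    capture_frame: placeholder for the call that grabs the current frame.
    """
    # A lone driver is skipped: the emotion judgment is not even run.
    if face_count <= 1:
        return None
    if not condition_met():
        return None
    return capture_frame()
```

Passing the judgment as a callable keeps the more expensive emotion recognition from running at all when only the driver is present.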

FIG. 2 is a flowchart of an embodiment of the present disclosure for determining whether a preset condition is met based on images and sounds. The method shown in FIG. 2 includes steps S201 to S203, each of which is described below.

S201, recognizing the emotion of people in the vehicle based on at least one frame of in-vehicle image to obtain a first emotion recognition result.

S202, recognizing the emotion of the person in the vehicle based on the sound in the vehicle to obtain a second emotion recognition result.

S203, judging whether the emotion of the persons in the vehicle is the target emotion according to the first emotion recognition result and the second emotion recognition result.

In one embodiment, there may be a plurality of methods for determining whether the emotion of the person in the vehicle is the target emotion. The first emotion recognition result includes: a first emotion recognition confidence corresponding to the target emotion; the second emotion recognition result includes: a second emotion recognition confidence corresponding to the target emotion. The first emotion recognition confidence and the second emotion recognition confidence may be probability values for the target emotion.

If the first emotion recognition confidence and the second emotion recognition confidence are both greater than a preset first confidence threshold, the emotion of the persons in the vehicle is determined to be the target emotion; or, if the sum of the product of the image recognition coefficient and the first emotion recognition confidence and the product of the sound recognition coefficient and the second emotion recognition confidence is greater than a second confidence threshold, the emotion of the persons in the vehicle is determined to be the target emotion.

For example, the first emotion recognition confidence is 70%, the second emotion recognition confidence is 80%, the first confidence threshold is 65%, and the second confidence threshold is 70%. Because the first emotion recognition confidence of 70% and the second emotion recognition confidence of 80% are both greater than the preset first confidence threshold of 65%, the emotion of the persons in the vehicle is determined to be the target emotion.

Alternatively, the image recognition coefficient is set to 0.6 and the sound recognition coefficient to 0.4. The sum of the product of the image recognition coefficient 0.6 and the first emotion recognition confidence 70% and the product of the sound recognition coefficient 0.4 and the second emotion recognition confidence 80% is 0.6 × 0.70 + 0.4 × 0.80 = 0.74, which is greater than the second confidence threshold of 70%, so the emotion of the persons in the vehicle is determined to be the target emotion.
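The two decision rules can be sketched as below, using the numbers from the worked example as defaults. The function names are illustrative, and joining the two rules with a logical "or" is one reading of the text, which presents them as alternative methods.

```python
def both_above_threshold(image_conf: float, sound_conf: float,
                         first_threshold: float) -> bool:
    """Rule 1: both confidences exceed the first confidence threshold."""
    return image_conf > first_threshold and sound_conf > first_threshold


def weighted_score(image_conf: float, sound_conf: float,
                   image_coeff: float = 0.6, sound_coeff: float = 0.4) -> float:
    """Weighted sum used by rule 2: image term plus sound term."""
    return image_coeff * image_conf + sound_coeff * sound_conf


def is_target_emotion(image_conf: float, sound_conf: float,
                      first_threshold: float = 0.65,
                      second_threshold: float = 0.70) -> bool:
    """Accept when either rule holds (an assumed combination)."""
    return (
        both_above_threshold(image_conf, sound_conf, first_threshold)
        or weighted_score(image_conf, sound_conf) > second_threshold
    )
```

For the example values, `weighted_score(0.70, 0.80)` evaluates to approximately 0.74, and `is_target_emotion(0.70, 0.80)` is true under either rule.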

In one embodiment, the preset condition further includes: the volume decibel value is greater than a preset decibel threshold. A volume decibel value is obtained from the in-vehicle sound information. If the emotion of the persons in the vehicle is determined to be the target emotion, or if that emotion is determined to be the target emotion and the volume decibel value obtained from the in-vehicle sound information is determined to be greater than the preset decibel threshold, the image acquisition device is controlled to obtain and store the current frame of in-vehicle image.

A person's speaking voice can reflect emotion. When a person is excited, for example when happy, they speak louder than usual. Checking that the volume exceeds a threshold while judging whether the emotion of the persons in the vehicle meets the preset condition adds volume detection as an auxiliary cue for the in-vehicle atmosphere, which can make the judgment more accurate and stable.
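A minimal sketch of the volume check follows. The text does not say how the decibel value is computed, so the RMS level relative to a reference amplitude used here is an assumption, as are the function names and the choice of reference.

```python
import math
from typing import Sequence


def volume_db(samples: Sequence[float], reference: float = 1.0) -> float:
    """RMS level of the samples in decibels relative to `reference`.

    Assumed formula: 20 * log10(rms / reference). Silence and empty
    input map to negative infinity.
    """
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        return float("-inf")
    return 20.0 * math.log10(rms / reference)


def preset_condition_met(emotion_is_target: bool, level_db: float,
                         db_threshold: float) -> bool:
    """Stricter variant: target emotion AND volume above the threshold."""
    return emotion_is_target and level_db > db_threshold
```

A real system would calibrate the reference against the microphone array so that the threshold can be stated in sound pressure level; that calibration is outside this sketch.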

FIG. 3 is a flowchart of an embodiment of the present disclosure for recognizing the emotion of persons in a vehicle based on an image. The method shown in FIG. 3 includes steps S301 to S303, each of which is described below.

S301, obtaining the face images of all the persons in the monitoring images in the vehicle.

S302, determining the emotion recognition confidence of each person according to the face image of each person.

S303, a first emotion recognition result is determined based on the number of persons and the emotion recognition confidence of each person.

There are various methods for recognizing the emotion of the persons in the vehicle based on the in-vehicle monitoring image to obtain the first emotion recognition result. In one method, at least one face image is obtained from the in-vehicle monitoring image and input into a trained first emotion recognition model, which outputs an image recognition confidence indicating that the emotion corresponding to the face image is the target emotion. An average image recognition confidence is then obtained from the number of face images and their image recognition confidences, and this average is taken as the first emotion recognition confidence.

For example, three face images are obtained from the in-vehicle monitoring image, emotion recognition feature information is extracted from the three face images and input into the trained first emotion recognition model, and the model outputs three image recognition confidences of 60%, 77%, and 69%, each indicating that the emotion corresponding to one of the face images is the target emotion. The average of the three image recognition confidences, (60% + 77% + 69%) / 3 ≈ 68.7%, is taken as the first emotion recognition confidence.
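The averaging step can be written out directly. The function name is illustrative; raising an error on an empty list is an assumption about how a frame with no detected faces should be handled.

```python
from typing import Sequence


def first_emotion_confidence(per_face_confidences: Sequence[float]) -> float:
    """Average the per-face confidences for the target emotion."""
    if not per_face_confidences:
        raise ValueError("at least one face image is required")
    return sum(per_face_confidences) / len(per_face_confidences)
```

For the three confidences in the example, `first_emotion_confidence([0.60, 0.77, 0.69])` is about 0.6867, which rounds to the 68.7% stated above.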

The first emotion recognition model may be a neural network model, such as a CNN or RNN model. The first emotion recognition model includes an input layer neuron model, a middle layer neuron model, and an output layer neuron model; the output of each layer is used as the input of the next layer, and the middle layer is a fully connected layer.
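The layer structure described above (input layer, fully connected middle layer, output layer, each feeding the next) can be illustrated with a tiny forward pass in plain Python. Everything specific here is an assumption made for illustration: the ReLU activation, the softmax output, the emotion label names, and the way weights are passed in. A real model would be a trained CNN or RNN built with a deep learning framework.

```python
import math
from typing import List, Sequence, Tuple

# Assumed label set; the target emotion is looked up by name.
EMOTIONS = ["happiness", "sadness", "fear", "anger",
            "surprise", "disgust", "contempt"]

Layer = Tuple[Sequence[Sequence[float]], Sequence[float]]  # (weights, biases)


def dense(inputs: Sequence[float], weights, biases) -> List[float]:
    """Fully connected layer: one weight row and one bias per output neuron."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values: Sequence[float]) -> List[float]:
    return [max(0.0, v) for v in values]


def softmax(values: Sequence[float]) -> List[float]:
    peak = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def target_confidence(features: Sequence[float], hidden: Layer, output: Layer,
                      target: str = "happiness") -> float:
    """Input features -> middle layer -> output layer -> P(target emotion)."""
    middle = relu(dense(features, *hidden))
    probabilities = softmax(dense(middle, *output))
    return probabilities[EMOTIONS.index(target)]
```

With all weights and biases set to zero, every emotion receives the same probability of 1/7, which is a convenient sanity check before any trained weights are plugged in.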

In-vehicle images are processed in advance with a preset face detection algorithm to detect face images, and each face image is labeled with its corresponding emotion information, where the emotion may be happiness, sadness, fear, anger, surprise, disgust, or contempt. A sample training set is generated from the face images and the labeled emotion information, and the neural network model is trained on the sample training set to obtain the trained first emotion recognition model. At least one face image is then obtained from the in-vehicle monitoring image and input into the trained first emotion recognition model, and the image recognition confidence for the target emotion output by the model, that is, the probability that the emotion is the target emotion, is obtained.

FIG. 4 is a flowchart of an embodiment of the present disclosure for recognizing the emotion of persons in a vehicle based on sound. The method shown in FIG. 4 includes steps S401 and S402, each of which is described below.

S401, semantic content and intonation information in the in-vehicle sound information are obtained.

S402, obtaining a second emotion recognition result according to the semantic content and the intonation information.

Semantic content and intonation information can be obtained from the in-vehicle sound information through speech recognition; the semantic content may be text, and the intonation information includes volume, speech rate, tone, and the like. Emotion recognition keywords and an identified intonation are extracted from the semantic content and the intonation information, respectively. For example, the semantic content may be analyzed to find keywords that clearly indicate the user's emotion, which serve as the emotion recognition keywords; an intonation whose volume exceeds a maximum threshold or falls below a minimum threshold may serve as the identified intonation, as may an intonation whose speech rate exceeds a set threshold.
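The two extraction steps can be sketched as follows, assuming the speech recognizer has already produced text, a volume level, and a speech rate for each utterance. The `Utterance` type, its field names, the keyword lexicon, and the threshold values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Utterance:
    text: str                # recognized semantic content
    level_db: float          # measured volume of the utterance
    words_per_second: float  # measured speech rate


def emotion_keywords(text: str, lexicon: Set[str]) -> List[str]:
    """Keep the words that appear in an emotion keyword lexicon."""
    words = (w.strip(".,!?") for w in text.lower().split())
    return [w for w in words if w in lexicon]


def is_identified_intonation(u: Utterance, min_db: float, max_db: float,
                             max_rate: float) -> bool:
    """Flag volume outside [min_db, max_db] or an unusually fast speech rate."""
    return (
        u.level_db > max_db
        or u.level_db < min_db
        or u.words_per_second > max_rate
    )
```

For instance, `emotion_keywords("What a wonderful view, great!", {"wonderful", "great"})` yields the two lexicon words in order, and an utterance at 75 dB is flagged when the maximum threshold is 70 dB.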

The second emotion recognition model may be a neural network model, such as a CNN or RNN model. The second emotion recognition model includes an input layer neuron model, a middle layer neuron model, and an output layer neuron model; the output of each layer is used as the input of the next layer, and the middle layer is a fully connected layer.

Sound information is obtained in advance, semantic content and intonation information are obtained from it, emotion recognition keywords and identified intonations are extracted from the semantic content and the intonation information, and emotion information is labeled for the emotion recognition keywords and identified intonations, where the emotion may be happiness, sadness, fear, anger, surprise, disgust, contempt, or the like. A sample training set is generated from the emotion recognition keywords, the identified intonations, and the labeled emotion information, and the neural network model is trained on the sample training set to obtain the trained second emotion recognition model. The in-vehicle sound information is then processed to obtain emotion recognition keywords and an identified intonation, which are input into the trained second emotion recognition model, and the second emotion recognition confidence output by the model, that is, the probability that the emotion is the target emotion, is obtained.

FIG. 5 is a flowchart of an embodiment of the present disclosure for determining, based on the number of persons, whether to perform the condition judgment. The method shown in FIG. 5 includes steps S501 to S503, each of which is described below.

S501, obtaining a first in-vehicle person count from a pressure sensor arranged in the vehicle.

For example, a pressure sensor may be provided under a seat in the vehicle, and the first in-vehicle person count may be obtained from the weight information collected by the pressure sensor.

S502, obtaining a second in-vehicle person count from the at least one frame of in-vehicle image.

S503, if the first in-vehicle person count matches the second in-vehicle person count, judging whether the preset condition is met based on the in-vehicle monitoring image and the in-vehicle sound information.

For example, face recognition on the in-vehicle image determines that the second in-vehicle person count is four. The pressure sensors arranged under the seats show that four seats are occupied, and the four pressures measured by the sensors under those seats all fall within the human body weight interval (which may be 25 to 150 kilograms), so the first in-vehicle person count is determined to be four.

If the first in-vehicle person count equals the second in-vehicle person count, the accuracy of the face recognition is verified; this avoids missing persons in the vehicle through face recognition errors, so that the happy emotion and the like of everyone in the vehicle can be captured. After the first in-vehicle person count is determined to equal the second in-vehicle person count, whether the preset condition is met is judged based on the in-vehicle monitoring image and the in-vehicle sound information.

Determining the first in-vehicle person count with the pressure sensor and the second in-vehicle person count with face recognition, and performing emotion recognition only after the two results match, cross-checks the person count, makes the count more accurate, and can improve the effectiveness of emotion detection.
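The cross-check between the seat pressure sensors and face recognition can be sketched as below. The 25 to 150 kg interval comes from the example above; the function names and the representation of sensor readings as a list of weights in kilograms are assumptions.

```python
from typing import Sequence, Tuple

# Human body weight interval from the example, in kilograms.
WEIGHT_RANGE_KG: Tuple[float, float] = (25.0, 150.0)


def count_from_seats(seat_weights_kg: Sequence[float],
                     weight_range: Tuple[float, float] = WEIGHT_RANGE_KG) -> int:
    """Count seats whose measured weight falls inside the human weight interval."""
    low, high = weight_range
    return sum(1 for w in seat_weights_kg if low <= w <= high)


def counts_match(seat_weights_kg: Sequence[float], face_count: int) -> bool:
    """True when the pressure-based count equals the face-recognition count."""
    return count_from_seats(seat_weights_kg) == face_count
```

A 4 kg reading, such as a bag left on a seat, falls outside the interval and is not counted, so five readings of 72.0, 58.5, 31.0, 80.2, and 4.0 kg give a count of four.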

In one embodiment, the driving shooting function is enabled during navigation. If the engine-off duration of the vehicle is determined to be less than a preset time threshold and the navigation destination has not changed, the driving shooting function is determined to remain in effect; the preset time threshold may be 2 hours or the like. If the preset condition is met, the image acquisition device is controlled, according to a preset snapshot strategy, either to obtain and store the current frame of in-vehicle image or to obtain and store multiple frames of in-vehicle images at a preset interval; the preset interval may be 0.1 second, and 15 or 20 frames of in-vehicle images or the like may be obtained.
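A sketch of the two rules in this paragraph follows: whether the driving shooting function stays in effect across an engine-off period, and a burst snapshot strategy. The defaults (2 hours, 15 frames, 0.1 second) are the example values from the text; the function names, the `capture_frame` callable, and the injectable `sleep` parameter are assumptions made so the sketch can run without camera hardware.

```python
import time
from typing import Callable, List, TypeVar

Frame = TypeVar("Frame")


def function_still_active(engine_off_seconds: float, destination_changed: bool,
                          max_off_seconds: float = 2 * 3600) -> bool:
    """Keep the function enabled after a short stop with an unchanged destination."""
    return engine_off_seconds < max_off_seconds and not destination_changed


def capture_burst(capture_frame: Callable[[], Frame], frame_count: int = 15,
                  interval_s: float = 0.1,
                  sleep: Callable[[float], None] = time.sleep) -> List[Frame]:
    """Grab `frame_count` frames, pausing `interval_s` between consecutive frames."""
    frames: List[Frame] = []
    for index in range(frame_count):
        frames.append(capture_frame())
        if index < frame_count - 1:  # no pause after the last frame
            sleep(interval_s)
    return frames
```

Capturing a short burst rather than a single frame gives the user several candidates to choose from later; with the defaults the burst spans about 1.4 seconds.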

If the navigation destination is reached or the navigation is cancelled and the vehicle is stationary, the stored in-vehicle images are displayed and the user is prompted to process them. The prompt may be given by voice or the like; the stored in-vehicle images are shown on a display screen in the vehicle, and the user can choose to save and share them.

Exemplary devices

In one embodiment, as shown in FIG. 6, the present disclosure provides an in-vehicle image acquisition apparatus including: an information acquisition module 601 and an image acquisition module 602. The information acquisition module 601 acquires at least one frame of in-vehicle image captured by the image acquisition device and in-vehicle sound information captured by the sound pickup device. The image acquisition module 602 judges whether a preset condition is met based on the at least one frame of in-vehicle image and the in-vehicle sound information and, if so, controls the image acquisition device to obtain the current frame of in-vehicle image, where the preset condition includes: the emotion of the persons in the vehicle is a preset target emotion. The image acquisition module 602 also determines the number of persons in the vehicle, and judges whether the preset condition is met based on the at least one frame of in-vehicle image and the in-vehicle sound information when the number of persons in the vehicle is greater than 1.

In one embodiment, as shown in FIG. 7, the image acquisition module 602 includes: a first emotion recognition unit 6021, a second emotion recognition unit 6022, and a target emotion judgment unit 6023. The first emotion recognition unit 6021 recognizes the emotion of the person in the vehicle based on at least one frame of the in-vehicle image to obtain a first emotion recognition result. The second emotion recognition unit 6022 recognizes the emotion of the person in the vehicle based on the sound in the vehicle, and obtains a second emotion recognition result. The target emotion judgment unit 6023 judges whether the emotion of the person in the vehicle is the target emotion according to the first emotion recognition result and the second emotion recognition result.

The target emotion judgment unit 6023 obtains a volume decibel value from the in-vehicle sound information, and the preset condition further includes: the volume decibel value is greater than a preset decibel threshold. The first emotion recognition unit 6021 obtains the face image of each person in the in-vehicle monitoring image, determines the emotion recognition confidence of each person from that person's face image, and determines the first emotion recognition result based on the number of persons and the emotion recognition confidence of each person. The second emotion recognition unit 6022 obtains semantic content and intonation information from the in-vehicle sound information, and obtains the second emotion recognition result according to the semantic content and the intonation information.

The target emotion judgment unit 6023 obtains a first number of people in the vehicle from a pressure sensor arranged in the vehicle, obtains a second number of people in the vehicle from the at least one frame of in-vehicle image, and, if the first number is judged to match the second number, judges whether the preset condition is satisfied based on the in-vehicle monitoring image and the in-vehicle sound information.
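One way to picture the consistency check between the pressure-sensor count and the image-based count is sketched below. It assumes one pressure reading per seat, a hypothetical occupancy threshold in newtons, and strict equality as the matching rule; the disclosure does not specify the sensor layout, units, or whether a tolerance is allowed.

```python
from typing import Sequence


def count_occupied_seats(
    seat_pressures: Sequence[float],
    occupied_threshold: float = 200.0,  # hypothetical threshold, in newtons
) -> int:
    """Count derived from the pressure sensors (one reading per seat)."""
    return sum(1 for p in seat_pressures if p >= occupied_threshold)


def counts_match(
    seat_pressures: Sequence[float],
    faces_detected: int,
    occupied_threshold: float = 200.0,
) -> bool:
    """True if the sensor-based count equals the image-based count."""
    return count_occupied_seats(seat_pressures, occupied_threshold) == faces_detected
```

Only when `counts_match` returns `True` would the preset condition be evaluated, which guards against acting on frames in which a person is occluded or falsely detected.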

Fig. 8 is a block diagram of one embodiment of an electronic device of the present disclosure. As shown in fig. 8, the electronic device 81 includes one or more processors 811 and a memory 812.

The processor 811 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capability and/or instruction execution capability, and may control other components in the electronic device 81 to perform desired functions.

The memory 812 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), a hard disk, flash memory, and the like. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 811 to implement the in-vehicle image acquisition methods of the various embodiments of the present disclosure described above and/or other desired functions. Various contents such as an input signal, a signal component, and a noise component may also be stored in the computer-readable storage medium.

In one example, the electronic device 81 may further include an input device 813, an output device 814, and the like, which are interconnected by a bus system and/or another form of connection mechanism (not shown). The input device 813 may include, for example, a keyboard, a mouse, and the like. The output device 814 may output various information to the outside and may include, for example, a display, a speaker, a printer, and a communication network and remote output devices connected thereto.

Of course, for simplicity, only some of the components of the electronic device 81 relevant to the present disclosure are shown in fig. 8, and components such as buses, input/output interfaces, and the like are omitted. In addition, the electronic device 81 may include any other suitable components, depending on the particular application.

In addition to the above-described methods and apparatus, embodiments of the present disclosure may also be a computer program product comprising computer program instructions that, when executed by a processor, cause the processor to perform the steps in the in-vehicle image acquisition method according to various embodiments of the present disclosure described in the "exemplary methods" section above of this specification.

Program code of the computer program product for carrying out operations of embodiments of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, C++, or the like, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's computing device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.

Furthermore, embodiments of the present disclosure may also be a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, cause the processor to perform steps in an in-vehicle image acquisition method according to various embodiments of the present disclosure described in the "exemplary methods" section above in this specification.

The computer-readable storage medium may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

The foregoing describes the general principles of the present disclosure in conjunction with specific embodiments. However, it is noted that the advantages, effects, and the like mentioned in the present disclosure are merely examples and are not limiting, and they should not be considered essential to the various embodiments of the present disclosure. Furthermore, the specific details disclosed above are provided for the purpose of illustration and ease of understanding only and are not intended to be limiting; the present disclosure is not required to be implemented with those specific details.

In the in-vehicle image acquisition method and apparatus, the electronic device, and the storage medium of the above embodiments, whether a preset condition is satisfied is determined based on the in-vehicle image and the in-vehicle sound information, and if so, the image acquisition device is controlled to obtain the current frame of in-vehicle image, where the preset condition includes: the emotion of the person in the vehicle is a preset target emotion. The emotion of the people in the vehicle can thus be recognized from multiple dimensions and multiple features, which improves the accuracy and reliability of emotion detection, and capturing images based on that emotion improves the interest and experience of driving.

In the present specification, the embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts in the embodiments are referred to each other. For the system embodiment, since it basically corresponds to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.

The block diagrams of devices, apparatuses, and systems referred to in this disclosure are given only as illustrative examples and are not intended to require or imply that the connections, arrangements, and configurations must be made in the manner shown in the block diagrams. These devices, apparatuses, and systems may be connected, arranged, and configured in any manner, as will be appreciated by those skilled in the art. Words such as "including," "comprising," "having," and the like are open-ended words that mean "including, but not limited to," and are used interchangeably therewith. The words "or" and "and" as used herein mean, and are used interchangeably with, the word "and/or," unless the context clearly dictates otherwise. The word "such as" is used herein to mean, and is used interchangeably with, the phrase "such as but not limited to".

The methods and apparatus of the present disclosure may be implemented in a number of ways. For example, the methods and apparatus of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware. The above-described order for the steps of the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above unless specifically stated otherwise. Further, in some embodiments, the present disclosure may also be embodied as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.

It is also noted that in the devices, apparatuses, and methods of the present disclosure, each component or step can be decomposed and/or recombined. These decompositions and/or recombinations are to be considered equivalents of the present disclosure.

The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects, and the like, will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the disclosure. Thus, the present disclosure is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

The foregoing description has been presented for purposes of illustration and description. Furthermore, the description is not intended to limit embodiments of the disclosure to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.

Claims

1. An in-vehicle image acquisition method, comprising:

acquiring at least one frame of in-vehicle image acquired by an image acquisition device and in-vehicle sound information acquired by a sound pickup device;

judging whether a preset condition is met or not based on the at least one frame of the in-vehicle image and the in-vehicle sound information, and if so, controlling the image acquisition device to obtain the current frame of the in-vehicle image;

wherein the preset conditions include: the emotion of the person in the vehicle is a preset target emotion.

2. The method of claim 1, further comprising:

determining the number of people in the vehicle;

and when the number of people in the vehicle is more than 1, judging whether a preset condition is met or not based on the at least one frame of the image in the vehicle and the sound information in the vehicle.

3. The method of claim 1, wherein the determining whether a preset condition is satisfied based on the at least one frame of the in-vehicle image and the in-vehicle sound comprises:

recognizing the emotion of people in the vehicle based on the at least one frame of in-vehicle image to obtain a first emotion recognition result;

recognizing the emotion of people in the vehicle based on the sound in the vehicle to obtain a second emotion recognition result;

and judging whether the emotion of the person in the vehicle is the target emotion or not according to the first emotion recognition result and the second emotion recognition result.

4. The method of claim 2, further comprising:

obtaining a volume decibel value based on the voice information in the vehicle;

the preset conditions further include: and the volume decibel value is greater than a preset decibel threshold value.

5. The method of claim 3, wherein the identifying of the emotion of the person in the vehicle based on the in-vehicle monitoring image, and obtaining the first emotion identification result comprises:

acquiring a face image of each person in the in-vehicle monitoring image;

determining emotion recognition confidence of each person according to the face image of each person;

determining the first emotion recognition result based on the number of persons and the emotion recognition confidence of the respective persons.

6. The method of claim 3, wherein the recognizing emotion of the person in the vehicle based on the in-vehicle sound information, and the obtaining of the second emotion recognition result comprises:

obtaining semantic content and intonation information in the in-vehicle sound information;

and obtaining the second emotion recognition result according to the semantic content and the intonation information.

7. The method of claim 1, further comprising:

acquiring a first number of people in the vehicle according to a pressure sensor arranged in the vehicle;

obtaining a second number of people in the vehicle according to the at least one frame of the in-vehicle image;

and if the first number of people in the vehicle is judged to match the second number of people in the vehicle, judging whether the preset condition is met or not based on the in-vehicle monitoring image and the in-vehicle sound information.

8. An in-vehicle image acquisition apparatus comprising:

the information acquisition module is used for acquiring at least one frame of in-vehicle image acquired by the image acquisition device and in-vehicle sound acquired by the sound pickup device;

the image acquisition module is used for judging whether a preset condition is met or not based on the at least one frame of in-vehicle image and the in-vehicle sound information, and if so, controlling the image acquisition device to obtain the current frame of in-vehicle image; wherein the preset condition includes: the emotion of the person in the vehicle is a preset target emotion.

9. A computer-readable storage medium, the storage medium storing a computer program for performing the method of any of the preceding claims 1-7.

10. An electronic device, the electronic device comprising:

a processor;

a memory for storing instructions executable by the processor;

the processor is configured to read the executable instructions from the memory and execute the instructions to implement the method of any one of claims 1-7.