JP7160454B2

JP7160454B2 - Method, apparatus and system, electronic device, computer readable storage medium and computer program for outputting information

Info

Publication number: JP7160454B2
Application number: JP2021053715A
Authority: JP
Inventors: ションヨンズオ; イーバオヤン
Original assignee: Apollo Intelligent Connectivity Beijing Technology Co Ltd
Current assignee: Apollo Intelligent Connectivity Beijing Technology Co Ltd
Priority date: 2020-05-29
Filing date: 2021-03-26
Publication date: 2022-10-25
Anticipated expiration: 2041-03-26
Also published as: CN111489522A; JP2021182131A; KR20210042860A

Description

Embodiments of the present disclosure relate to the field of automotive network technology, and specifically to a method, apparatus and system, an electronic device, a computer-readable storage medium and a computer program for outputting information.

At present, safety incidents in online ride-hailing occur frequently. The ride-hailing user base is large, unlawful incidents occur without warning, and the technical safety measures of ride-hailing platforms are limited and subject to serious delays, which contributes to incidents. Even when a victim becomes wary and reports to the police, the report is received late and a response is difficult.

At present, the incident-handling mechanism of online ride-hailing platforms works as follows: when a victim senses a threat in advance, the victim places a call or sends a message to an emergency contact or to the platform through the emergency-contact or distress-signal function of the mobile app. Under this mechanism, when an unlawful or criminal act occurs, the victim is constrained by the circumstances at the time and cannot use a mobile phone normally to call for help or report to the police. Meanwhile, after receiving the call or message, the emergency contact must report to the police and contact the ride-hailing platform to obtain the location of the vehicle involved, and the platform must provide the vehicle's location and characteristics before the police can readily be dispatched.

Existing in-vehicle intelligent alarm schemes turn on the camera of the in-vehicle system, capture images of the persons in the front row, and detect hidden dangerous scenes through image analysis, thereby implementing an intelligent alarm.

However, existing intelligent alarm systems depend on the trip computer camera. The camera must be turned on before images of persons can be captured, and in a dangerous scene it is not possible to turn the camera on immediately. Moreover, once the camera is on, the entire trip computer interface becomes the camera view, which is easy to notice and easy to turn off, and once the camera is off the alarm system is immediately disabled.

Embodiments of the present disclosure propose a method, apparatus and system, an electronic device, a computer-readable storage medium and a computer program for outputting information.

In a first aspect, embodiments of the present disclosure relate to a method for outputting information, including: collecting audio data with each of at least two audio collection devices located at different positions in a vehicle; performing echo cancellation on the at least two channels of collected audio data; inputting the echo-cancelled data into at least two speech recognition engines, respectively, and performing speech recognition to obtain text information of at least two users; and uploading the positions of the at least two audio collection devices and the corresponding text information of the at least two users to a service side, so that the service side analyzes the text information with a pre-trained deep learning model and outputs alarm information if the analysis result satisfies an alarm condition.

In some embodiments, the method further includes: inputting the echo-cancelled data into at least two emotion recognition engines, respectively, and performing emotion recognition to obtain emotion information of the at least two users; and uploading the positions of the at least two audio collection devices, the corresponding text information of the at least two users, and the emotion information of the two users to the service side, so that the service side analyzes the text information and the emotions with a pre-trained deep learning model and outputs alarm information if the analysis result satisfies an alarm condition.

In a second aspect, embodiments of the present disclosure relate to a method for outputting information, including: receiving the positions of at least two audio collection devices and the corresponding text information of at least two users uploaded from a vehicle; composing the text information of the two users into a conversation stream based on the positions of the at least two audio collection devices; inputting the conversation stream into a pre-trained deep learning model to obtain the probability that the conversation is abnormal; and triggering an alarm if the probability is higher than a predetermined first threshold.

In some embodiments, the method further includes: obtaining attribute information and position information of the vehicle; searching for the traffic police closest to the vehicle based on the position information; and sending the attribute information and position information of the vehicle to the traffic police.

In some embodiments, the method further includes: receiving emotion information of the at least two users; and triggering an alarm if the probability that the conversation is abnormal is lower than a predetermined second threshold and the passenger's emotion information includes fear.

In some embodiments, the method further includes triggering an alarm in response to detecting that the circuit connection of an audio collection device of the vehicle has been disconnected.

In some embodiments, the method further includes: periodically sending test questions to the vehicle for the passenger to answer; and triggering an alarm if no response information judged to be normal by the deep learning model is received within a predetermined time.

In a third aspect, embodiments of the present disclosure relate to an apparatus for outputting information, including: an audio collection unit configured to collect audio data with each of at least two audio collection devices located at different positions in a vehicle; an echo cancellation unit configured to perform echo cancellation on the at least two channels of collected audio data; a speech recognition unit configured to input the echo-cancelled data into at least two speech recognition engines, respectively, and perform speech recognition to obtain text information of at least two users; and an information upload unit configured to upload the positions of the at least two audio collection devices and the corresponding text information of the at least two users to a service side, so that the service side analyzes the text information with a pre-trained deep learning model and outputs alarm information if the analysis result satisfies an alarm condition.

In some embodiments, the apparatus further includes an emotion recognition unit configured to input the echo-cancelled data into at least two emotion recognition engines, respectively, and perform emotion recognition to obtain emotion information of the at least two users. The information upload unit is further configured to upload the positions of the at least two audio collection devices, the corresponding text information of the at least two users, and the emotion information of the two users to the service side, so that the service side analyzes the text information and the emotions with a pre-trained deep learning model and outputs alarm information if the analysis result satisfies an alarm condition.

In a fourth aspect, embodiments of the present disclosure relate to an apparatus for outputting information, including: a receiving unit configured to receive the positions of at least two audio collection devices and the corresponding text information of at least two users uploaded from a vehicle; a text stitching unit configured to compose the text information of the two users into a conversation stream based on the positions of the at least two audio collection devices; a conversation recognition unit configured to input the conversation stream into a pre-trained deep learning model to obtain the probability that the conversation is abnormal; and an alarm unit configured to trigger an alarm if the probability is higher than a predetermined first threshold.

In some embodiments, the alarm unit is further configured to obtain attribute information and position information of the vehicle, search for the traffic police closest to the vehicle based on the position information, and send the attribute information and position information of the vehicle to the traffic police.

In some embodiments, the receiving unit is further configured to receive emotion information of the at least two users, and the alarm unit is further configured to trigger an alarm if the probability that the conversation is abnormal is lower than a predetermined second threshold and the passenger's emotion information includes fear.

In some embodiments, the alarm unit is further configured to trigger an alarm in response to detecting that the circuit connection of an audio collection device of the vehicle has been disconnected.

In some embodiments, the alarm unit is further configured to periodically send test questions to the vehicle for the passenger to answer, and to trigger an alarm if no response information judged to be normal by the deep learning model is received within a predetermined time.

In a fifth aspect, embodiments of the present disclosure relate to a system for outputting information, including a vehicle and a service side. The vehicle is configured to collect audio data with each of at least two audio collection devices located at different positions in the vehicle, perform echo cancellation on the at least two channels of collected audio data, input the echo-cancelled data into at least two speech recognition engines, respectively, and perform speech recognition to obtain text information of at least two users, and upload the positions of the at least two audio collection devices and the corresponding text information of the at least two users to the service side. The service side is configured to receive the positions of the at least two audio collection devices and the corresponding text information of the at least two users uploaded from the vehicle, compose the text information of the two users into a conversation stream based on the positions of the at least two audio collection devices, input the conversation stream into a pre-trained deep learning model to obtain the probability that the conversation is abnormal, and trigger an alarm if the probability is higher than a predetermined first threshold.

In a sixth aspect, embodiments of the present disclosure relate to an electronic device for outputting information, including one or more processors and a storage device storing one or more programs, where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to either of the first aspect and the second aspect.

In a seventh aspect, embodiments of the present disclosure relate to a computer-readable storage medium storing a computer program, where the computer program, when executed by a processor, implements the method according to either of the first aspect and the second aspect.

In an eighth aspect, embodiments of the present disclosure relate to a computer program that, when executed by a processor, implements the method according to either of the first aspect and the second aspect.

The present invention mainly addresses personal safety in in-vehicle scenarios. It performs effective real-time voice monitoring: by monitoring users' spoken conversations in the vehicle, it obtains the content of those conversations, predicts danger in advance, and takes timely preventive measures, thereby solving many potential safety problems in in-vehicle travel scenarios.

To make other features, objects and advantages of the present disclosure clearer, reference is made to the following detailed description of non-limiting embodiments, given with reference to the drawings. The drawings are, in order:
An exemplary system architecture diagram to which an embodiment of the present disclosure is applicable.
A flowchart of one embodiment of a method for outputting information according to the present disclosure.
A flowchart of another embodiment of a method for outputting information according to the present disclosure.
A schematic diagram of an application scene of a method for outputting information according to the present disclosure.
A schematic structural diagram of one embodiment of an apparatus for outputting information according to the present disclosure.
A schematic structural diagram of another embodiment of an apparatus for outputting information according to the present disclosure.
A schematic structural diagram of a computer system of an electronic device suitable for implementing embodiments of the present disclosure.

The present disclosure is described in more detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the relevant invention and do not limit it. For ease of description, the drawings show only the parts relevant to the invention.

Where no conflict arises, the embodiments of the present disclosure and the features in the embodiments may be combined with one another. The present disclosure is described in detail below with reference to the drawings and in conjunction with the embodiments.

FIG. 1 shows an exemplary system architecture 100 to which embodiments of the method for outputting information or the apparatus for outputting information of the present disclosure may be applied.

As shown in FIG. 1, the system architecture 100 may include a vehicle 101, a network 102 and a server 103. Audio collection devices 1011, 1012, 1013 and 1014 and a controller 1015 may be installed in the vehicle 101. The controller 1015 can execute steps 201 to 205. The network 102 provides the medium for the communication link between the vehicle 101 and the server 103. The network 102 may include various connection types, such as wired links, wireless communication links, or fiber optic cables.

A passenger can use the vehicle 101 to interact with the server 103 over the network 102, for example to send and receive messages. A plurality of audio collection devices can be installed in the vehicle 101. After speech is detected, the vehicle 101 can perform speech recognition locally. Each audio collection device receives one channel of audio data, and each channel of audio data is recognized by one speech recognition engine, so that each audio collection device corresponds to the text information of one user. The identity of the user corresponding to a channel of audio data can be determined from the position of the audio collection device. For example, text information obtained by recognizing speech collected near the driver's seat belongs to the driver.

The number of audio collection devices installed in the vehicle 101 is not limited to two and may be three or more. The main purpose of installing a plurality of audio collection devices is to recognize the position of each speaker and thereby judge whether the driver poses a threat to a passenger. The number of audio collection devices may equal the maximum seating capacity of the vehicle.

The server 103 may be a server that provides various services, for example an alarm server that performs text analysis on the speech recognition results uploaded from the vehicle 101. A neural network model is installed on the alarm server. The alarm server stitches the received text information into a conversation stream based on the users' identities, and then uses the neural network model to judge whether the conversation satisfies an alarm condition. If the alarm condition is satisfied, the server triggers an alarm and notifies traffic police near the vehicle of the vehicle's position information and attribute information (license plate, vehicle model, owner identity information, telephone number, and so on).
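As an illustration of the notification step, the sketch below selects the traffic police unit closest to the vehicle by great-circle (haversine) distance. The source does not say how the nearest unit is found; the haversine formula, the function names `haversine_km` and `nearest_unit`, and the representation of units as a name-to-coordinates dictionary are all assumptions made for this example.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def nearest_unit(vehicle_position, units):
    """Return the name of the unit closest to the vehicle.

    vehicle_position: (lat, lon) of the vehicle (hypothetical format).
    units: dict mapping a unit name to its (lat, lon) (hypothetical format).
    """
    lat, lon = vehicle_position
    return min(units, key=lambda name: haversine_km(lat, lon, *units[name]))
```

A real deployment would more likely query a dispatch system or a spatial index; a linear scan is used here only to keep the example short.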

The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster composed of multiple servers or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is imposed here.

The method for outputting information according to embodiments of the present disclosure is generally executed by the vehicle 101 and the server 103, and correspondingly the apparatus for outputting information is generally disposed in the vehicle 101 and the server 103. One server can serve multiple vehicles, and together they constitute an alarm system.

It should be understood that the numbers of terminal devices, vehicles, networks and servers in FIG. 1 are merely illustrative. Any number of vehicles, networks and servers may be provided as required.

With continued reference to FIG. 2, a flow 200 of one embodiment of the method for outputting information according to the present invention is shown. This method for outputting information is applied to a vehicle and includes the following steps.

In step 201, audio data is collected with each of at least two audio collection devices located at different positions in the vehicle.

In this embodiment, the executing entity of the method for outputting information (for example, the vehicle shown in FIG. 1) collects audio data with each of at least two audio collection devices located at different positions in the vehicle. An audio collection device may be any device for collecting audio, such as a microphone, a pickup, or a tape recorder. Each audio collection device collects one channel of data. An audio collection device is installed beside a seat and collects the voice of the user sitting in that seat, so the position of the audio collection device indicates which user the collected audio data belongs to. Of the audio collection devices, one is used by the driver and the others are used by passengers. In general, four audio collection devices can be arranged, as shown in FIG. 1.

In step 202, echo cancellation is performed on the at least two channels of collected audio data.

In this embodiment, for the problem of acoustic echo cancellation (AEC), the most popular algorithms today are echo cancellation algorithms based on adaptive filters. An adaptive filtering algorithm adjusts the weight vector of the filter so that the estimated echo path approximates the actual echo path. This yields an estimated echo signal, which is removed from the mixture of clean speech and echo to cancel the echo.
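The adaptive-filter idea can be sketched with the normalized least-mean-squares (NLMS) algorithm, one common adaptive filtering algorithm. The source does not name a specific algorithm, so NLMS, the function names, the tap count and the step size are assumptions chosen for illustration, as is the use of a known reference signal (for example, loudspeaker playback) as the `far_end` input.

```python
def simulate_echo(far_end, path):
    """Hypothetical helper: convolve a reference signal with an echo path (test data only)."""
    out = []
    for n in range(len(far_end)):
        out.append(sum(path[k] * far_end[n - k] for k in range(len(path)) if n - k >= 0))
    return out


def nlms_echo_cancel(far_end, mic, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of `far_end` from `mic`.

    Returns (residual, weights): the echo-cancelled signal and the final
    weight vector, which approximates the echo path.
    """
    weights = [0.0] * taps   # current estimate of the echo path
    history = [0.0] * taps   # most recent reference samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate                      # signal after removing the estimated echo
        norm = sum(h * h for h in history) + eps   # normalization keeps the step size stable
        step = mu * e / norm
        weights = [w + step * h for w, h in zip(weights, history)]
        residual.append(e)
    return residual, weights
```

With an echo-only microphone signal produced by `simulate_echo`, the residual shrinks toward zero as the weights converge to the simulated path. When near-end speech is present, that speech remains in the residual, which is the signal passed on to speech recognition.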

In step 203, the echo-cancelled data is input into at least two speech recognition engines, respectively, and speech recognition is performed to obtain text information of at least two users.

In this embodiment, each audio collection device corresponds to one speech recognition engine. The speech recognition engine uses speech recognition technology, also called automatic speech recognition (ASR), whose goal is to convert the lexical content of human speech into computer-readable input such as key presses, binary codes, or character sequences. In the solution of the present disclosure, speech recognition starts without requiring a wake-up.

In step 204, the positions of the at least two audio collection devices and the corresponding text information of the at least two users are uploaded to the service side, so that the service side analyzes the text information with a pre-trained deep learning model and outputs alarm information if the analysis result satisfies an alarm condition.

In this embodiment, recognition of the text information may start after the end point of the speech is detected, with the result then uploaded to the service side, or speech recognition may be performed in real time with the recognition results uploaded in real time. The speech recognition results for each user's audio data may be uploaded by separate transmitting devices, or a single transmitting device may be shared so that the text information of multiple users is uploaded in one package. To save resources, the text information of different users may also be reported in separate time slots. The reported text information carries a position marker of the audio collection device, which the service side uses to determine who spoke the received text. If resources conflict when text information is reported in separate time slots, the passenger's text information can be reported first, followed by the driver's. When reporting in separate time slots, timestamps must be attached so that the chronological order of the conversation can be distinguished.
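The reporting rules in this paragraph (position marker, timestamp, passenger before driver on conflict) can be sketched as follows. The record layout, the field names, the `"driver"` position label and the JSON payload format are hypothetical; the source specifies only what information is carried, not how it is encoded.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Utterance:
    position: str     # position marker of the audio collection device (hypothetical labels)
    text: str         # recognized text for this channel
    timestamp: float  # seconds; lets the service side restore conversation order


def order_for_upload(pending, driver_position="driver"):
    """Order queued reports: passengers before the driver, oldest first within each group."""
    return sorted(pending, key=lambda u: (u.position == driver_position, u.timestamp))


def to_payload(pending):
    """Serialize the ordered reports into one package (JSON is an assumed encoding)."""
    return json.dumps([asdict(u) for u in order_for_upload(pending)], ensure_ascii=False)
```

Because every record keeps its own timestamp, sending the passenger's text first does not change the order in which the service side reconstructs the conversation.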

After the text information is reported to the service side, the service side analyzes the text information with a pre-trained deep learning model and outputs alarm information if the analysis result satisfies an alarm condition.

In some optional implementations of this embodiment, the method further includes: inputting the echo-cancelled data into at least two emotion recognition engines, respectively, and performing emotion recognition to obtain emotion information of the at least two users; and uploading the positions of the at least two audio collection devices, the corresponding text information of the at least two users, and the emotion information of the two users to the service side, so that the service side analyzes the text information and the emotions with a pre-trained deep learning model and outputs alarm information if the analysis result satisfies an alarm condition. The emotion recognition engine may be a neural network classifier that extracts features from the speech to judge the user's emotion, such as tension, fear, joy, or sadness. The voice of a frightened user typically trembles. To simplify training, a binary classifier may be used that only recognizes the probability that the user's emotion is fear. During training, the speech of frightened users is used as positive samples.

If a passenger is coerced into conversing normally as the driver demands, it is not possible to judge from the recognized text information alone whether something is wrong. Emotion information is recognized to compensate for the fact that recognized text cannot express the user's mental state. If the text information is normal but the passenger's emotion is abnormal, it is highly likely that the driver is coercing the passenger into normal conversation in order to mislead the voice monitoring.
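Combined with the two threshold conditions stated in the second aspect, this reasoning amounts to a simple decision rule, sketched below. The function name and the numeric threshold values are hypothetical; the source says only that a first and a second threshold are predetermined. The source also does not say what happens when the probability lies between the two thresholds, so this sketch raises no alarm in that band.

```python
def should_trigger_alarm(p_abnormal, passenger_fear,
                         first_threshold=0.8, second_threshold=0.3):
    """Decide whether to trigger an alarm.

    p_abnormal: probability from the conversation model that the conversation is abnormal.
    passenger_fear: True if the passenger's emotion information includes fear.
    The threshold defaults are placeholder values, not taken from the source.
    """
    # Condition 1: the conversation itself looks abnormal.
    if p_abnormal > first_threshold:
        return True
    # Condition 2: the conversation looks normal, but the passenger sounds afraid,
    # which suggests the passenger may be coerced into talking normally.
    if p_abnormal < second_threshold and passenger_fear:
        return True
    return False
```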

With further reference to FIG. 3, a flow 300 of another embodiment of the method for outputting information is shown. The flow 300 of this method for outputting information is applied to the service side and includes the following steps.

ステップ３０１において、車両からアップロードされた少なくとも２つのオーディオ収集装置の位置及び対応する少なくとも２人のユーザのテキスト情報を受信する。 At step 301, the locations of at least two audio collection devices uploaded from a vehicle and the corresponding textual information of at least two users are received.

In this embodiment, the execution body of the method for outputting information (for example, the service side shown in FIG. 1) receives, via a wireless connection, the positions of at least two audio collection devices and the corresponding text information of at least two users uploaded from the vehicle.
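One way the uploaded data could be represented is sketched below. The patent does not define a wire format, so the JSON encoding, the class name `UtteranceReport`, and all field names are assumptions made for illustration.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class UtteranceReport:
    """One recognized utterance as a vehicle might report it (hypothetical schema)."""
    mic_position: str               # position of the audio collection device, e.g. "driver"
    text: str                       # speech-recognition output for that device
    timestamp: float                # seconds since the epoch, stamped by the vehicle
    emotion: Optional[str] = None   # optional emotion label, e.g. "fear"


def encode_report(report: UtteranceReport) -> str:
    """Serialize a report for upload; ensure_ascii=False keeps non-ASCII text readable."""
    return json.dumps(asdict(report), ensure_ascii=False)


def decode_report(payload: str) -> UtteranceReport:
    """Rebuild a report on the service side from the uploaded JSON string."""
    return UtteranceReport(**json.loads(payload))
```

Carrying the device position in every report is what lets the service side attribute each utterance to the driver or the passenger.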

In step 302, the text information of the two users is composed into a conversation stream based on the positions of the at least two audio collection devices.

In this embodiment, the user to whom each piece of text information belongs can be distinguished based on the position associated with the received text information. The utterances of the different users are then stitched into a complete dialogue based on the time at which the text information was received. If the vehicle reports text information in time-separated batches, it attaches timestamps, and the service side stitches the conversation stream based on those timestamps.
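The stitching step can be sketched as a sort by timestamp followed by a position-to-speaker lookup. The tuple layout, the position names, and the `roles` mapping are assumptions for illustration; the patent only states that position identifies the speaker and time orders the utterances.

```python
from typing import Dict, List, Tuple

# (mic_position, text, timestamp); this field layout is assumed for the sketch.
Utterance = Tuple[str, str, float]


def stitch_conversation(utterances: List[Utterance], roles: Dict[str, str]) -> List[str]:
    """Merge per-device utterances into one time-ordered conversation stream.

    roles maps an audio collection device position to a speaker label,
    e.g. {"front_left": "driver", "rear_left": "passenger"}.
    """
    ordered = sorted(utterances, key=lambda u: u[2])  # stable sort by timestamp
    return [f"{roles.get(pos, 'unknown')}: {text}" for pos, text, _ in ordered]
```

Because `sorted` is stable, utterances with identical timestamps keep their arrival order.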

In step 303, the conversation stream is input into a pre-trained deep learning model to obtain the probability that the conversation is abnormal.

In this embodiment, the deep learning model can be trained using conversation information from past cases as training samples. Records of conversations between the driver and the victim (interrogation records made after the driver's arrest) are kept for past cases and are used as positive samples for supervised training. The trained deep learning model can obtain, from an input conversation stream, the probability that the conversation is abnormal.

In step 304, if the probability is higher than a predetermined first threshold, an alarm is triggered.

In this embodiment, if the probability that the conversation is abnormal is higher than the predetermined first threshold, an alarm is triggered. Triggering the alarm includes calling the police emergency number 110 and reporting the location and attribute information (vehicle model, color, owner information, etc.) of the suspect vehicle. Based on the location information, the traffic police closest to the vehicle can be searched for, and the attribute information and location information of the vehicle can be sent to the traffic police.
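A minimal sketch of the nearest-unit lookup follows. It assumes that vehicle and police positions are available as latitude and longitude in degrees and uses the haversine great-circle distance; the patent does not specify how "closest" is computed, and the dictionary keys are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def build_alarm(vehicle, police_units):
    """Pick the closest police unit and assemble the alarm message for it.

    vehicle:      {"lat": ..., "lon": ..., "attributes": {...}}  (assumed layout)
    police_units: list of {"id": ..., "lat": ..., "lon": ...}    (assumed layout)
    """
    nearest = min(
        police_units,
        key=lambda u: haversine_km(vehicle["lat"], vehicle["lon"], u["lat"], u["lon"]),
    )
    return {
        "recipient": nearest["id"],
        "location": (vehicle["lat"], vehicle["lon"]),
        "attributes": vehicle["attributes"],
    }
```

A linear scan is enough for a handful of units; a deployment with many units would more likely use a spatial index.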

In some optional implementations of this embodiment, the method further includes: receiving emotion information of at least two users; and triggering an alarm if the probability that the conversation is abnormal is lower than a predetermined second threshold and the passenger's emotion information includes fear. Based on the position of the audio collection device, it can be determined whether a received voice is the driver's or the passenger's, and therefore which of the at least two users' emotion information belongs to the passenger. Emotion information may include fear, tension, excitement, and the like. The emotion information can be used to further verify the received text information. Even if the text information shows no problem, the possibility that the passenger has been threatened by the driver cannot be ruled out, so it is necessary to judge from voice features whether the passenger's emotion is normal. A passenger who is frightened yet converses flawlessly is highly suspicious, and the police should be notified. The second threshold may be less than or equal to the first threshold.
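The two-threshold rule can be written as a small decision function. The threshold values below are placeholders, not values from the patent. The text does not say what happens when the probability lies between the two thresholds, so this sketch simply raises no alarm in that band.

```python
def should_alarm(p_abnormal, passenger_emotions,
                 first_threshold=0.8, second_threshold=0.5):
    """Decide whether to trigger an alarm from text and emotion evidence.

    p_abnormal:         probability from the conversation model (0..1).
    passenger_emotions: collection of emotion labels recognized for the passenger.
    Threshold defaults are hypothetical placeholders.
    """
    # Case 1: the conversation itself looks abnormal.
    if p_abnormal > first_threshold:
        return True
    # Case 2: the conversation looks normal but the passenger sounds frightened,
    # which suggests the passenger may be speaking under coercion.
    if p_abnormal < second_threshold and "fear" in passenger_emotions:
        return True
    return False
```

Keeping the second threshold at or below the first, as the text allows, ensures the two cases never overlap.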

In some optional implementations of this embodiment, the method further includes triggering an alarm in response to detecting that the circuit connection of an audio collection device of the vehicle has been disconnected. Disconnection of the circuit means that the audio collection device has been removed. The present disclosure can operate properly only if the audio collection devices are in normal use. Therefore, some checks need to be performed on the audio collection devices to prevent the driver from removing them. If an audio collection device is removed without permission from the operator, the police are notified.
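The disconnection check reduces to filtering out removals the operator has approved. The sketch below assumes the vehicle reports a connected/disconnected flag per device position and that the operator keeps a set of authorized removals; both data shapes are hypothetical.

```python
def unauthorized_disconnections(connected, authorized_removals):
    """Return the device positions whose disconnection should trigger an alarm.

    connected:           dict mapping device position -> True if the circuit is intact.
    authorized_removals: set of positions the operator has approved for removal.
    """
    return [
        position
        for position, is_connected in sorted(connected.items())
        if not is_connected and position not in authorized_removals
    ]
```

An empty result means that no alarm is needed for this check.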

In some optional implementations of this embodiment, the method further includes: periodically sending test questions to the vehicle for the passenger to answer; and triggering an alarm if response information judged normal by the deep learning model is not received within a predetermined time. If no conversation is picked up in the vehicle for a long time, it cannot be determined whether the passenger simply does not want to talk or the driver is preventing the passenger from talking, so test questions need to be sent periodically for the passenger to answer in order to confirm that the passenger is safe. If the passenger does not respond within the predetermined time, or the speech-recognized response is judged abnormal by the deep learning model, an alarm is triggered.
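The check-in rule can be sketched as a function that returns whether an alarm is needed. The `is_normal` callable stands in for the deep learning model's judgment, and the function name and parameters are hypothetical.

```python
from typing import Callable, Optional


def check_in_requires_alarm(sent_at: float,
                            reply_text: Optional[str],
                            reply_at: Optional[float],
                            timeout_s: float,
                            is_normal: Callable[[str], bool]) -> bool:
    """Return True if the passenger's answer to a test question warrants an alarm.

    sent_at / reply_at: times in seconds; reply_at is None if nothing was received.
    is_normal:          stand-in for the model that judges the recognized reply.
    """
    if reply_text is None or reply_at is None:
        return True                      # no reply at all
    if reply_at - sent_at > timeout_s:
        return True                      # reply arrived too late
    return not is_normal(reply_text)     # reply arrived in time but was judged abnormal
```

A scheduler would call this once per test question, after the timeout has elapsed or a reply has arrived.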

With continued reference to FIG. 4, FIG. 4 is a schematic diagram of an application scenario of the method for outputting information according to this embodiment. In the application scenario of FIG. 4, the passenger sits behind the driver after boarding, and the passenger's voice is collected by the audio collection device closest to the passenger. The driver's voice is collected by the audio collection device next to the driver. The two voices are then recognized by respective speech recognition engines to obtain two pieces of text information. The vehicle sends the two pieces of text information to the service side. Based on the positions of the audio collection devices, the service side determines which text information was spoken by the driver and which by the passenger. The two pieces of text information are then stitched into a conversation based on the time of receipt. Finally, the conversation is input into the pre-trained deep learning model to determine the probability that the conversation is abnormal. If the probability is higher than the predetermined first threshold, an alarm is triggered.

The method according to the above embodiments of the present disclosure has the following advantages.

1. It adds to the safety attributes of the vehicle and enriches vehicle manufacturers' safety technology solutions.

2. In transportation service industries derived from automobiles, such as taxis and online ride-hailing, vehicles equipped with the solution can improve passenger safety.

3. By detecting the content of users' conversations, the related recommendation services of the trip computer can be optimized, so that news of greater interest and better articles or products are recommended to the user.

With further reference to FIG. 5, as an implementation of the methods shown in the above figures, the present invention provides an embodiment of a device for outputting information. This device embodiment corresponds to the method embodiment shown in FIG. 2, and the device can be applied to various electronic devices.

As shown in FIG. 5, the device 500 for outputting information according to this embodiment includes an audio collection unit 501, an echo cancellation unit 502, a speech recognition unit 503, and an information upload unit 504. The audio collection unit 501 is arranged to collect audio data through at least two audio collection devices at different positions in the vehicle, respectively. The echo cancellation unit 502 is arranged to perform echo cancellation on the collected at least two channels of audio data. The speech recognition unit 503 is arranged to input the echo-cancelled data into at least two speech recognition engines, respectively, and perform speech recognition to obtain text information of at least two users. The information upload unit 504 is arranged to upload the positions of the at least two audio collection devices and the corresponding text information of the at least two users to the service side, so that the service side analyzes the text information through a pre-trained deep learning model and outputs alarm information if the analysis result satisfies an alarm condition.
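The four units of the device 500 can be pictured as stages of a pipeline. In the sketch below each unit is an injected callable, so no real audio or speech-recognition library is implied; the class name, field names, and report layout are hypothetical, and a single `recognize` callable invoked once per channel stands in for the "at least two speech recognition engines".

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class VehicleSidePipeline:
    # Audio collection unit 501: returns raw audio keyed by device position.
    collect: Callable[[], Dict[str, bytes]]
    # Echo cancellation unit 502: cleans all channels together.
    cancel_echo: Callable[[Dict[str, bytes]], Dict[str, bytes]]
    # Speech recognition unit 503: called once per channel in this sketch.
    recognize: Callable[[bytes], str]
    # Information upload unit 504: sends position/text pairs to the service side.
    upload: Callable[[List[dict]], None]

    def run_once(self) -> List[dict]:
        """Run collect -> echo cancellation -> recognition -> upload for one batch."""
        raw = self.collect()
        clean = self.cancel_echo(raw)
        reports = [
            {"position": position, "text": self.recognize(audio)}
            for position, audio in clean.items()
        ]
        self.upload(reports)
        return reports
```

Injecting the stages as callables keeps the sketch testable with stubs and leaves the choice of echo canceller and recognizer open.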

In this embodiment, for the specific processing of the audio collection unit 501, the echo cancellation unit 502, the speech recognition unit 503, and the information upload unit 504 of the device 500 for outputting information, reference may be made to steps 201, 202, 203, and 204 in the embodiment corresponding to FIG. 2.

In some optional implementations of this embodiment, the device 500 further includes an emotion recognition unit arranged to input the echo-cancelled data into at least two emotion recognition engines, respectively, and perform emotion recognition to obtain emotion information of at least two users. The information upload unit is further arranged to upload the positions of the at least two audio collection devices, the corresponding text information of the at least two users, and the emotion information of the two users to the service side, so that the service side analyzes the text information and the emotions through a pre-trained deep learning model and outputs alarm information if the analysis result satisfies an alarm condition.

With further reference to FIG. 6, as an implementation of the methods shown in the above figures, the present invention provides an embodiment of a device for outputting information. This device embodiment corresponds to the method embodiment shown in FIG. 3, and the device can be applied to various electronic devices.

As shown in FIG. 6, the device 600 for outputting information according to this embodiment includes a receiving unit 601, a text stitching unit 602, a conversation recognition unit 603, and an alarm unit 604. The receiving unit 601 is arranged to receive the positions of at least two audio collection devices and the corresponding text information of at least two users uploaded from a vehicle. The text stitching unit 602 is arranged to compose the text information of the two users into a conversation stream based on the positions of the at least two audio collection devices. The conversation recognition unit 603 is arranged to input the conversation stream into a pre-trained deep learning model to obtain the probability that the conversation is abnormal. The alarm unit 604 is arranged to trigger an alarm if the probability is higher than a predetermined first threshold.

In this embodiment, for the specific processing of the receiving unit 601, the text stitching unit 602, the conversation recognition unit 603, and the alarm unit 604 of the device 600 for outputting information, reference may be made to steps 301, 302, 303, and 304 in the embodiment corresponding to FIG. 3.

In some optional implementations of this embodiment, the alarm unit 604 is further arranged to: obtain attribute information and location information of the vehicle; search, based on the location information, for the traffic police closest to the vehicle; and send the attribute information and location information of the vehicle to the traffic police.

In some optional implementations of this embodiment, the receiving unit 601 is further arranged to receive emotion information of at least two users, and the alarm unit 604 is further arranged to trigger an alarm if the probability that the conversation is abnormal is lower than a predetermined second threshold and the passenger's emotion information includes fear.

In some optional implementations of this embodiment, the alarm unit 604 is further arranged to trigger an alarm in response to detecting that the circuit connection of an audio collection device of the vehicle has been disconnected.

In some optional implementations of this embodiment, the alarm unit 604 is further arranged to periodically send test questions to the vehicle for the passenger to answer, and to trigger an alarm if response information judged normal by the deep learning model is not received within a predetermined time.

Referring now to FIG. 7, a schematic structural diagram of an electronic device 700 (for example, the server or the vehicle controller in FIG. 1) suitable for implementing embodiments of the present disclosure is shown. The vehicle controller/server shown in FIG. 7 is merely an example and does not limit the functions or scope of use of the embodiments of the present disclosure.

As shown in FIG. 7, the electronic device 700 may include a processing unit 701 (for example, a central processing unit or a graphics processor) that can perform various appropriate operations and processing according to a program stored in a read-only memory (ROM) 702 or a program loaded from a storage device 708 into a random access memory (RAM) 703. The RAM 703 also stores various programs and data necessary for the operation of the electronic device 700. The processing unit 701, the ROM 702, and the RAM 703 are connected to one another by a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.

Typically, the following may be connected to the I/O interface 705: an input device 706 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, an audio collection device, an accelerometer, and a gyroscope; an output device 707 including, for example, a liquid crystal display (LCD), a speaker, and a vibrator; a storage device 708 including, for example, a magnetic tape and a hard disk; and a communication device 709. The communication device 709 may allow the electronic device 700 to communicate wirelessly or by wire with other devices to exchange data. Although FIG. 7 shows the electronic device 700 with various devices, it should be understood that not all of the devices shown need to be implemented or provided; more or fewer devices may be implemented or provided instead. Each block shown in FIG. 7 may represent one device or, as needed, multiple devices.

In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, an embodiment of the present disclosure includes a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication device 709, installed from the storage device 708, or installed from the ROM 702. When executed by the processing unit 701, the computer program performs the above-described functions defined in the methods of the embodiments of the present disclosure. It should be noted that the computer-readable medium described in the embodiments of the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media include, but are not limited to, an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the embodiments of the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in combination with an instruction execution system, apparatus, or device. In the embodiments of the present disclosure, a computer-readable signal medium may include a data signal that is propagated in baseband or as part of a carrier wave and that carries computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may be any computer-readable medium other than a computer-readable storage medium, and can send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium may be transmitted over any suitable medium, including but not limited to wire, optical fiber cable, RF (radio frequency), or any suitable combination of the above.

The computer-readable medium may be included in the electronic device, or may exist separately without being incorporated into the electronic device. The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: collect audio data through at least two audio collection devices at different positions in the vehicle, respectively; perform echo cancellation on the collected at least two channels of audio data; input the echo-cancelled data into at least two speech recognition engines, respectively, and perform speech recognition to obtain text information of at least two users; and upload the positions of the at least two audio collection devices and the corresponding text information of the at least two users to the service side, so that the service side analyzes the text information through a pre-trained deep learning model and outputs alarm information if the analysis result satisfies an alarm condition. Alternatively, the one or more programs cause the electronic device to: receive the positions of at least two audio collection devices and the corresponding text information of at least two users uploaded from a vehicle; compose the text information of the two users into a conversation stream based on the positions of the at least two audio collection devices; input the conversation stream into a pre-trained deep learning model to obtain the probability that the conversation is abnormal; and trigger an alarm if the probability is higher than a predetermined first threshold.

Computer program code for carrying out the operations of the embodiments of the present disclosure may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java®, Smalltalk®, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

The flowcharts and block diagrams in the figures illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing a specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that shown in the figures. For example, two blocks shown in succession may in fact be executed substantially in parallel, or may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of such blocks, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

The units described in the embodiments of the present disclosure may be implemented in software or in hardware. The described units may be provided in a processor, which may be described, for example, as "a processor including an audio collection unit, an echo cancellation unit, a speech recognition unit, and an information upload unit." The names of these units do not, in some cases, limit the units themselves; for example, the audio collection unit may also be described as "a unit that collects audio data through at least two audio collection devices at different positions in the vehicle, respectively."

The above description is merely an explanation of the preferred embodiments of the present disclosure and of the technical principles applied. Those skilled in the art should understand that the scope of the invention in the embodiments of the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the above inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the present disclosure.

Claims

1. A method for outputting information, comprising:
collecting audio data by at least two audio collection devices at different positions in a vehicle, respectively;
performing echo cancellation on the collected at least two channels of audio data;
obtaining text information of at least two users by inputting the echo-cancelled data into at least two speech recognition engines, respectively, and performing speech recognition; and
uploading the positions of the at least two audio collection devices and the corresponding text information of the at least two users to a service side, so that the service side analyzes the text information through a pre-trained deep learning model and outputs alarm information if the analysis result satisfies an alarm condition.

2. The method of claim 1, further comprising:
obtaining emotion information of at least two users by inputting the echo-cancelled data into at least two emotion recognition engines, respectively, and performing emotion recognition; and
uploading the positions of the at least two audio collection devices, the corresponding text information of the at least two users, and the emotion information of the two users to the service side, so that the service side analyzes the text information and the emotions through a pre-trained deep learning model and outputs alarm information if the analysis result satisfies an alarm condition.

3. A method for outputting information, comprising:
receiving locations of at least two audio collection devices and corresponding text information of at least two users, uploaded from a vehicle;
composing the text information of two users into a dialogue stream based on the locations of the at least two audio collection devices;
obtaining a probability that the dialogue is abnormal by inputting the dialogue stream into a pre-trained deep learning model; and
triggering an alarm if the probability is higher than a predetermined first threshold.
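
A minimal sketch of the service-side steps in this claim follows. It assumes each uploaded utterance carries a timestamp for ordering, which the claim does not state. The mapping from microphone position to speaker role and the `model` callable stand in for details the claims leave open.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Utterance:
    position: str     # microphone location, e.g. "driver_seat"
    text: str         # recognized text uploaded by the vehicle
    timestamp: float  # assumed field, used only to order the turns


# Hypothetical mapping from microphone position to a speaker role.
ROLE_BY_POSITION = {"driver_seat": "driver", "rear_seat": "passenger"}


def compose_dialogue_stream(utterances: List[Utterance]) -> str:
    """Interleave per-position text into one speaker-labeled dialogue."""
    ordered = sorted(utterances, key=lambda u: u.timestamp)
    return "\n".join(
        f"{ROLE_BY_POSITION.get(u.position, u.position)}: {u.text}"
        for u in ordered
    )


def should_alarm(
    dialogue: str,
    model: Callable[[str], float],  # placeholder for the pre-trained model
    first_threshold: float,
) -> bool:
    """Trigger when the abnormal-dialogue probability exceeds the threshold."""
    return model(dialogue) > first_threshold
```

Labeling each turn by seat position lets the model see who said what, which is the reason the vehicle uploads device locations together with the text.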

4. The method of claim 3, further comprising:
obtaining attribute information and location information of the vehicle;
searching for the traffic police closest to the vehicle based on the location information; and
transmitting the attribute information and the location information of the vehicle to the traffic police.
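
The nearest-unit lookup in this claim could be sketched as below. The sketch assumes that positions are latitude and longitude pairs in degrees and that great-circle (haversine) distance is an acceptable measure of closeness. Neither is specified in the claims, and the unit names are hypothetical.

```python
import math
from typing import Dict, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance between two points, in kilometers."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_unit(vehicle: LatLon, units: Dict[str, LatLon]) -> str:
    """Return the name of the police unit closest to the vehicle."""
    return min(units, key=lambda name: haversine_km(vehicle, units[name]))
```

A production system would more likely use road distance or estimated travel time. Straight-line distance is used here only to keep the example self-contained.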

5. The method of claim 3, further comprising:
receiving emotion information of the at least two users; and
triggering an alarm if the probability that the dialogue is abnormal is less than a predetermined second threshold and the emotion information of a passenger includes fear.

6. The method of claim 3, further comprising:
triggering an alarm in response to detecting that a circuit connection of an audio collection device of the vehicle has been disconnected.

7. The method of claim 3, further comprising:
periodically sending test questions to the vehicle for a passenger to answer; and
triggering an alarm if response information determined to be normal by the deep learning model is not received within a predetermined time.
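
The timeout check in this claim can be illustrated as follows. The function name, the `(received_at, text)` response format, and the `is_normal` callable (standing in for the deep learning model's judgment) are assumptions made for the sketch.

```python
from typing import Callable, Iterable, Tuple


def alarm_on_missing_answer(
    sent_at: float,
    timeout_s: float,
    responses: Iterable[Tuple[float, str]],  # (received_at, text) pairs
    is_normal: Callable[[str], bool],        # placeholder for the model
) -> bool:
    """Return True if no normal answer arrived within the allowed time."""
    deadline = sent_at + timeout_s
    for received_at, text in responses:
        if sent_at <= received_at <= deadline and is_normal(text):
            return False  # a timely, normal answer was received
    return True  # silence, a late answer, or only abnormal answers
```

Silence and abnormal answers are treated the same way, which matches the claim wording: the alarm depends on the absence of a normal response rather than on the presence of an abnormal one.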

8. An apparatus for outputting information, comprising:
an audio collection unit arranged to collect audio data by at least two audio collection devices at different locations in a vehicle, respectively;
an echo cancellation unit arranged to perform echo cancellation on the at least two channels of collected audio data;
a speech recognition unit arranged to obtain text information of at least two users by respectively inputting the echo-cancelled data to at least two speech recognition engines and performing speech recognition; and
an information upload unit arranged to upload the locations of the at least two audio collection devices and the corresponding text information of the at least two users to a service side, so that the service side analyzes the text information with a pre-trained deep learning model and outputs alarm information if the analysis result satisfies an alarm condition.

9. The apparatus of claim 8, further comprising an emotion recognition unit arranged to obtain emotion information of the at least two users by respectively inputting the echo-cancelled data to at least two emotion recognition engines and performing emotion recognition,
wherein the information upload unit is further arranged to upload the locations of the at least two audio collection devices, the corresponding text information of the at least two users, and the emotion information of the two users to the service side, so that the service side analyzes the text information and the emotion information with a pre-trained deep learning model and outputs alarm information if the analysis result satisfies the alarm condition.

10. An apparatus for outputting information, comprising:
a receiving unit arranged to receive locations of at least two audio collection devices and corresponding text information of at least two users, uploaded from a vehicle;
a text stitching unit arranged to compose the text information of two users into a dialogue stream based on the locations of the at least two audio collection devices;
a speech recognition unit arranged to obtain a probability that the dialogue is abnormal by inputting the dialogue stream into a pre-trained deep learning model; and
an alarm unit arranged to trigger an alarm if the probability is higher than a predetermined first threshold.

11. The apparatus of claim 10, wherein the alarm unit is further arranged to:
obtain attribute information and location information of the vehicle;
search for the traffic police closest to the vehicle based on the location information; and
transmit the attribute information and the location information of the vehicle to the traffic police.

12. The apparatus of claim 10, wherein:
the receiving unit is further arranged to receive emotion information of the at least two users; and
the alarm unit is further arranged to trigger an alarm if the probability that the dialogue is abnormal is less than a predetermined second threshold and the emotion information of a passenger includes fear.

13. The apparatus of claim 10, wherein the alarm unit is further arranged to trigger an alarm in response to detecting that a circuit connection of an audio collection device of the vehicle has been disconnected.

14. The apparatus of claim 10, wherein the alarm unit is further arranged to:
periodically send test questions to the vehicle for a passenger to answer; and
trigger an alarm if response information determined to be normal by the deep learning model is not received within a predetermined time.

15. A system for outputting information, comprising:
a vehicle arranged to collect audio data by at least two audio collection devices at different locations in the vehicle, perform echo cancellation on the at least two channels of collected audio data, obtain text information of at least two users by respectively inputting the echo-cancelled data to at least two speech recognition engines and performing speech recognition, and upload the locations of the at least two audio collection devices and the corresponding text information of the at least two users to a service side; and
the service side, arranged to receive the locations of the at least two audio collection devices and the corresponding text information of the at least two users uploaded from the vehicle, compose the text information of two users into a dialogue stream based on the locations of the at least two audio collection devices, obtain a probability that the dialogue is abnormal by inputting the dialogue stream into a pre-trained deep learning model, and trigger an alarm if the probability is higher than a predetermined first threshold.

16. An electronic device for outputting information, comprising:
one or more processors; and
a storage device storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1 to 7.

17. A computer-readable medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the method of any one of claims 1 to 7.

18. A computer program which, when executed by a processor, implements the method of any one of claims 1 to 7.