JP7211384B2

JP7211384B2 - Voice recognition device, personal identification method and personal identification program

Info

Publication number: JP7211384B2
Application number: JP2020025364A
Authority: JP
Inventors: 渉藤井
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2020-02-18
Filing date: 2020-02-18
Publication date: 2023-01-24
Anticipated expiration: 2040-02-18
Also published as: JP2021131417A

Description

The present invention relates to a speech recognition device, a personal identification method, and a personal identification program that recognize speech to identify an individual.

With the development of speech recognition, machines are now operated by speech in a variety of ways. Among these, operation by the user's free-form speech is used in products such as smartphones and smart speakers and has recently become mainstream. Many such products are built on learning models that were trained, with recent machine learning techniques, for "speech recognition" and "sentence-meaning recognition".

In addition, speech recognition technology has become widespread and inexpensive. It can therefore now be used for personal purposes as well as for business. As a result, there is demand to operate by voice the devices that were previously operated visually and by hand. In particular, there are many requests to operate communication means, typified by voice calls, by voice. One of the problems in doing so is "name resolution": the process of linking the person the user wants to communicate with to a party (ID) on the system.

Techniques by which a system recognizes speech and converts it into a character string are widely known. So is morphological analysis, which estimates the part of speech of each word in an input sentence. For example, from the input "call Ichiro Tanaka" (田中一郎に電話する), these techniques detect the information "noun: Ichiro Tanaka" and "verb: telephone (to call)". This makes it possible to place a call to Mr. Ichiro Tanaka's phone number.
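As a rough illustration of this background technique, the following minimal Python sketch assumes that a morphological analyzer has already produced (surface form, part-of-speech) pairs. The token list, tag names, and the function `extract_command` are hypothetical, and no real analyzer is called. The sketch simply takes the first noun as the callee and the first verb as the action.

```python
from typing import List, Optional, Tuple

# (surface form, part-of-speech tag); the tag names are hypothetical.
Token = Tuple[str, str]


def extract_command(tokens: List[Token]) -> Tuple[Optional[str], Optional[str]]:
    """Return (callee, action): the first noun and the first verb in the tokens."""
    callee = next((surface for surface, pos in tokens if pos == "noun"), None)
    action = next((surface for surface, pos in tokens if pos == "verb"), None)
    return callee, action


# Hypothetical analyzer output for 「田中一郎に電話する」 ("call Ichiro Tanaka").
tokens = [("田中一郎", "noun"), ("に", "particle"), ("電話する", "verb")]
print(extract_command(tokens))  # ('田中一郎', '電話する')
```

This sketch works only when the utterance names the person. If the noun is merely a job title such as 課長, the callee slot holds a title rather than an individual.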

Patent Literature 1 describes a telephone answering support device that can identify a dialogue partner without building a large-scale database. The device queries a caller database that stores information about the caller and history information about the caller's past dialogue partners. Based on that information, it identifies the dialogue partner corresponding to the content of the voice data sent by the caller.

Patent Literature 1: JP-A-2003-158579

Machine learning presupposes a huge amount of training data for building a learning model. For "speech recognition", it is possible to build a general-purpose model that extracts the user's utterances. For "sentence-meaning recognition", however, wording varies with many factors, such as the user's experience, environment, and relationships with others. It is therefore difficult to recognize the meaning of a sentence uniquely.

One way to make the meaning uniquely recognizable is to present the user in advance with the phrases the machine recognizes and have the user give instructions using only those phrases. However, forcing users to remember and speak fixed phrases is very stressful for them. With machine learning, one could instead build a sentence-meaning recognition model for each individual. However, "correct answers" must be prepared as training data, and collecting that data takes time (for example, through actual operation). This approach therefore has problems with both adaptability and responsiveness.

Morphological analysis is also often designed on the assumption that "correct sentences" are input, so it handles informal spoken Japanese poorly. In fact, in the spoken language that Japanese people use to communicate, words are often omitted from the full sentence. For example, a person may be referred to only by surname and job title, or by job title alone.

However, except in cases such as the president, a company generally has several people with the same job title. Moreover, which individual a job title refers to depends on the position (affiliation) of the speaker. A method that simply recognizes words therefore has difficulty identifying an individual from a job title alone. On the other hand, it is not realistic to maintain a large volume of settings and information covering every situation in which such spoken expressions occur.

Furthermore, the device described in Patent Literature 1 assumes that part of the individual's name is known. It therefore has difficulty identifying an individual from an organizational job title alone. In addition, because it identifies the dialogue partner from history information, it must retain past history information, which leaves a storage problem.

Given these characteristics of spoken language, voice-operated machines would be more convenient if the accuracy of identifying an individual from speech could be raised with simple settings and logic. Such identification would rely on cues such as a job title within an organization.

Accordingly, an object of the present invention is to provide a speech recognition device, a personal identification method, and a personal identification program that improve, with simple settings and logic, the accuracy of identifying an individual in an organization by voice.

A speech recognition device according to the present invention identifies an individual based on speech. It includes a target word extraction unit, an individual identification unit, and an output unit.

The target word extraction unit extracts a target word from a character string obtained by converting the user's speech into text. A target word is a word representing a role in an organization.

The individual identification unit identifies candidates for the individual indicated by the extracted target word, according to their relationship with the organization to which the user belongs. It does so based on personal information that associates an individual's identification information with affiliation information, that is, information representing the organization to which the individual belongs and the individual's role in that organization.

The output unit outputs the identified candidates.

The individual identification unit determines the reliability of a candidate who is within working hours to be higher than that of a candidate who is not. When multiple candidates have the same identification information, it preferentially selects candidates other than those holding a position higher than the user's in the user's organization. The output unit outputs the candidates based on their reliability.
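The two selection rules above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the patented implementation. The sketch assumes three things, none of which the summary above specifies:

- the `Candidate` fields, the integer `rank` scale (smaller number means higher position), and the additive `duty_bonus` are all hypothetical;
- "preferentially select" is read as dropping superiors only when a non-superior namesake exists;
- with equal base reliabilities, the bonus places on-duty candidates above off-duty ones.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    person_id: str      # unique ID (hypothetical format)
    name: str           # identification info that may collide, e.g. a surname
    rank: int           # hypothetical scale: smaller number = higher position
    on_duty: bool       # True if the person is within working hours now
    reliability: float  # base reliability taken from the dictionary


def rank_candidates(cands: List[Candidate], user_rank: int,
                    duty_bonus: float = 0.2) -> List[Candidate]:
    """Order candidates for output using the working-hours and namesake rules."""

    def score(c: Candidate) -> float:
        # Rule 1: an on-duty candidate gets a higher reliability than an
        # off-duty one (assumed additive bonus).
        return c.reliability + (duty_bonus if c.on_duty else 0.0)

    # Group candidates that share the same identification information.
    groups: Dict[str, List[Candidate]] = {}
    for c in cands:
        groups.setdefault(c.name, []).append(c)

    kept: List[Candidate] = []
    for group in groups.values():
        # Rule 2: among namesakes, prefer those not above the user; superiors
        # are dropped only if at least one non-superior namesake exists.
        non_superiors = [c for c in group if c.rank >= user_rank]
        if len(group) > 1 and non_superiors:
            kept.extend(non_superiors)
        else:
            kept.extend(group)

    # Output in descending order of adjusted reliability.
    return sorted(kept, key=score, reverse=True)


cands = [
    Candidate("E001", "Tanaka", rank=2, on_duty=True, reliability=0.5),
    Candidate("E002", "Tanaka", rank=4, on_duty=False, reliability=0.5),
    Candidate("E003", "Suzuki", rank=3, on_duty=True, reliability=0.5),
]
print([c.person_id for c in rank_candidates(cands, user_rank=3)])  # ['E003', 'E002']
```

In this example the user has rank 3. The rank-2 Tanaka is dropped because a non-superior Tanaka exists, and the on-duty Suzuki is listed first.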

A personal identification method according to the present invention identifies an individual based on speech and comprises the following steps:

- extracting a target word, which is a word representing a role in an organization, from a character string obtained by converting the user's speech into text;
- identifying candidates for the individual indicated by the extracted target word according to their relationship with the organization to which the user belongs, based on personal information that associates an individual's identification information with affiliation information (information representing the organization to which the individual belongs and the individual's role in that organization);
- determining the reliability of a candidate who is within working hours to be higher than that of a candidate who is not;
- when multiple candidates have the same identification information, preferentially selecting candidates other than those holding a position higher than the user's in the user's organization; and
- outputting the identified candidates based on their reliability.

A personal identification program according to the present invention is applied to a computer that identifies an individual based on speech. It causes the computer to execute three processes:

- a target word extraction process, which extracts a target word (a word representing a role in an organization) from a character string obtained by converting the user's speech into text;
- an individual identification process, which identifies candidates for the individual indicated by the extracted target word according to their relationship with the organization to which the user belongs, based on personal information that associates an individual's identification information with affiliation information (information representing the organization to which the individual belongs and the individual's role in that organization); and
- an output process, which outputs the identified candidates.

In the individual identification process, the program causes the computer to determine the reliability of a candidate who is within working hours to be higher than that of a candidate who is not. When multiple candidates have the same identification information, it causes the computer to preferentially select candidates other than those holding a position higher than the user's in the user's organization. In the output process, the program causes the computer to output the candidates based on their reliability.

According to the present invention, the accuracy of identifying an individual in an organization by voice can be improved with simple settings and logic.

FIG. 1 is a block diagram showing a configuration example of an embodiment of a speech recognition device according to the present invention.
FIG. 2 is an explanatory diagram showing an example of organization information.
FIG. 3 is an explanatory diagram showing an example of a dictionary.
FIG. 4 is an explanatory diagram showing an example of the relationship between an organization and the persons belonging to it.
FIG. 5 is an explanatory diagram showing an example of information included in a dictionary.
FIG. 6 is an explanatory diagram showing an example of schedules in an organization.
FIG. 7 is an explanatory diagram showing a specific configuration example of the speech recognition device.
FIG. 8 is a flowchart showing an operation example of the speech recognition device.
FIG. 9 is an explanatory diagram showing a specific example of processing using the speech recognition device.
FIG. 10 is a block diagram showing an overview of a speech recognition device according to the present invention.

The present invention addresses voice-based device operation, especially communication functions that require identifying an individual. It focuses on the culture common to organizations and ensures user convenience by adding a small amount of organization-related information to a name list. By also holding additional information other than organizational information, it provides a mechanism applicable in a variety of situations. Embodiments of the present invention are described below with reference to the drawings.

FIG. 1 is a block diagram showing a configuration example of an embodiment of a speech recognition device according to the present invention. The speech recognition device 100 of this embodiment identifies an individual based on speech. It includes an input unit 10, a speech data generation unit 20, a speech recognition unit 30, a target word extraction unit 40, an individual identification unit 50, an output unit 60, and a storage unit 70.

The input unit 10 receives speech input from the user of the speech recognition device 100. Specifically, it records the user's voice and converts it into an electrical signal. The input unit 10 is implemented by, for example, a microphone. Because it records the user's voice, the input unit 10 can also be called a recording unit.

The input unit 10 may also receive, together with the speech, information identifying the user (hereinafter, user identification information). The user identification information may be entered before recording. Alternatively, it may be generated by converting an external input with any personal authentication technology. If the user can be identified from the input speech itself, the user identification information need not be input separately.

The speech data generation unit 20 generates speech data from the received speech, for example by performing AD (analog-to-digital) conversion on the electrical signal produced by the input unit 10.

The speech recognition unit 30 converts the generated speech data into text, that is, into characters or a character string (a single sentence) composed of character codes. A character string is one word or a set of words. The speech recognition unit 30 may use any commonly known speech recognition technology. The language of the text is arbitrary and may be, for example, Japanese or English. This embodiment assumes that the input speech is converted into a character string verbatim.

When no user identification information is input, the input unit 10, the speech data generation unit 20, and the speech recognition unit 30 may generate it by identifying the speaker with any speech recognition technology.

The storage unit 70 stores various information used by the speech recognition device 100. In this embodiment, it stores prerequisite information that goes beyond names and IDs. This includes each individual's role in the organization, the organization's own rules, and tables of vocabulary correspondences. In the following description, this collection of tables is called a "dictionary".

The dictionary stored in the storage unit 70 is described in detail below. Personal information is registered in the dictionary as its core information. Besides the individual's name, personal information includes, as additional information, the organization to which the individual belongs and the individual's role in it. In the following description, information representing the structure of an organization is called organization information. Information representing the organization to which an individual belongs and the individual's role in it is called affiliation information. In a company, an individual's role is, for example, a job title.

Personal information associates an individual's identification information (name, ID, etc.) with affiliation information. Other information may also be associated with it. Affiliation information expressed with a separator is called an organization path. Any notation may be used for an organization path. For example, the units of a hierarchical organization may be listed from the top down and joined by a separator.

FIG. 2 is an explanatory diagram showing an example of organization information. It shows that Cop Co., Ltd. has Department D. Department D contains Section A and Section B, and Section A contains Group X and Group Y.

The organization information in FIG. 2 can be expressed as organization paths. With "/" as the separator, Group X of Section A in Department D of Cop Co., Ltd. is specified as "Cop Co., Ltd./Department D/Section A/Group X". A job title can also be specified; for example, the manager of Section A is specified as "Cop Co., Ltd./Department D/Section A/Group X/Section Manager". Organization paths can also specify virtual organizations such as an executive committee. For example, the Com Committee of Cop Co., Ltd. is specified as "Cop Co., Ltd./Com Committee".

Although this example uses "/" as the separator, the separator is not limited to "/". It may be any character or character string that does not appear in organization names, defined uniquely for each system environment. Splitting strings on such a separator is a widely known technique and can be implemented on many information platforms.
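The organization-path notation can be sketched in Python as follows. The constant `SEP` and the functions `join_path` and `split_path` are hypothetical names chosen for illustration. The check in `join_path` enforces the requirement that the separator never appear inside an organization name. `split_path` strips surrounding spaces so that paths written as "/ " also parse.

```python
from typing import List

SEP = "/"  # must not occur in any organization, role, or committee name


def join_path(parts: List[str], sep: str = SEP) -> str:
    """Build an organization path listing units from the top of the hierarchy down."""
    for part in parts:
        if sep in part:
            raise ValueError(f"separator {sep!r} occurs in name {part!r}")
    return sep.join(parts)


def split_path(path: str, sep: str = SEP) -> List[str]:
    """Split an organization path back into its units, ignoring spaces around the separator."""
    return [part.strip() for part in path.split(sep)]


group_x = join_path(["Cop Co., Ltd.", "Department D", "Section A", "Group X"])
print(group_x)                                      # Cop Co., Ltd./Department D/Section A/Group X
print(split_path("Cop Co., Ltd./ Com Committee"))  # ['Cop Co., Ltd.', 'Com Committee']
```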

The above example assumes a hierarchical organization. If the organization has no hierarchical relationships, its structure need not be hierarchical.

The dictionary may also include, as related information other than personal information, information associating role names with organizations. This information is hereinafter called a role table.

The role table may also serve as substitution information when resolving names according to each organization's own (local) rules. In many organizations, not only in Japan, the relationship between an organization name and the title of the position that heads it is not easy to infer. For example, the relationship between the organization name "Section A" (A課) and the title "Section Manager" (課長) is easy to infer. By contrast, the relationships between "Team A" (Aチーム) and "Manager" (マネージャ), or between "Ministry A" (A省) and "Minister" (大臣), are hard to infer. On the other hand, these titles are a fixed convention within each organization (company), so a correspondence table can be prepared per company (or department). Storing such a table in the dictionary as a role table makes it possible to adapt to a variety of corporate cultures (structures).
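A per-company role table of this kind might be sketched in Python as below. Keying the table on the organization-name suffix (課, チーム, 省, グループ) is an assumption made for illustration. The entries are taken from the examples in this description, including the "group" positions of FIG. 3. `ROLE_TABLE` and `head_roles` are hypothetical names.

```python
from typing import Dict, List

# Hypothetical role table for one company: organization-name suffix -> head role(s).
ROLE_TABLE: Dict[str, List[str]] = {
    "課": ["課長"],                   # Section -> Section Manager (easy to infer)
    "チーム": ["マネージャ"],          # Team -> Manager (not inferable from the name)
    "省": ["大臣"],                   # Ministry -> Minister (not inferable from the name)
    "グループ": ["部長", "部長代理"],  # Group -> general manager, deputy general manager
}


def head_roles(org_name: str, table: Dict[str, List[str]] = ROLE_TABLE) -> List[str]:
    """Return the role names that head the given organization (longest suffix wins)."""
    for suffix in sorted(table, key=len, reverse=True):
        if org_name.endswith(suffix):
            return table[suffix]
    return []


print(head_roles("Aチーム"))  # ['マネージャ']
print(head_roles("A省"))      # ['大臣']
```

Trying longer suffixes first means a more specific entry wins over a shorter one if the table ever contains overlapping suffixes.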

In addition, a role name may be spoken in a company-specific form or in abbreviated form. For example, the role "マネージャ" (manager) may be shortened to "マネ". The dictionary may therefore include a table associating such converted (for example, abbreviated) role names with target words (hereinafter, a conversion table). Registering such convertible forms of colloquial input in the dictionary makes name resolution possible by simple comparison.

Organization information and affiliation information can generally be obtained from the organization's personnel system (not shown). The organization information and affiliation information in the dictionary may therefore be generated from that personnel system. A personnel system may also have a function for managing individual schedules, so the dictionary may also include individual schedules.

FIG. 3 is an explanatory diagram showing an example of the dictionary stored in the storage unit 70. The dictionary illustrated in FIG. 3 includes three tables:

- a personal information table T1 (also called a name list) holding personal information;
- an organization/position table T2 corresponding to the role table; and
- a schedule table T3.

In FIG. 3, the personal information table T1 associates each individual's ID and name (identification information) with affiliation information (additional information). The organization/position table T2 associates each organizational unit with its positions. In the example in FIG. 3, a "group" has positions called "general manager" and "deputy general manager". The schedule table T3 associates each individual in the personal information table T1 with the times at which that individual is present.

In the example in FIG. 3, each individual in the personal information table is associated with one piece of affiliation information, but several may be associated. For example, an organizational restructuring or a change of organization name may make both the old and new affiliation information necessary. In that case, the personal information table T1 may hold both. An organization may also set up, separately from actual affiliations, another organization such as a committee, in which a role such as "XX committee member" serves as the position. Affiliation information representing such a virtual organization may also be associated with entries in the personal information table. Such entries can then be searched by the same mechanism as regular affiliations.
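A minimal Python sketch of the personal information table T1 and the schedule table T3 follows. It is not the patent's data model. The assumptions are:

- each person holds a list of organization paths that end in a role;
- the schedule table is modeled as daily (start, end) presence windows;
- `Person`, `Schedule`, `find_by_role`, and `is_present` are hypothetical names.

```python
from dataclasses import dataclass, field
from datetime import datetime, time
from typing import Dict, List, Tuple


@dataclass
class Person:
    person_id: str
    name: str
    # Several organization paths may be held: old and new ones after a
    # reorganization, or a virtual organization such as a committee.
    affiliations: List[str] = field(default_factory=list)


# Hypothetical schedule table: person ID -> (start, end) windows of presence.
Schedule = Dict[str, List[Tuple[time, time]]]


def find_by_role(people: List[Person], role: str, sep: str = "/") -> List[str]:
    """IDs of people any of whose affiliation paths ends in the given role."""
    return [p.person_id for p in people
            if any(a.split(sep)[-1].strip() == role for a in p.affiliations)]


def is_present(schedule: Schedule, person_id: str, now: datetime) -> bool:
    """True if `now` falls inside one of the person's presence windows."""
    t = now.time()
    return any(start <= t < end for start, end in schedule.get(person_id, []))


people = [
    Person("E001", "Tanaka", ["Cop Co., Ltd./Department D/Section A/Section Manager",
                              "Cop Co., Ltd./Com Committee/Committee Member"]),
    Person("E002", "Suzuki", ["Cop Co., Ltd./Department D/Section B/Section Manager"]),
]
schedule: Schedule = {"E001": [(time(9, 0), time(18, 0))]}
print(find_by_role(people, "Committee Member"))                    # ['E001']
print(is_present(schedule, "E001", datetime(2020, 2, 18, 10, 0)))  # True
```

Because the committee membership is stored as just another affiliation path, `find_by_role` searches it in the same way as the regular section affiliations.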

Besides the organization and role name, the role table may also include the degree to which a target word indicates that role. This degree is hereinafter called reliability; it can be regarded as the likelihood that the role is the intended one when it is selected. Likewise, the affiliation information in the personal information may be given a reliability indicating how well the individual matches that role. In the initial state, for example, the same reliability may be set for all entries.

The target word extraction unit 40 extracts words representing roles in an organization (hereinafter, target words) from the character string produced by the speech recognition unit 30. For example, if the organization has a position called "section manager", the target word extraction unit 40 may extract the word "section manager" from the string. It may also extract personal identification information (ID or name) in addition to target words. Directly extracted identification information improves the accuracy of resolving the target word.

The target word extraction unit 40 may, for example, split the string into words by morphological analysis and extract target words by estimating their parts of speech. Many morphological analysis methods are commercially available, and any of them may be used. The target words extracted here are then used to identify the individual (name resolution).

When the storage unit 70 holds the role table described above, the target word extraction unit 40 may use that table to extract target words from the string. This works because the role table contains the role names.

When the storage unit 70 holds the conversion table described above, that table contains words representing abbreviated role names. The target word extraction unit 40 may therefore refer to the conversion table to extract abbreviated role names from the string and convert them into target words.
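Extraction with the role table and the conversion table can be sketched in Python as follows. This sketch does not use morphological analysis. In its place, it uses a simple greedy longest-match scan over the unsegmented Japanese string, which is a different and much simpler technique. Longest match keeps "マネージャ" from being read as the abbreviation "マネ" plus leftover characters. The function name and sample tables are hypothetical.

```python
from typing import Dict, Iterable, List


def extract_target_words(text: str, role_names: Iterable[str],
                         abbreviations: Dict[str, str]) -> List[str]:
    """Scan text left to right, always taking the longest matching entry.

    role_names comes from the role table; abbreviations is the conversion
    table (abbreviated form -> canonical role name, i.e. the target word).
    """
    lexicon = {name: name for name in role_names}
    lexicon.update(abbreviations)
    keys = sorted(lexicon, key=len, reverse=True)  # longest match first
    found: List[str] = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                found.append(lexicon[key])  # abbreviations become target words
                i += len(key)
                break
        else:
            i += 1  # no entry starts here; advance one character
    return found


roles = ["課長", "部長", "マネージャ"]
conversion = {"マネ": "マネージャ"}
print(extract_target_words("マネに電話して", roles, conversion))    # ['マネージャ']
print(extract_target_words("マネージャに電話", roles, conversion))  # ['マネージャ']
```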

Based on personal information (information that associates an individual's identification information with affiliation information), the individual identification unit 50 identifies candidates for the individual indicated by the extracted target word according to their relationship with the organization to which the user belongs. Specifically, among individuals whose organizational structure partially or wholly matches that of the user's organization, the individual identification unit 50 identifies those who hold the role indicated by the target word as candidates.

The individual identification unit 50 may, for example, resolve target words extracted by morphological analysis or similar means by matching them against a dictionary (the role table and the personal information), thereby narrowing down the candidates. For example, when affiliation information is expressed as the organization path described above, the individual identification unit 50 may decide whether part or all of the structure matches by checking whether some or all of the upper levels of the organization path match. The individual identification unit 50 may identify candidates by personal name or by identification information such as an ID.

The processing performed by the individual identification unit 50 of this embodiment is described below using a specific example. FIG. 4 is an explanatory diagram showing an example of the relationship between an organization and the people belonging to it. FIG. 5 is an explanatory diagram showing an example of the information contained in the dictionary. In the example of FIG. 5, the dictionary contains a personal information table T4 (name list) and an organization/position table T5.

Assume that, before the individual identification unit 50 runs, user AAA has already been identified as the speaker (user identification information). In the example of FIG. 4, user AAA belongs to Group X of Section A.

First, user AAA, who wants to communicate with someone, speaks that request. The speech is recorded, and the audio data generated from the recording is converted into text as is. Morphological analysis is then performed on the text, and nouns serving as referring expressions are extracted. Here, assume that user AAA says "section manager" in order to call the section manager.

The individual identification unit 50 checks whether the extracted referring expression appears in the position lookup table (role table) contained in the predefined dictionary. Searching the organization/position table T5 illustrated in FIG. 5 shows that "section manager" is a position at the organizational level "section". The individual identification unit 50 therefore truncates user AAA's affiliation information "Cop Co., Ltd./D Department/Section A/Group X" down to the "section" level (that is, "Cop Co., Ltd./D Department/Section A") and extracts the individuals who share that affiliation from the personal information table T4. In the example of FIG. 5, the users with user IDs 0001 to 0007 (users DDD, CCC, BBB, and EEE) are extracted.

The individual identification unit 50 then identifies, as candidates, those individuals in the extracted organization whose role matches the extracted referring expression. In the example of FIG. 5, user DDD's role in the organization "Cop Co., Ltd./D Department/Section A" is "section manager", so the individual identification unit 50 identifies user DDD as a candidate.

In other words, although the organization illustrated in FIG. 4 also has another section manager (user FFF of Section B), the individual identification unit 50 ends up identifying DDD, the manager of "Section A", as the candidate.
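The prefix match on organization paths in this example can be sketched as follows. This is an illustration under assumptions, not the patent's implementation. `Person`, `ROLE_DEPTH`, and `find_role_candidates` are hypothetical. The role depth (how many path levels a role governs) stands in for the organization/position table T5, and the directory is a small made-up list loosely modeled on FIG. 4, not the contents of table T4.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Person:
    user_id: str
    name: str
    org_path: tuple[str, ...]   # e.g. ("Cop Co., Ltd.", "D Department", "Section A", "Group X")
    role: Optional[str] = None  # role held in the unit at the end of org_path


# Hypothetical mapping from a role to the depth of the organization path it governs.
ROLE_DEPTH = {"department manager": 2, "section manager": 3}


def find_role_candidates(user: Person, target_word: str, directory: list[Person]) -> list[Person]:
    """Return people who hold target_word in the user's own unit at that role's level."""
    depth = ROLE_DEPTH.get(target_word)
    if depth is None:
        return []  # not a known role
    prefix = user.org_path[:depth]  # truncate the user's path, e.g. to the "section" level
    return [p for p in directory
            if p.role == target_word and p.org_path[:depth] == prefix]


CO, DEPT = "Cop Co., Ltd.", "D Department"
DIRECTORY = [  # illustrative data only
    Person("id-aaa", "AAA", (CO, DEPT, "Section A", "Group X")),
    Person("id-bbb", "BBB", (CO, DEPT, "Section A", "Group X")),
    Person("id-ddd", "DDD", (CO, DEPT, "Section A"), "section manager"),
    Person("id-fff", "FFF", (CO, DEPT, "Section B"), "section manager"),
]
```

With user AAA as the speaker, searching for "section manager" yields only DDD. FFF is excluded because FFF's path diverges from AAA's at the section level.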

Ideally the candidates can be narrowed down to a single person, but multiple candidates may remain. The individual identification unit 50 may therefore determine a reliability for each candidate. For example, if the role table includes reliabilities, the individual identification unit 50 may look up the reliability of an identified candidate in the role table based on that candidate's position in the organization. Likewise, if the affiliation information in the personal information includes reliabilities, the individual identification unit 50 may determine the reliability from the identified individual's affiliation information.

Furthermore, when the personal information contains multiple pieces of affiliation information (for example, current and former affiliations) and a reliability is set for each, the individual identification unit 50 may determine the reliability from whichever of those pieces matches the identified individual.
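One way to read per-affiliation reliability is sketched below, assuming each affiliation carries its own weight and the candidate's score is taken from the affiliation that matches the searched unit and role. The `Affiliation` type, the weights 1.0 and 0.4, and the choice of `max` when several affiliations match are all hypothetical; the patent only states that a reliability is set per affiliation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Affiliation:
    org_path: tuple[str, ...]
    role: Optional[str]
    reliability: float  # hypothetical weight: e.g. 1.0 for the current post, lower for a former one


def affiliation_reliability(affiliations: list[Affiliation],
                            prefix: tuple[str, ...],
                            target_role: str) -> float:
    """Reliability of the affiliation matching the searched unit and role (0.0 if none)."""
    scores = [a.reliability for a in affiliations
              if a.role == target_role and a.org_path[:len(prefix)] == prefix]
    # If several affiliations match, keep the most reliable one.
    return max(scores, default=0.0)


BASE = ("Cop Co., Ltd.", "D Department")
DDD_AFFILIATIONS = [  # illustrative data only
    Affiliation(BASE + ("Section A",), "section manager", 1.0),  # current post
    Affiliation(BASE + ("Section B",), "section manager", 0.4),  # former post
]
```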

The individual identification unit 50 may also calculate the reliability from individuals' schedules. FIG. 6 is an explanatory diagram showing an example of a schedule in an organization. In the example of FIG. 6, Group X works from 9:00 to 16:00 and Group Y works from 17:00 to 24:00. Under this schedule, the person user AAA wants to contact at 12:00 is most likely a member of Group X (user CCC or user BBB). The individual identification unit 50 may therefore give individuals who are within their working hours (the hours during which they are engaged with the organization) a higher reliability than individuals who are outside them.
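The working-hours rule can be expressed as a simple lookup. The shift hours below come from the FIG. 6 example. The weights 1.0 and 0.3, the half-open interval `[start, end)`, and the choice to treat a group without a registered schedule as always on duty are assumptions made for this sketch. Shifts that cross midnight are not handled.

```python
# Working hours per group from the FIG. 6 example, as [start, end) in hours of the day.
SHIFTS = {"Group X": (9, 16), "Group Y": (17, 24)}


def schedule_reliability(group: str, hour: float,
                         on_duty: float = 1.0, off_duty: float = 0.3) -> float:
    """Return a higher reliability if the group is on shift at the given hour.

    Assumptions: the weights are illustrative, and a group with no registered
    schedule is treated as always on duty.
    """
    if group not in SHIFTS:
        return on_duty
    start, end = SHIFTS[group]
    return on_duty if start <= hour < end else off_duty
```

At 12:00, a Group X member scores 1.0 and a Group Y member scores 0.3, which reproduces the preference for user CCC and user BBB described above.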

Several reliabilities may be determined for a single candidate. In that case, the individual identification unit 50 may combine them arithmetically (for example, by addition or multiplication) to obtain the final reliability.
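A small sketch of the two combination rules the text names, addition and multiplication. The function name and the `mode` switch are hypothetical. Multiplication lets any single low factor pull the final score down sharply, while addition treats the factors as independent bonuses.

```python
import math


def combine_reliabilities(scores: list[float], mode: str = "product") -> float:
    """Combine several reliability values for one candidate into a final score."""
    if mode == "product":
        return math.prod(scores)  # a near-zero factor strongly penalizes the candidate
    if mode == "sum":
        return math.fsum(scores)  # factors act as additive bonuses
    raise ValueError(f"unknown mode: {mode}")
```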

The output unit 60 outputs the identified candidates. It may output a single candidate or all of them. For example, when the individual identification unit 50 has calculated a reliability for each candidate, the output unit 60 may output candidates based on those reliabilities: only the candidate with the highest reliability, or a predetermined number of candidates (or all of them) in descending order of reliability.
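Ranking by reliability and truncating to a fixed count might look like this. `select_candidates` and the candidate IDs are hypothetical. The sketch relies on Python's `sorted` being stable even with `reverse=True`, so candidates with equal reliability keep their original order.

```python
from typing import Optional


def select_candidates(scored: dict[str, float], top_n: Optional[int] = 1) -> list[str]:
    """Return candidate IDs in descending order of reliability.

    top_n=1 returns only the best candidate; top_n=None returns all of them.
    Ties keep the insertion order of `scored` because sorted() is stable.
    """
    ranked = sorted(scored, key=lambda cid: scored[cid], reverse=True)
    return ranked if top_n is None else ranked[:top_n]
```

With `{"id-ccc": 0.3, "id-bbb": 1.0, "id-eee": 0.3}`, the default call returns `["id-bbb"]`. This matches the variant in which only the most reliable candidate's identification information is passed on to another application.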

The output unit 60 may also convert the identified candidate into an ID using the dictionary. Such an ID can be used by various applications on the system (for example, communication applications such as voice calls).

Alternatively, the output unit 60 may output only the identification information of the candidate with the highest reliability. Outputting only this identification information lets another application that consumes it determine the target of its processing unambiguously. This simplification is acceptable because, within an organization such as the one assumed in this embodiment, some recognition errors are likely to be tolerable.

FIG. 7 is an explanatory diagram showing a specific configuration example of the speech recognition device 100 of this embodiment. The voice of the user 101 is converted into an electrical signal by the microphone 102, which corresponds to the input unit 10 in FIG. 1.

The processing units 103 and 104 illustrated in FIG. 7 perform some or all of the processing of the speech data generation unit 20, the speech recognition unit 30, the target word extraction unit 40, the individual identification unit 50, and the output unit 60 illustrated in FIG. 1. The application 105 carries out the actual operation based on the results output by the processing unit 103 or 104.

In the example of FIG. 7, the processing unit 103 is assumed to run locally and the processing unit 104 online. How the processing is divided between the two is determined by the environment in which the speech recognition device 100 of this embodiment operates.

In general, the AD conversion of the speech once it has become an electrical signal (recording processing) is preferably performed locally. Computationally heavy processing such as speech recognition is usually performed online, where resources are easy to secure. However, where the analog audio signal itself can be transmitted online as is, recording may be performed online. Conversely, where speech recognition can be implemented locally, it may be performed locally.

The speech data generation unit 20, the speech recognition unit 30, the target word extraction unit 40, the individual identification unit 50, and the output unit 60 are implemented by a computer processor (for example, a CPU (Central Processing Unit) or a GPU (Graphics Processing Unit)) operating according to a program (the personal identification program). For example, the program may be stored in the storage unit 70, and the processor may read it and operate according to it as the speech data generation unit 20, the speech recognition unit 30, the target word extraction unit 40, the individual identification unit 50, and the output unit 60. The functions of the speech recognition device 100 may also be provided as SaaS (Software as a Service).

Alternatively, the speech data generation unit 20, the speech recognition unit 30, the target word extraction unit 40, the individual identification unit 50, and the output unit 60 may each be implemented by dedicated hardware. Some or all of the components of each device may be implemented by general-purpose or dedicated circuitry, processors, or combinations of these. They may be built on a single chip or on multiple chips connected by a bus. Some or all of the components of each device may also be implemented by a combination of such circuitry and a program.

When some or all of the components of the speech recognition device 100 are implemented by multiple information processing devices, circuits, or the like, these may be arranged centrally or distributed. For example, they may be connected to each other over a communication network, as in a client-server system or a cloud computing system.

The storage unit 70 is implemented by, for example, a magnetic disk.

Next, an example of the operation of this embodiment is described. FIG. 8 is a flowchart showing an example of the operation of the speech recognition device 100 of this embodiment. The target word extraction unit 40 extracts target words from the character string obtained by converting the user's speech into text (step S11). Based on the personal information, the individual identification unit 50 identifies candidates for the individual indicated by the extracted target words according to their relationship with the organization to which the user belongs (step S12). The output unit 60 outputs the identified candidates (step S13).

Next, concrete processing using the speech recognition device 100 of this embodiment is described. FIG. 9 is an explanatory diagram showing a specific example of that processing.

First, the input unit 10 records speech, and the speech data generation unit 20 generates speech data from the recording (step S101). Next, the speech recognition unit 30 performs speech recognition on the speech data, for example using information for speaker recognition, and outputs a simple sentence (character string) and speaker information (step S102). The speech recognition unit 30 then performs morphological analysis and outputs noun phrases as text together with the speaker information (step S103). The target word extraction unit 40 refers to, for example, the name list (user list) 171, the job titles 172, and the schedule 173 contained in the dictionary 170, and outputs partner candidates with their reliabilities (step S104). The individual identification unit 50 outputs the individual's identification information (ID) based on the output candidates and reliabilities.

The output ID is used by each application (step S106). If a verb or a phrase representing an action (function) was extracted in step S103, each application may perform processing corresponding to that verb. In doing so, each application may apply the same kind of rewording (conversion) to verbs and other words. This lets the system mechanically absorb differences in how the same action is expressed (for example, dialectal variation) that depend on a person's experience and environment (situation and culture), and still take the appropriate action. The approach also applies to languages other than Japanese.

Machine learning may be used to generate and update such dictionaries. Static information such as that described above is considered necessary in the initial state of learning, but machine learning is expected to improve the accuracy of the dictionaries. Although this embodiment can be implemented entirely by a computer system, part of the processing may be performed manually. For example, a configuration is conceivable in which verbs (actions) are extracted from previously created meeting minutes, the information narrowed down to candidates is presented to the user, and the final decision and execution are left to the user.

Next, a modification of the speech recognition device 100 of this embodiment is described. The embodiment above extracts a target word from the text obtained from speech and identifies candidates based on that target word. This modification describes how to identify an individual when several individuals share the same identification information (for example, the same full name).

For example, the organization illustrated in FIG. 4 contains two individuals named "DDD": one is the manager of Section A, and the other is "DDD" of Group Y in Section A. Suppose user AAA says the name "DDD". The person user AAA means is then more likely to be "DDD" of Group Y than "DDD" the section manager, because Japanese speakers tend not to address people in higher positions by name (they refer to them by job title instead). The individual identification unit 50 may therefore identify candidates by widening the search range step by step from the user's own group to other groups (for example, other teams).
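A sketch of this disambiguation follows. It combines two rules: the step-by-step widening from this paragraph, and the preference for candidates who are not senior to the user, which the claims state. The `ROLE_RANK` seniority values, the `Person` type, and the directory entries are hypothetical and only loosely follow FIG. 4. The search starts at the user's own group and moves one level up the organization path at a time until someone with the spoken name is found.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Person:
    user_id: str
    name: str
    org_path: tuple[str, ...]
    role: Optional[str] = None


# Hypothetical seniority ranks; a larger number means a higher position.
ROLE_RANK = {None: 0, "group leader": 1, "section manager": 2, "department manager": 3}


def resolve_same_name(user: Person, name: str, directory: list[Person]) -> list[Person]:
    """Resolve a spoken personal name, widening the search scope one level at a time."""
    user_rank = ROLE_RANK.get(user.role, 0)
    # Start from the user's own unit (full path) and move up toward the company root.
    for depth in range(len(user.org_path), 0, -1):
        prefix = user.org_path[:depth]
        matches = [p for p in directory
                   if p.name == name and p.org_path[:depth] == prefix]
        if matches:
            # People senior to the user are rarely called by name, so list them last.
            return sorted(matches, key=lambda p: ROLE_RANK.get(p.role, 0) > user_rank)
    return []


BASE = ("Cop Co., Ltd.", "D Department", "Section A")
DIRECTORY = [  # illustrative data only
    Person("id-aaa", "AAA", BASE + ("Group X",)),
    Person("id-ddd-mgr", "DDD", BASE, "section manager"),
    Person("id-ddd-y", "DDD", BASE + ("Group Y",)),
]
```

For user AAA saying "DDD", no match is found in Group X. At the Section A level both people named DDD match, and Group Y's DDD is ranked ahead of the section manager.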

As described above, in this embodiment the target word extraction unit 40 extracts target words from the text obtained from the user's speech, and the individual identification unit 50 uses personal information to identify candidates for the individual indicated by those target words according to their relationship with the organization to which the user belongs. The output unit 60 then outputs the identified candidates. As a result, when an individual in an organization is identified by voice, identification accuracy can be improved with simple settings and logic.

In other words, this embodiment focuses on Japanese conventions of address and compensates for the ambiguity of the user's instructions by combining databases. In recent years, one approach to this problem has been to estimate relationships by applying machine learning to users' everyday speech and behavior. Machine learning is computationally expensive. The configuration of this embodiment, in contrast, is highly versatile and relies on static data that is easy to generate automatically, such as organizational structure information. It can therefore improve user convenience with less computation. Moreover, because no training is required, changes such as a workplace reorganization can be reflected immediately.

In this embodiment, the machine also automatically grasps the meaning of utterances phrased the way people phrase them to each other, and acts on them. The speech recognition device of this embodiment therefore makes it possible to provide highly adaptable applications. The technique of narrowing down the target from the meaning of an utterance can also be used, for example, to automatically resolve what an expression in a conversation refers to. Applying the same technique to human-to-machine interaction, not only human-to-human communication, also enables machine operation.

Next, an overview of the present invention is given. FIG. 10 is a block diagram showing an overview of a speech recognition device according to the present invention. The speech recognition device 80 according to the present invention identifies individuals based on speech (for example, the speech recognition device 100) and includes: a target word extraction unit 81 (for example, the target word extraction unit 40) that extracts target words, which are words representing roles (for example, positions) in an organization, from a character string obtained by converting the user's speech into text; an individual identification unit 82 (for example, the individual identification unit 50) that identifies candidates for the individual indicated by the extracted target word, according to their relationship with the organization to which the user belongs, based on personal information associating an individual's identification information (for example, a name or ID) with affiliation information representing the organization to which the individual belongs and the individual's role in that organization; and an output unit 83 (for example, the output unit 60) that outputs the identified candidates.

With this configuration, when an individual in an organization is identified by voice, identification accuracy can be improved with simple settings and logic.

Specifically, among individuals whose organizational structure partially or wholly matches that of the user's organization, the individual identification unit 82 may identify those who hold the role indicated by the target word as candidates.

The individual identification unit 82 may also determine the reliability of an identified candidate, based on that candidate's role in the organization, from a reliability table that associates each role in the organization with a reliability indicating how likely the target word is to refer to that role. The output unit 83 may then output candidates based on the determined reliabilities.

Specifically, the personal information may include multiple pieces of affiliation information, each assigned a reliability, and the individual identification unit 82 may determine the reliability from the piece of affiliation information, among them, that corresponds to the identified individual.

In that case, the output unit 83 may output the identification information of the candidate with the highest reliability.

The target word extraction unit 81 may also refer to a conversion table that associates abbreviated role names with target words, extract abbreviated role names from the text, and convert each extracted role name into the corresponding target word.

INDUSTRIAL APPLICABILITY The present invention is suitably applied to speech recognition devices that recognize speech to identify individuals. For example, with voice communication as the primary use case, the invention is well suited to systems that act on vague spoken instructions from people.

REFERENCE SIGNS LIST
10 Input unit
20 Speech data generation unit
30 Speech recognition unit
40 Target word extraction unit
50 Individual identification unit
60 Output unit
70 Storage unit
100 Speech recognition device

Claims

1. A speech recognition device for identifying individuals based on speech, comprising:
a target word extraction unit that extracts target words, which are words representing roles in an organization, from a character string obtained by converting a user's speech into text;
an individual identification unit that identifies, based on personal information associating an individual's identification information with affiliation information representing the organization to which the individual belongs and the individual's role in that organization, candidates for the individual indicated by the extracted target word according to their relationship with the organization to which the user belongs; and
an output unit that outputs the identified candidates,
wherein the individual identification unit determines a higher reliability for candidates who are within their working hours than for candidates who are outside their working hours, and, when multiple candidates share the same identification information, preferentially selects candidates who do not hold a higher position than the user in the organization to which the user belongs over candidates who do, and
the output unit outputs the candidates based on the reliability.

2. The speech recognition device according to claim 1, wherein the individual identification unit identifies, as candidates, individuals holding the role indicated by the target word from among individuals whose organizational structure partially or wholly matches that of the organization to which the user belongs.

3. The speech recognition device according to claim 1, wherein the individual identification unit determines the reliability of an identified candidate, based on that candidate's role in the organization, from a reliability table that associates roles in the organization with reliabilities indicating how likely the target word is to refer to each role, and the output unit outputs candidates based on the determined reliability.

4. The speech recognition device according to claim 3, wherein the personal information includes multiple pieces of affiliation information, each assigned a reliability, and the individual identification unit determines the reliability from the identified individual's affiliation information among the multiple pieces.

5. The speech recognition device according to any one of claims 1 to 4, wherein the output unit outputs the identification information of the candidate with the highest reliability.

6. The speech recognition device according to any one of claims 1 to 5, wherein the target word extraction unit refers to a conversion table associating abbreviated role names with target words, extracts abbreviated role names from the character string, and converts each extracted role name into the corresponding target word.

7. A personal identification method for identifying an individual based on speech, comprising:
extracting target words, which are words representing roles in an organization, from a character string obtained by converting a user's speech into text;
identifying, based on personal information associating an individual's identification information with affiliation information representing the organization to which the individual belongs and the individual's role in that organization, candidates for the individual indicated by the extracted target word according to their relationship with the organization to which the user belongs;
determining a higher reliability for candidates who are within their working hours than for candidates who are outside their working hours, and, when multiple candidates share the same identification information, preferentially selecting candidates who do not hold a higher position than the user in the organization to which the user belongs over candidates who do; and
outputting the identified candidates based on the reliability.

8. The personal identification method according to claim 7, wherein, among individuals who partially or wholly match the structure of the organization to which the user belongs, the individuals having the role indicated by the target word are identified as candidates.
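Claim 8 does not define what a partial match of organizational structure is. One plausible reading, assumed here purely for illustration, models each affiliation as a path from the top of the organization downward (company, division, department), treats a shared leading segment as a partial match, and treats an identical path as a whole match. The function names, data layout, and sample names below are hypothetical.

```python
def common_prefix_len(a: tuple[str, ...], b: tuple[str, ...]) -> int:
    """Number of leading organizational levels shared by two paths."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def candidates_for_role(
    people: list[tuple[str, tuple[str, ...], str]],
    user_path: tuple[str, ...],
    target_word: str,
) -> list[str]:
    """Return IDs of people holding target_word whose organization path
    shares at least one leading level with the user's, closest first."""
    matches: list[tuple[int, str]] = []
    for person_id, path, role in people:
        depth = common_prefix_len(path, user_path)
        if depth > 0 and role == target_word:
            matches.append((depth, person_id))
    # Assumption: a deeper shared prefix means a closer organizational
    # relationship. list.sort stays stable even with reverse=True.
    matches.sort(key=lambda m: m[0], reverse=True)
    return [person_id for _, person_id in matches]


if __name__ == "__main__":
    people = [
        ("tanaka", ("CorpA", "Sales", "Sales 1"), "manager"),
        ("ito", ("CorpA", "Sales", "Sales 2"), "manager"),
        ("kato", ("CorpA", "R&D"), "manager"),
        ("mori", ("CorpB", "Sales", "Sales 1"), "manager"),
        ("abe", ("CorpA", "Sales", "Sales 1"), "staff"),
    ]
    print(candidates_for_role(people, ("CorpA", "Sales", "Sales 1"), "manager"))
```

Here `mori` is excluded because no leading level matches, `abe` is excluded because the role differs, and the remaining managers are ordered by how much of the user's organization path they share.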

9. A personal identification program applied to a computer that identifies an individual based on voice, the program causing the computer to execute:
target word extraction processing for extracting target words, which are words representing roles in an organization, from a character string obtained by converting the user's voice into text;
individual identification processing for identifying, based on personal information that associates identification information of an individual with affiliation information representing the organization to which the individual belongs and the individual's role in that organization, a candidate for the individual indicated by each extracted target word, according to the relationship with the organization to which the user belongs; and
output processing for outputting the identified individual candidates,
wherein, in the individual identification processing, the reliability of an individual candidate who is within working hours is determined to be higher than the reliability of an individual candidate who is outside working hours, and, when there are multiple individual candidates having the same identification information, candidates who do not hold a higher position than the user in the organization to which the user belongs are preferentially selected over candidates who do, and
wherein, in the output processing, the individual candidates are output based on the reliability.

10. The personal identification program according to claim 9, causing the computer, in the individual identification processing, to identify as a candidate, among individuals who partially or wholly match the structure of the organization to which the user belongs, the individual having the role indicated by the target word.