WO2008044669A1 - Audio information search program and recording medium therefor, audio information search system, and audio information search method

Audio information search program and recording medium therefor, audio information search system, and audio information search method

Info

Publication number
WO2008044669A1
WO2008044669A1 PCT/JP2007/069655 JP2007069655W WO2008044669A1 WO 2008044669 A1 WO2008044669 A1 WO 2008044669A1 JP 2007069655 W JP2007069655 W JP 2007069655W WO 2008044669 A1 WO2008044669 A1 WO 2008044669A1
Authority
WO
WIPO (PCT)
Prior art keywords
search
database
voice information
character string
characters
Prior art date
Application number
PCT/JP2007/069655
Other languages
English (en)
Japanese (ja)
Inventor
Toshifumi Okuhara
Original Assignee
Toshifumi Okuhara
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Toshifumi Okuhara filed Critical Toshifumi Okuhara
Publication of WO2008044669A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/683Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/685Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using automatically derived transcript of audio data, e.g. lyrics
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/10Speech classification or search using distance or distortion measures between unknown speech and reference templates

Definitions

  • Voice information search program and its recording medium, voice information search system, and voice information search method
  • The present invention relates to a speech information retrieval program. More specifically, the present invention relates to a voice information search program and its recording medium, a voice information search system, and a voice information search method. Background art
  • Patent Document 1 discloses a technique for searching speech information that takes into account the possibility that a recognition error occurred when a speech element sequence was generated from an arbitrary input word.
  • Patent Document 1: Japanese Patent Application Laid-Open No. 2005-257954. Disclosure of the invention; Problems to be solved by the invention
  • The main object of the present invention is to provide a speech information search program that can obtain more accurate search results and can easily be used with various databases. Means for solving the problem
  • The present invention provides a voice information search program that causes a computer to execute at least: a step of converting the audio elements recorded in audio information into a character string; a step of measuring the playback time of the audio information; a step of registering the character string of the audio information in a database, dividing it at predetermined time intervals when the playback time is longer than a predetermined time and registering it without division when the playback time is equal to or shorter than the predetermined time; a step of determining a search item for searching the database and a search condition for that search item; a step of executing a search on the database according to the search condition; and a step of outputting the result of the search.
  • The present invention also provides a speech information search program characterized by causing the computer to execute a step of determining whether the character string specified by the search condition is included in the character string of each piece of speech information.
  • The present invention further provides a voice information retrieval program characterized by causing the computer to execute a voice playback step in which the recording time position at which the voice element corresponding to the characters specified in the search condition is recorded is identified and playback is started from that position. By causing the computer to execute each of these steps, sound can be reproduced from a more accurate playback position that matches the search condition.
  • The present invention also provides a computer-readable recording medium on which the program is recorded.
  • The “recording medium” used in the present invention means any computer-readable recording medium on which the program is recorded, used for program installation and execution, program distribution, and the like.
  • The present invention also provides a voice information search system comprising: means for converting the voice elements recorded in voice information into a character string; means for measuring the playback time of the voice information; and means for registering the character string of the voice information in a database, dividing it at predetermined time intervals when the playback time is longer than a predetermined time and registering it without division when the playback time is equal to or shorter than the predetermined time.
  • The present invention further provides a voice information search method comprising at least: a procedure for converting the audio elements recorded in audio information into a character string; a procedure for measuring the playback time of the audio information; a procedure for registering the character string of the audio information in a database, dividing it at predetermined time intervals when the playback time is longer than a predetermined time and registering it without division otherwise; a procedure for determining a search item for searching the database and a search condition for that search item; a procedure for executing a search on the database according to the search condition; and a procedure for outputting the result of the search. According to this voice information search method, voice information can be searched with high accuracy and at high speed.
  • Unless otherwise specified, “audio information” in the present invention means any information in which at least an audio element is recorded, and includes, for example, moving image information.
  • Unless otherwise specified, a “voice information file” in the present invention includes a moving picture information file in which at least a voice element is recorded.
  • The steps, procedures, means, and the like in the present invention are not limited to the order described, as long as the object of the present invention can be achieved, and may be varied within the scope of the present invention.
  • According to the audio information search program of the present invention, it is possible to search audio information with high search accuracy and at a high search speed.
  • FIG. 1 is a conceptual diagram for explaining a speech information retrieval program according to the present invention.
  • The speech information retrieval program causes the computer to execute at least a speech information file text conversion function 1, a speech information file playback time measurement function 2, a database registration function 3, a search condition determination function 4, a search processing function 5, and a search result output function 6. Registered data are accumulated in a database 7. Each function is described below.
  • The text conversion function 1 of the voice information file converts the voice elements recorded in the voice information file into a character string.
  • The audio information file playback time measurement function 2 measures the playback time of an audio information file.
  • The database registration function 3 registers the voice information file in the database 7.
  • The search condition determination function 4 determines the search conditions for extracting a desired audio information file.
  • The search processing function 5 executes a search on the database 7 in accordance with the search conditions determined by the search condition determination function 4.
  • The search result output function 6 outputs the search result obtained by the search processing function 5.
  • FIG. 2 is a flowchart when registering a database in the first embodiment of the speech information retrieval program according to the present invention. That is, FIG. 2 is a flowchart for registering an audio information file in the database based on the playback time.
  • First, file information of an audio information file (including a moving image information file; the same applies hereinafter) is input (S1).
  • As the file information, the title, category, creator, voice language, creation date, and the like of the audio information file are input.
  • The items to be input, such as content name and category, are not particularly limited; any information needed to build the database can be entered, for example basic content information.
  • The input method is not particularly limited in the present invention and can be, for example, a keyboard.
  • The information input in this way is registered in the database as the file information of the audio information file (S2).
  • Next, the audio information file is reproduced, converted into text by a speech conversion system, and stored in the database.
  • If the reproduced audio is in Japanese (S3; see S4a), the text conversion may be displayed in hiragana or in katakana; the display form is not particularly limited.
  • If the audio is in a foreign language such as English, the text conversion is displayed in the characters of that language.
  • The target language is not particularly limited and may be Japanese, English, Chinese, Korean, French, or the like. In this case, it is preferable to use a notation in which the words (characters) corresponding to the speech can be converted and determined uniquely, so that higher search accuracy can be obtained.
  • Steps S1 to S4 are not strictly required; they can be performed as appropriate from the viewpoint of providing search items to be used as search conditions, of easing the management of audio information files in the database, and the like.
  • Next, the audio information file is reproduced (S5), and the audio elements of the reproduced file are converted into text (S6).
  • The method for converting the voice elements of a voice information file into text is not particularly limited; for example, text conversion can be performed using a speech conversion system such as Microsoft (registered trademark) .NET Speech.
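  • As an illustration only (not the system named in the present invention), the following Python sketch shows one way such text conversion could be performed with the open-source SpeechRecognition package; the function name transcribe_wav and the choice of recognizer are assumptions made for this example.

        # Minimal sketch, assuming the open-source SpeechRecognition package
        # (pip install SpeechRecognition); the invention only requires "a speech
        # conversion system" and does not prescribe this library.
        import speech_recognition as sr

        def transcribe_wav(path, language="ja-JP"):
            """Convert the voice elements of an audio file into a character string."""
            recognizer = sr.Recognizer()
            with sr.AudioFile(path) as source:      # WAV/AIFF/FLAC input
                audio = recognizer.record(source)   # read the whole file
            # Send the audio to a recognizer and return the transcribed text.
            return recognizer.recognize_google(audio, language=language)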
  • In step S7 it is determined whether the playback time of the audio information file is longer than a predetermined time of n seconds. The audio information file processed in step S8a is a short file whose playback time is n seconds or less; its text is registered in the database without being divided.
  • If it is determined in step S7 that the playback time is longer than n seconds, the first n seconds of the audio information file are separated and the text information of that separated portion is registered in the database (S8b). The remaining portion is then judged again in step S7 as to whether its playback time is longer than n seconds, so that the audio information file is finally divided into units of n seconds. In other words, the audio information file processed in step S8b is one whose playback time is longer than the predetermined time of n seconds.
  • In this way, the reproduced audio information is converted into text and stored in the database. A time stamp is applied at each predetermined time interval (n seconds), and the text is recorded in those units. If the audio information is shorter than the predetermined time interval (n seconds), the time stamp is applied and the text is recorded without being divided (see FIG. 3 etc.).
  • The setting of the predetermined time n seconds in step S7 is not particularly limited and can be chosen as appropriate in consideration of the desired search accuracy, the processing capability of the computer used, the usage environment, and the like. For example, to increase search accuracy the n-second setting can be shortened, and to emphasize search speed it can be lengthened.
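  • A minimal sketch of the S7/S8a/S8b loop is given below, assuming a hypothetical transcribe_segment helper (for example built on the text conversion above) and an sqlite3 table laid out like FIG. 4 (content_id, record_id, time_stamp, text); the table and helper names are illustrative and not taken from the present invention.

        import sqlite3

        N_SECONDS = 10  # the predetermined time n; 10 s matches the example of FIG. 4

        def transcribe_segment(audio_path, start, end):
            """Hypothetical helper: text of the audio between start and end seconds."""
            raise NotImplementedError

        def register_by_time(db, content_id, audio_path, duration):
            """Register the transcript, divided at N_SECONDS intervals when the
            playback time exceeds N_SECONDS (S8b), otherwise undivided (S8a)."""
            cur = db.cursor()
            record_id, start = 1, 0.0
            while True:
                remaining = duration - start
                end = start + N_SECONDS if remaining > N_SECONDS else duration  # S7
                cur.execute(
                    "INSERT INTO content_data (content_id, record_id, time_stamp, text) "
                    "VALUES (?, ?, ?, ?)",
                    (content_id, record_id, start,
                     transcribe_segment(audio_path, start, end)),
                )
                if end >= duration:
                    break
                record_id, start = record_id + 1, end
            db.commit()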
  • FIG. 3 is a conceptual diagram explaining the basic information table of the audio information files registered in the database in the first embodiment (see FIG. 2, S1, etc.), and FIG. 4 is a conceptual diagram illustrating the content data of an audio information file registered in the database in the first embodiment (see FIG. 2; S7, S8a, S8b, etc.).
  • FIG. 3 is an example of the file information of audio information files; four programs with content IDs “1” to “4” are registered, and information on their titles, categories, and audio languages is shown. For example, the content ID “1” is assigned to the first registered program (“Create homepage”) as the number that uniquely identifies that program, and “Computer” is assigned as its category. Registering the genre and the like of a content program in this way can further improve search accuracy. “Japanese” is registered as the voice language in the file information.
  • FIG. 4 shows the content data of an audio information file registered in the database; the audio information file has been converted to text and divided so that each portion has a playback time of no more than the predetermined n seconds (see FIG. 2; S8a, S8b, etc.).
  • FIG. 4 illustrates the program “homepage creation” with content ID “1” in FIG. 3 and the program “weather in Tokyo today” with content ID “4”.
  • The content ID is the number assigned to each program, as in FIG. 3.
  • The record ID indicates the order in which the audio information is played back within the program; for example, record ID “1” is the first audio information file played back in the program, and record ID “2” is the second, played after record ID “1”.
  • The time stamp in FIG. 4 indicates the playback time from the beginning of the program. For example, the time stamp “00:00:00” indicates 0 seconds from the beginning of the program, and the next time stamp indicates exactly 10 seconds from the beginning. In this case, therefore, the predetermined time n of each record is 10 seconds (see FIG. 2; S8b, etc.).
  • The item “text” in FIG. 4 indicates the data obtained by converting the voice elements of the voice information file into characters. For example, the text of record ID “1” is the converted voice “I will make a home page from now on...”, and record ID “2” is a continuation of record ID “1”, whose voice is likewise converted into text.
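  • For illustration, the two tables of FIG. 3 and FIG. 4 could be declared as follows with sqlite3; the table and column names (content_info, content_data, etc.) are assumptions chosen to mirror the figures rather than names used in the present invention.

        import sqlite3

        def create_tables(db_path="voice_search.db"):
            """Create tables mirroring the basic information table (FIG. 3) and
            the content data table (FIG. 4)."""
            db = sqlite3.connect(db_path)
            db.executescript("""
                CREATE TABLE IF NOT EXISTS content_info (    -- FIG. 3: file information
                    content_id INTEGER PRIMARY KEY,          -- e.g. 1 = "Create homepage"
                    title      TEXT,
                    category   TEXT,                         -- e.g. "Computer"
                    language   TEXT                          -- e.g. "Japanese"
                );
                CREATE TABLE IF NOT EXISTS content_data (    -- FIG. 4: divided transcript
                    content_id INTEGER,                      -- which program
                    record_id  INTEGER,                      -- playback order within it
                    time_stamp REAL,                         -- seconds from the beginning
                    text       TEXT,                         -- transcribed characters
                    PRIMARY KEY (content_id, record_id)
                );
            """)
            db.commit()
            return db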
  • FIG. 5 is a flowchart for performing a search process for the database in the first embodiment. That is, FIG. 5 is a flowchart for performing a search process for the database registered in FIG. 2, and shows a search process for searching for an audio information file including a predetermined keyword.
  • The search condition items to be entered include the category, language, text keyword, and the like of the voice information file; the items input when the database was created (see S2 in FIG. 2) can be used.
  • The search condition items are not particularly limited; for example, a category, a language, and a keyword (character string) to be included can be set.
  • After data are extracted in step S10, it is determined whether the number of extracted cases is 0 or 1 or more (S11). If there are extractions that match the search conditions, a content ID list of the content IDs (see FIG. 2 etc.) of the corresponding data is created. If the number of extracted cases is 0, “N/A” is displayed (S12b).
  • Next, the process loops over the entries in the content ID list (S13 to S26); that is, all the contents extracted in step S10 are searched. For example, when the database is searched using the category “computer” as a search condition, all the contents belonging to the category “computer” are extracted, and the text of every record of that content group is searched.
  • The text search is performed in the following steps. First, the content data count is reset (S14), and records with the matching content ID in the content data table are searched (S15 to S25). The records that were divided and registered within the content (within the program) are searched as a group: for the record ID, the nth and (n+1)th records are extracted (S16); for example, records with record IDs “1” and “2”, or records with record IDs “2” and “3”, are extracted.
  • In step S20, a keyword search is performed on the character string in the text of record “n”.
  • In step S20, the text search is performed within a single record, without combining the preceding and following records; for example, a keyword search is performed on the character string of the record with record ID “2”.
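  • A rough sketch of this search flow under the schema assumed above follows; for simplicity it searches each record independently, without the adjacent-record pairing of S16, and the step numbers in the comments refer to FIG. 5.

        def search_database(db, keyword, category=None, language=None):
            """FIG. 5 sketch: extract matching contents (S10), then keyword-search
            the text of every record of those contents (S13-S26)."""
            cur = db.cursor()
            conditions, args = [], []
            if category:
                conditions.append("category = ?"); args.append(category)
            if language:
                conditions.append("language = ?"); args.append(language)
            where = ("WHERE " + " AND ".join(conditions)) if conditions else ""
            cur.execute(f"SELECT content_id FROM content_info {where}", args)  # S10
            content_ids = [row[0] for row in cur.fetchall()]
            if not content_ids:
                return []                                    # "N/A" (S12b)
            results = []
            for content_id in content_ids:                   # loop over the ID list
                cur.execute(
                    "SELECT record_id, time_stamp, text FROM content_data "
                    "WHERE content_id = ? ORDER BY record_id", (content_id,))
                for record_id, time_stamp, text in cur.fetchall():
                    if keyword in text:                      # S20 keyword search
                        results.append((content_id, record_id, time_stamp))
            return results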
  • From the search result list obtained so far, data whose content ID matches the corresponding item in the content basic information table are acquired (S28). The acquired data are then converted into display data (S29).
  • The form of the display data is not particularly limited; for example, it may be a list displayed on the user terminal.
  • FIG. 6 is a flowchart for outputting the search result in the first embodiment; that is, it outputs the search result obtained by the search processing in FIG. 5 so that reproduction starts from the portion containing the specified keyword.
  • If the accuracy improvement mode is selected, more precise playback at the keyword is possible.
  • In that case, the start time position of the corresponding content data is first obtained from the result list produced by the search process (see FIG. 4 etc.) (S34).
  • Next, whether the output audio information file is Japanese or English is selected (S35).
  • The speech language is not limited to Japanese and English and may be any of several languages.
  • For English speech, noise-cut processing based on English speech is performed (S36b). Because the speech frequency varies with the language, applying noise-cut processing appropriate to the language allows playback output with little noise. This noise cut may be processed while the audio information file is being played back.
  • Next, information on the start time position of the corresponding content data is acquired from the search result list obtained in the search process (see FIG. 4 etc.) (S36).
  • That is, information on the scheduled playback position is acquired: which audio information file (which content ID) should be played back, and from which record ID.
  • Playback is then started from the corresponding scheduled playback position (S37), and the played-back portion is converted to text (S38).
  • Next, it is determined whether the first character of the keyword is contained in the text at the scheduled playback position (S39). For example, when searching for a voice information file containing the keyword “Internet”, this step determines whether the first character “I” of “Internet” is included at the scheduled playback position. If the first character is included, the playback position obtained by subtracting a predetermined time (1 second) from the playback start time of that first character is stored in a variable (S40).
  • In step S40, for example, when the keyword “Internet” is searched for in the content “homepage creation”, the playback position one second before the position at which the first character is spoken (for example, 34 seconds instead of 35 seconds) is stored in the variable. This prevents the head of the keyword from being clipped, so that “Internet” is not played back with its beginning cut off, and playback starts just before “Internet”.
  • The predetermined time need not be 1 second and can be determined appropriately according to the type of file being searched, the audio content, and the like.
  • If it is determined in step S39 that the first character of the keyword does not match, the voice information file is played again from the scheduled start position (S37), and it is determined whether the second character (n = 2) matches (S44).
  • Next, the start time position of the corresponding content data in the search result list and the time stored in the variable are combined to calculate the playback start position (S46). For example, in the case of FIG. 3, if the playback start position of the keyword “Internet” is 35 seconds after the beginning of the content, the position of 34 seconds is calculated by subtracting the predetermined 1 second from 35 seconds.
  • The playback start position calculated in step S46 is then set (designated) (S47), and playback is performed from the designated position of the audio information file (S48).
  • If the accuracy improvement mode is not selected in step S33, the start time position of the corresponding content data is acquired directly from the result list obtained by the search process (S10b), and playback starts from that start time position (S47). For example, if the content data “Homepage creation” in FIG. 3 is searched with the keyword “Homepage”, playback starts from the first playback start position of the record ID “4” containing the keyword (that is, 30 seconds from the beginning of the content).
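  • The accuracy improvement mode could be sketched as follows, assuming a hypothetical find_char_time helper that re-plays the record from its time stamp, converts the playback to text (S37/S38), and reports the second at which a given character is spoken; the one-second back-off matches the example above.

        BACKOFF_SECONDS = 1.0  # predetermined margin subtracted before the keyword

        def find_char_time(audio_path, start_time, char):
            """Hypothetical helper: returns the absolute time in seconds at which
            `char` is first spoken after `start_time`, or None if not found."""
            raise NotImplementedError

        def playback_start_position(hit, audio_path, keyword, accuracy_mode=True):
            """Compute where playback should start for one search hit
            (hit = (content_id, record_id, time_stamp))."""
            _, _, time_stamp = hit
            if not accuracy_mode:
                return time_stamp                    # play from the head of the record
            first_char_time = find_char_time(audio_path, time_stamp, keyword[0])
            if first_char_time is None:
                return time_stamp                    # fall back to the record head
            # Back off slightly so the first character is not clipped
            # (e.g. 35 s -> 34 s in the "Internet" example).
            return max(time_stamp, first_char_time - BACKOFF_SECONDS)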
  • The accuracy improvement mode can be set to “enabled” or “disabled” as appropriate.
  • When it is set to “enabled”, playback is performed from the time stamp of the record found by the search, the location matching the keyword is identified, and the sum of the time stamp and the time until the match is set as the playback start position presented to the user.
  • When the accuracy improvement mode is set to “disabled”, playback simply starts from the time stamp of the record found by the search (that is, from the beginning of that record ID).
  • FIG. 7 is a flowchart showing a procedure for registering a database in the second embodiment of the speech information search program according to the present invention.
  • In the second embodiment, the audio information file is registered in the database based on a predetermined number of characters. The following description focuses on the differences from the first embodiment of the present invention.
  • The procedure from inputting the file information of the audio file (S1) to converting the audio file to text (S6) is the same as in FIG. 2. It is then determined whether the number of characters in the text-converted voice information file is larger than a predetermined number of characters (n in FIG. 7) (S7-2).
  • The audio information file processed in step S8a is one whose text is short, n characters or fewer; its text is registered in the database without being divided.
  • If it is determined in step S7-2 that the text is longer than n characters, the portion corresponding to the first n characters of the audio information file is separated, and the text information of that separated portion is registered in the database (S8b). The remainder of the audio information file is then judged again in step S7-2 as to whether its text is longer than n characters, so that the audio information file is finally divided into units of n characters. In other words, the audio information file processed in step S8b is one whose text is longer than the predetermined number of characters (n characters).
  • In this way, the reproduced audio information is converted into text and stored in the database, with a time stamp applied to each unit of the predetermined number of characters (n characters).
  • If the audio information is short, containing no more than the predetermined number of characters (n characters), it is time-stamped and recorded as it is, without being divided.
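  • A minimal sketch of the S7-2/S8a/S8b decision of this second embodiment follows, assuming the whole transcript is already available as one string (how each chunk's time stamp is derived is left out, since that mapping is not spelled out here).

        N_CHARS = 20  # the predetermined number of characters n; the value is an example

        def split_by_chars(text, n=N_CHARS):
            """FIG. 7 sketch: while the transcript is longer than n characters (S7-2),
            cut off the first n characters and register them (S8b); a remainder of
            n characters or fewer is kept whole (S8a)."""
            chunks, rest = [], text
            while len(rest) > n:          # S7-2: longer than n characters?
                chunks.append(rest[:n])   # S8b: the first n characters become one record
                rest = rest[n:]
            chunks.append(rest)           # S8a: the short remainder is not divided
            return chunks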
  • Search processing, search result output processing, and the like can be performed in the same manner as in the first embodiment of the present invention (see FIGS. 5 and 6).
  • FIG. 8 is a flowchart showing a procedure for registering a database in the third embodiment of the speech information search program according to the present invention.
  • In the third embodiment, the audio information file is registered in the database based on a predetermined number of phrases or words. The following description focuses on the differences from the first and second embodiments of the present invention.
  • In this embodiment, the text is divided into units of a predetermined number of phrases or words.
  • The dividing unit may be a phrase unit or a word unit and can be selected as appropriate.
  • The method for recognizing phrases or words is not particularly limited, and any suitable method can be used.
  • The size of the dividing unit is also not particularly limited: for example, although not shown, a predetermined number of phrases or words can be set as n phrases or n words. Shortening the unit improves search accuracy, while lengthening it increases search speed; a suitable unit can be determined with these factors in mind.
  • The voice information file processed in step S8a is one whose text is short, no more than the predetermined number of phrases n (or words n); its text is registered in the database without being divided.
  • If it is determined in step S7-3 that the text is longer than n phrases (n words), the portion corresponding to the first n phrases (n words) of the audio information file is separated and its text information is registered in the database (S8b). The remainder of the audio information file is then judged again in step S7-3 as to whether its text is longer than n phrases (n words), so that the file is finally divided into units of n phrases (n words). In other words, the audio information file processed in step S8b is one whose text is longer than the predetermined number of phrases (or words).
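  • A sketch of this third embodiment using a fixed number of words is given below; the present invention leaves the phrase/word recognition method open, so plain whitespace tokenization is assumed here (Japanese text would instead need a morphological analyzer).

        N_WORDS = 5  # the predetermined number of words n; the value is an example

        def split_by_words(text, n=N_WORDS):
            """FIG. 8 sketch: divide the transcript into groups of n words (S7-3/S8b);
            n words or fewer are registered without division (S8a)."""
            words = text.split()  # whitespace tokenization, assumed for illustration
            if len(words) <= n:
                return [text]
            return [" ".join(words[i:i + n]) for i in range(0, len(words), n)]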
  • FIG. 9 is a flowchart for performing a search process on the database in the third embodiment; that is, it is a flowchart for the search process on the database registered in FIG. 8. The differences from the case where the audio information file is registered in the database based on a predetermined time (see FIG. 6 etc.) are described below.
  • From inputting the search conditions (S9) to extracting data from the content basic information table according to predetermined conditions such as category and language (S10, S11), the procedure is the same as in the first and second embodiments of the present invention.
  • If one or more cases are extracted in step S11, then for the corresponding data a keyword search is performed on the text of the content data table using the content ID (S49a); that is, the extracted content IDs are searched further.
  • If the number of extractions in step S11 is 0, “N/A” is displayed and the search ends (S49b).
  • If the keyword search in step S49a yields one or more extraction results, a loop search is performed over the corresponding number of cases (S51a to S57); if the number of extractions in step S49a is 0, “N/A” is displayed and the search ends (S49b).
  • In the loop search (S51a to S57), the content ID and record ID “n” of each hit are added to the result list; this step identifies the record ID that contains the corresponding audio portion.
  • From the result list, the data whose content ID matches the corresponding item in the content basic information table are acquired (S53), and the acquired data are converted into display data (S54). For example, the display data may be a list shown on the user's terminal device in which clicking a linked part directly reproduces the corresponding portion.
  • The series of steps (S52 to S56) is repeated in a loop for the corresponding number of cases (S57).
  • FIG. 10 is a flowchart for outputting the search result in the third embodiment; that is, it outputs the search result obtained by the search process in FIG. 9. The differences from the case where the audio information file is registered in the database based on a predetermined time (see FIG. 7 etc.) are described below.
  • First, the start time position of the corresponding content data is acquired from the result list obtained by the search processing of FIG. 9 (S58). Subsequently, the start time of the content data is set (S59), and playback is performed from the set start time (S60).
  • As described above, when converting an audio information file into text, the text may be divided by a predetermined time unit (see FIG. 2 etc.), by a predetermined number of characters (see FIG. 7 etc.), or by a specified number of phrases or words (see FIG. 8 etc.). Which of these to select can be determined appropriately in consideration of the processing capability available when converting the voice information file to text and of whether the converted text can be recognized as characters, phrases, or words.
  • FIG. 11 is a conceptual diagram for explaining an example of the speech information retrieval system according to the present invention.
  • The speech information retrieval system includes a text conversion server 8, a database server 9, a Web server 10, a speech information file storage server 11, and a management server 12, connected by a network 13.
  • The text conversion server 8 converts a voice information file into text, creates a content information file, and registers it in the database.
  • The database server 9 stores the text-converted information file 91 and the content information file 92.
  • The Web server 10 is connected to the user's terminal 101 and provides the functions of accepting search conditions input from the terminal 101 and of displaying and outputting the search results.
  • The audio information file storage server 11 stores audio information files (including video information files) 111.
  • The management server 12 manages system failures of the servers, the network 13, and so on.
  • In this system, the accuracy improvement mode described above can be operated as required. To make it function, the accuracy improvement mode is set to “enabled” (see FIG. 6); playback is then performed from the time stamp of the record found by the search, the location matching the keyword is identified, and the sum of the time stamp and the time until the match is set as the playback start position presented to the user.
  • When high-speed operation is preferred, the accuracy improvement mode is set to “disabled”, and playback starts from the time stamp of the record found by the search (that is, from the beginning of that record).
  • In this way, the database can be tuned by adjusting the settings as appropriate, either to shorten the time required for a search or to improve search accuracy. For example, when the accuracy improvement mode is enabled, the search accuracy for voice information improves: the position at which the user's desired keyword is spoken can be detected, and a more accurate playback start position can be found. Conversely, disabling the accuracy improvement mode makes its steps unnecessary, shortening the time required for a search and reducing the load on the search system's servers. That is, according to the present invention, a search system can be constructed appropriately in consideration of the type and amount of information files to be searched and the hardware environment to be used.
  • The information recorded in the database used in the present invention is not limited to audio information files (or moving image information files); for example, text information files may also be recorded.
  • Since the voice information is converted into text and registered in the database, not only the voice information files but also the text information files can be searched at the same time. That is, by registering text information files in addition to audio information files, a single search can cover both the audio information files (or moving picture information files) and the text information files.
  • A step of designating the search target before executing the search may also be provided separately, so that, for example, only audio information files or only text information files are searched, selected appropriately in consideration of the user's purpose and usage environment.
  • When performing a search process, a function may be provided for searching a plurality of contents simultaneously; simultaneous search of multiple contents reduces the time required for search processing. Whether to provide this function can be decided appropriately in consideration of the processing capacity of the computer or the like used in the present invention, as illustrated in the sketch below.
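  • One way such simultaneous searching of multiple contents could look is sketched below with a thread pool over the per-content query; this is an assumption for illustration, and whether it pays off depends on the processing capacity noted above.

        import sqlite3
        from concurrent.futures import ThreadPoolExecutor

        def search_contents_concurrently(db_path, content_ids, keyword, max_workers=4):
            """Search several contents in parallel; each worker opens its own
            sqlite3 connection, since connections are not shared across threads."""
            def search_one(content_id):
                db = sqlite3.connect(db_path)
                cur = db.execute(
                    "SELECT record_id, time_stamp FROM content_data "
                    "WHERE content_id = ? AND text LIKE ?",
                    (content_id, f"%{keyword}%"))
                rows = [(content_id, r, t) for r, t in cur.fetchall()]
                db.close()
                return rows

            with ThreadPoolExecutor(max_workers=max_workers) as pool:
                hits = []
                for rows in pool.map(search_one, content_ids):
                    hits.extend(rows)
            return hits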
  • The voice information search program according to the present invention can be used for the management, tabulation, and search of various voice information files.
  • For example, it can be incorporated into a multimedia-related database that manages voice information files containing large amounts of information, or into an Internet search engine whose search targets include a large number of such information files.
  • FIG. 1 is a conceptual diagram for explaining a speech information retrieval program according to the present invention.
  • FIG. 2 is a flowchart for database registration in the first embodiment of the audio information search program according to the present invention.
  • FIG. 3 is a conceptual diagram illustrating a basic information table of audio information files registered in a database in the first embodiment.
  • FIG. 4 is a conceptual diagram illustrating content data of a voice information file registered in a database in the first embodiment.
  • FIG. 5 is a flowchart when a search process is performed for a database in the first embodiment.
  • FIG. 6 is a flowchart for outputting a search result in the first embodiment.
  • FIG. 7 is a flowchart for database registration in the second embodiment of the audio information search program according to the present invention.
  • FIG. 8 is a flowchart for database registration in the third embodiment of the audio information search program according to the present invention.
  • FIG. 9 is a flowchart when a search process is performed for a database in the third embodiment.
  • FIG. 10 is a flowchart for outputting a search result in the third embodiment.
  • FIG. 11 is a conceptual diagram for explaining an example of a speech information retrieval system according to the present invention.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Library & Information Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

This invention provides an audio information search program that can obtain accurate search results and can easily be used with various databases. The audio information search program causes a computer to execute at least the following steps: converting an audio element recorded in audio information into a character string; measuring the playback time of the audio information; registering the character string of the audio information in a database, dividing it at a predetermined time interval if the playback time of the audio information is longer than the predetermined time, and registering the character string in the database without dividing it if the playback time is not longer than the predetermined time; determining a search item for searching the database and a search condition for the search item; executing a search of the database according to the search condition; and outputting the search result.
PCT/JP2007/069655 2006-10-10 2007-10-09 Programme de recherche d'informations audio et son support d'enregistrement, système de recherche d'informations audio, et procédé de recherche d'informations audio WO2008044669A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2006277026A JP2008097232A (ja) 2006-10-10 2006-10-10 音声情報検索プログラムとその記録媒体、音声情報検索システム、並びに音声情報検索方法
JP2006-277026 2006-10-10

Publications (1)

Publication Number Publication Date
WO2008044669A1 true WO2008044669A1 (fr) 2008-04-17

Family

ID=39282862

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2007/069655 WO2008044669A1 (fr) 2006-10-10 2007-10-09 Programme de recherche d'informations audio et son support d'enregistrement, système de recherche d'informations audio, et procédé de recherche d'informations audio

Country Status (2)

Country Link
JP (1) JP2008097232A (fr)
WO (1) WO2008044669A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111797632A (zh) * 2019-04-04 2020-10-20 北京猎户星空科技有限公司 信息处理方法、装置及电子设备

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010055259A (ja) * 2008-08-27 2010-03-11 Konica Minolta Business Technologies Inc 画像処理装置、画像処理プログラム及び画像処理方法
CN106021249A (zh) * 2015-09-16 2016-10-12 展视网(北京)科技有限公司 一种基于内容的语音文件检索方法和系统
JP6721981B2 (ja) * 2015-12-17 2020-07-15 ソースネクスト株式会社 音声再生装置、音声再生方法及びプログラム

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000348064A (ja) * 1999-04-09 2000-12-15 Internatl Business Mach Corp <Ibm> 内容情報と話者情報を使用して音声情報を検索するための方法および装置
JP2002157112A (ja) * 2000-11-20 2002-05-31 Teac Corp 音声情報変換装置
JP2006054517A (ja) * 2004-08-09 2006-02-23 Bank Of Tokyo-Mitsubishi Ltd 情報提示装置、方法及びプログラム

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000348064A (ja) * 1999-04-09 2000-12-15 Internatl Business Mach Corp <Ibm> 内容情報と話者情報を使用して音声情報を検索するための方法および装置
JP2002157112A (ja) * 2000-11-20 2002-05-31 Teac Corp 音声情報変換装置
JP2006054517A (ja) * 2004-08-09 2006-02-23 Bank Of Tokyo-Mitsubishi Ltd 情報提示装置、方法及びプログラム

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111797632A (zh) * 2019-04-04 2020-10-20 北京猎户星空科技有限公司 信息处理方法、装置及电子设备
CN111797632B (zh) * 2019-04-04 2023-10-27 北京猎户星空科技有限公司 信息处理方法、装置及电子设备

Also Published As

Publication number Publication date
JP2008097232A (ja) 2008-04-24

Similar Documents

Publication Publication Date Title
US11978439B2 (en) Generating topic-specific language models
US7546288B2 (en) Matching media file metadata to standardized metadata
JP4997601B2 (ja) 音声データ検索用webサイトシステム
US7310601B2 (en) Speech recognition apparatus and speech recognition method
US8583418B2 (en) Systems and methods of detecting language and natural language strings for text to speech synthesis
US8355919B2 (en) Systems and methods for text normalization for text to speech synthesis
JP5178109B2 (ja) 検索装置、方法及びプログラム
US20160012047A1 (en) Method and Apparatus for Updating Speech Recognition Databases and Reindexing Audio and Video Content Using the Same
US20100274667A1 (en) Multimedia access
US20040054541A1 (en) System and method of media file access and retrieval using speech recognition
US20110029545A1 (en) Syllabic search engines and related methods
Witbrock et al. Using words and phonetic strings for efficient information retrieval from imperfectly transcribed spoken documents
US9015172B2 (en) Method and subsystem for searching media content within a content-search service system
US9305119B1 (en) System, apparatus and method for determining correct metadata from community-submitted data
JP3545824B2 (ja) データ検索装置
JP4064902B2 (ja) メタ情報生成方法、メタ情報生成装置、検索方法および検索装置
WO2008044669A1 (fr) Programme de recherche d&#39;informations audio et son support d&#39;enregistrement, système de recherche d&#39;informations audio, et procédé de recherche d&#39;informations audio
JP2009080576A (ja) 検索装置、方法及びプログラム
EP1531405B1 (fr) Appareil de recherche d&#39;information, méthode pour la recherche d&#39;information, et mémoire sur laquelle le programme de recherche d&#39;information est enregistré
US7949667B2 (en) Information processing apparatus, method, and program
Goto et al. PodCastle and Songle: Crowdsourcing-Based Web Services for Retrieval and Browsing of Speech and Music Content.
LawTo et al. A scalable video search engine based on audio content indexing and topic segmentation
Lindsay et al. Representation and linking mechanisms for audio in MPEG-7
Declerck et al. Contribution of NLP to the content indexing of multimedia documents
US20080046488A1 (en) Populating a database

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 07829393

Country of ref document: EP

Kind code of ref document: A1

DPE1 Request for preliminary examination filed after expiration of 19th month from priority date (pct application filed from 20040101)
NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS (EPO FORM 1205A DATED 24-07-2009)

122 Ep: pct application non-entry in european phase

Ref document number: 07829393

Country of ref document: EP

Kind code of ref document: A1

DPE1 Request for preliminary examination filed after expiration of 19th month from priority date (pct application filed from 20040101)