WO2019227578A1 - 语音采集方法、装置、计算机设备及存储介质 - Google Patents

语音采集方法、装置、计算机设备及存储介质

Info

Publication number
WO2019227578A1
Authority
WO
WIPO (PCT)
Prior art keywords
corpus
text
recording
communication application
application platform
Prior art date
Application number
PCT/CN2018/094364
Other languages
English (en)
French (fr)
Inventor
黄锦伦
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 filed Critical 平安科技(深圳)有限公司
Publication of WO2019227578A1 publication Critical patent/WO2019227578A1/zh

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/26 - Speech to text systems
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 51/00 - User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L 51/04 - Real-time or near real-time messaging, e.g. instant messaging [IM]

Definitions

  • the present application relates to the field of computer technology, and in particular, to a voice acquisition method, device, computer equipment, and storage medium.
  • Voiceprint recognition technology is used to determine the identity of a speaker quickly and conveniently.
  • Speech recognition technology is used to convert natural speech into text; both voiceprint recognition and speech recognition require the collection of a large amount of speaker information and corresponding speech data for model training.
  • At present, voice collection mainly relies on professional recording equipment to record speech and obtain recording files, after which the speaker corresponding to each recording file is labeled manually.
  • This manual approach cannot remove unqualified recordings in time and is cumbersome to operate, so collection efficiency is low.
  • Moreover, this manual collection approach is not suitable for collecting speech from people far away; if speech must be collected from people in different regions at the same time, multiple recording devices have to be purchased, which wastes a large amount of collection cost.
  • a speech collection method includes:
  • if a corpus acquisition request sent by a user through an application account of a communication application platform is received, the basic corpus included in the corpus acquisition request and the identification information of the user are acquired; a corpus entry is randomly selected from the basic corpus database corresponding to the basic corpus as an initial corpus and displayed on the interface of the application account; if a recording start request is received, a preset recording calling framework is used to call the recording function of the application account to record and generate a recording file;
  • if a recording termination request is received, a preset speech recognition calling framework is used to call the offline speech recognition function of the application account of the communication application platform to convert the recording file into target text, and text similarity matching is performed between the target text and the initial corpus;
  • if the text similarity is greater than or equal to a preset similarity threshold, a mapping relationship between the recording file and the identification information is established, and the recording file, the identification information, and the mapping relationship are correspondingly saved.
  • a voice acquisition device includes:
  • a domain determination module configured to, if a corpus acquisition request sent by a user through an application account of a communication application platform is received, acquire the basic corpus included in the corpus acquisition request and the identification information of the user;
  • a corpus selection module configured to randomly select a corpus entry from the basic corpus database corresponding to the basic corpus as an initial corpus and display it on the interface of the application account of the communication application platform;
  • a recording generation module configured to, if a recording start request sent by the user through the application account of the communication application platform is received, use a preset recording calling framework to call the recording function of the application account to record and generate a recording file;
  • a speech recognition module configured to, if a recording termination request sent by the user through the application account of the communication application platform is received, use a preset speech recognition calling framework to call the offline speech recognition function of the application account to convert the recording file into target text;
  • a text matching module configured to perform text similarity matching between the target text and the initial corpus to obtain the text similarity between the target text and the initial corpus;
  • a file storage module configured to, if the text similarity is greater than or equal to a preset similarity threshold, establish a mapping relationship between the recording file and the identification information, and correspondingly save the recording file, the identification information, and the mapping relationship.
  • a computer device includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, where the processor implements the steps of the above voice collection method when executing the computer-readable instructions.
  • one or more non-volatile readable storage media store computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the steps of the above voice collection method.
  • FIG. 1 is a schematic diagram of an application environment of a voice collection method according to an embodiment of the present application
  • FIG. 2 is an implementation flowchart of a speech collection method according to an embodiment of the present application
  • FIG. 3 is a flowchart of implementing step S3 in the voice collection method according to an embodiment of the present application.
  • FIG. 4 is a flowchart of implementing step S5 in the voice collection method according to an embodiment of the present application.
  • FIG. 5 is a flowchart of implementing step S53 in the voice collection method according to an embodiment of the present application.
  • FIG. 6 is another implementation flowchart of a speech collection method according to an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a voice collection device according to an embodiment of the present application.
  • FIG. 8 is a schematic diagram of a computer device according to an embodiment of the present application.
  • FIG. 1 illustrates an application environment of a voice collection method provided by an embodiment of the present application.
  • This method of voice acquisition is applied in a voice acquisition scenario of an application account based on a communication application platform.
  • the voice collection scenario includes a server, a client, and a communication application platform.
  • the server, the client, and the communication application platform are connected to each other through a network.
  • the user obtains the corpus from the server through the client and sends voice data to the communication application platform; the server performs recording and speech recognition through the communication application platform, and the communication application platform sends the recording file and the speech recognition result to the server.
  • the communication application platform is an instant messaging system capable of transmitting files such as voice, pictures, and video.
  • the communication application platform may be WeChat, MiTalk, Yixin, Alipay, Happy Ping An, or another communication application platform.
  • Third-party software developers can develop their own application accounts based on the communication application platform, that is, the application account of the communication application platform.
  • the client is the application account of the communication application platform; the account may specifically be a WeChat official account, an Alipay service account, a WeChat mini program, or the like.
  • the server can be implemented by an independent server or a server cluster composed of multiple servers.
  • FIG. 2 illustrates a voice collection method provided by an embodiment of the present application. The method is applied to the server in FIG. 1 as an example for description, and the details are as follows:
  • S1: if a corpus acquisition request sent by the user through the application account of the communication application platform is received, the basic corpus included in the corpus acquisition request is acquired, and the identification information of the user is acquired based on the application account of the communication application platform.
  • the basic corpus includes, but is not limited to, data classification in the fields of news, military, life, economy, sports, hotspots, and entertainment.
  • the basic corpus can be designed according to needs, and is not specifically limited here.
  • the identification information is used to uniquely identify the user.
  • the identification information includes, but is not limited to, a work number, a communication application platform identification (ID), a name, and the like, and may also be set according to actual needs, which is not specifically limited here.
  • the application account of the communication application platform is a WeChat public account
  • the basic corpus included in the corpus acquisition request sent by the user through the WeChat official account is "news"
  • the obtained identification information is "619-Zhangsan-Zhang San"
  • 619 refers to the user's work number
  • Zhangsan refers to the user's WeChat ID
  • Zhang San refers to the user's name
  • if the user is not using the application account of the communication application platform for the first time, the user's communication application platform ID can be obtained directly through the application account, and the identification information of the corresponding user stored on the server for that platform ID can be retrieved.
  • if the user is using the application account of the communication application platform for the first time, the server sends an identity-information completion request to the user of the application account.
  • the request may specifically be presented by popping up a page for supplementing information for the user to fill in, or other methods may be used as needed, which is not specifically limited here; after the completed identity information filled in by the user is obtained, the identity information is stored on the server.
  • S2: a corpus entry is randomly selected from the basic corpus database corresponding to the basic corpus as the initial corpus and displayed on the interface of the application account of the communication application platform.
  • specifically, each basic corpus is preset with a corresponding basic corpus database; after the basic corpus included in the corpus acquisition request is obtained in step S1, the basic corpus database corresponding to that basic corpus is determined, and a random function is used to select one corpus entry from the database as the initial corpus, which is displayed on the interface of the application account of the communication application platform for the user to read aloud.
  • the basic corpus database refers to a large-scale electronic text collection that has been scientifically sampled and processed.
  • the basic corpus database stores language material that has actually appeared in real language use and is a basic resource of language knowledge carried on electronic computers.
  • taking the basic corpus "news" obtained in step S1 as an example, a corpus entry randomly selected from the basic corpus database corresponding to "news" is "Most people give this father and son a thumbs-up and agree with this way of educating the son by tempering him through hiking" (大多数人为这对父子点赞,并认同这种通过徒步磨练儿子的教育方式), and this entry is displayed on the WeChat official account interface.
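  • The random selection in step S2 can be illustrated with a minimal sketch. The sketch below assumes the basic corpus databases are available to the server as an in-memory map from corpus category to an array of sentences; the names corpusDatabase and pickInitialCorpus are illustrative and are not taken from the patent.

```typescript
// Minimal sketch of step S2: pick a random corpus entry for the requested basic corpus.
// The data layout (a map from corpus category to sentence list) is assumed for illustration.
const corpusDatabase: Record<string, string[]> = {
  news: [
    "大多数人为这对父子点赞,并认同这种通过徒步磨练儿子的教育方式",
    // ... further pre-collected sentences for each category
  ],
};

function pickInitialCorpus(basicCorpus: string): string {
  const pool = corpusDatabase[basicCorpus];
  if (!pool || pool.length === 0) {
    throw new Error(`no corpus database for category: ${basicCorpus}`);
  }
  // "Use a random function to select one corpus entry", as described in step S2.
  return pool[Math.floor(Math.random() * pool.length)];
}
```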
  • S3: if a recording start request sent by the user through the application account of the communication application platform is received, a preset recording calling framework is used to call the recording function of the application account to record and generate a recording file.
  • specifically, after the initial corpus is displayed on the interface of the application account, the user sends a recording start request to the server by tapping the recording button provided on the WeChat official account; after receiving the recording start request, the server uses the preset recording calling framework to call the recording function of the application account of the communication application platform, records the speech produced by the user reading the initial corpus aloud, and saves it as a recording file.
  • the recording calling framework implements recording by calling the audio interfaces in the JSSDK (JavaScript Software Development Kit) of the communication application platform.
  • the audio interfaces include, but are not limited to, a start-recording interface and a stop-recording interface.
  • taking the corpus obtained in step S2 as an example, after the corpus "Most people give this father and son a thumbs-up and agree with this way of educating the son by tempering him through hiking" is displayed on the WeChat official account interface, the user taps the recording button, which sends a recording start request to the server through the WeChat official account.
  • after receiving the request, the server calls the start-recording interface among the audio interfaces of the WeChat server through the recording calling framework to perform the recording.
  • S4: if a recording termination request sent by the user through the application account of the communication application platform is received, a preset speech recognition calling framework is used to call the offline speech recognition function of the application account to convert the recording file into target text.
  • specifically, after the user finishes reading the initial corpus aloud, a recording termination request is sent to the server by tapping the stop-recording button on the application account interface of the communication application platform.
  • after receiving the recording termination request, the server uses the preset speech recognition calling framework to call the offline speech recognition function provided by the application account of the communication application platform, performs speech recognition on the recording file, and obtains the target text.
  • taking the recording file obtained in step S3 as an example, after the user presses the stop-recording button, the server performs speech recognition on the recording file through the WeChat server and obtains the target text "Most people give this father and son a thumbs-up and agree with this way of tempering the son through hiking" (大多人为这对父子点赞并认同这种通过徒步磨练儿子的方式); a client-side sketch of these interface calls follows.
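  • Steps S3 and S4 rely on the audio and voice recognition interfaces exposed by the communication application platform. The client-side sketch below assumes the platform is WeChat and that the standard WeChat JS-SDK interface names (wx.startRecord, wx.stopRecord, wx.translateVoice) apply; it also assumes wx.config has already been executed successfully (see the configuration sketch later in this section), and the way the result is uploaded to the server is only a placeholder.

```typescript
// Hedged client-side sketch of steps S3/S4, assuming the WeChat JS-SDK ("wx" global)
// has already been configured and authorized for the audio and voice recognition interfaces.
declare const wx: any; // provided by the WeChat JS-SDK script loaded on the official-account page

let recordingLocalId = "";

// S3: the user taps the recording button, which starts recording via the audio interface.
function onStartRecordingClicked(): void {
  wx.startRecord();
}

// S4: the user taps stop; recording ends and the platform's built-in voice recognition runs.
function onStopRecordingClicked(): void {
  wx.stopRecord({
    success: (res: { localId: string }) => {
      recordingLocalId = res.localId; // handle to the recording file kept by the platform
      wx.translateVoice({
        localId: recordingLocalId,
        isShowProgressTips: 1,
        success: (voice: { translateResult: string }) => {
          // voice.translateResult is the recognized target text; how it and the recording
          // reach the server for similarity matching is application-specific and omitted.
          sendToServer(recordingLocalId, voice.translateResult);
        },
      });
    },
  });
}

// Placeholder: the patent does not specify how the result is uploaded to the server.
function sendToServer(localId: string, targetText: string): void {
  // e.g. POST /api/voice-collect with { localId, targetText }
}
```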
  • S5: text similarity matching is performed between the target text and the initial corpus to obtain the text similarity between the target text and the initial corpus.
  • specifically, text similarity matching is performed on the obtained target text and the initial corpus to obtain the text similarity between them; whether the recording file corresponding to the target text meets the requirements can then be determined according to this text similarity.
  • text similarity refers to measuring the degree of similarity between texts through measures such as similarity coefficients and similarity distances; text similarity matching refers to the process of obtaining the degree of similarity between two texts through a text-similarity formula, algorithm, or model.
  • text similarity algorithms include, but are not limited to: cosine similarity, the k-nearest-neighbor (kNN) classification algorithm, the dynamic programming algorithm, the Manhattan distance, and the Hamming distance based on the locality-sensitive hashing (SimHash) algorithm.
  • S6: if the text similarity is greater than or equal to a preset similarity threshold, a mapping relationship between the recording file and the identification information is established, and the recording file, the identification information, and the mapping relationship are correspondingly saved.
  • specifically, the text similarity calculated in step S5 is compared with the preset similarity threshold; if the text similarity is greater than or equal to the threshold, the recording file corresponding to the target text is determined to meet the collection requirements, a mapping relationship between the recording file and the identification information is established, and the recording file, the identification information, and the mapping relationship are correspondingly saved, completing the collection of the voice data.
  • in this embodiment, if a corpus acquisition request sent through an application account of a communication application platform is received, a corpus entry is randomly selected from the basic corpus database corresponding to the basic corpus included in the request as the initial corpus, and the identification information of the user of the application account is obtained.
  • after a recording start request is received, a preset recording calling framework is used to call the recording function of the application account to record and generate a recording file, and after the recording ends, a preset speech recognition calling framework is used to call the offline speech recognition function of the application account to convert the recording file into target text.
  • a text matching algorithm is then used to match the target text against the initial corpus to obtain a similarity, which is compared with a preset similarity threshold.
  • if the similarity is greater than or equal to the threshold, a mapping relationship between the recording file and the identification information is established, and the recording file, the identification information, and the mapping relationship are saved to a database, thereby achieving fast collection of voice data through the application account of the communication application platform and effectively improving collection efficiency.
  • because voice collection based on the application account of the communication application platform is performed over the network, no additional equipment needs to be purchased for personnel in different regions, which saves collection cost.
  • the server interacts with the communication application platform, and calls the recording function provided by the communication application platform for recording.
  • as shown in FIG. 3, step S3, in which a preset recording calling framework is used to call the recording function of the application account of the communication application platform to record and generate a recording file, specifically includes the following steps:
  • S31: if a recording start request is received, preset configuration information is sent to the communication application platform, where the configuration information is used for permission verification of the recording start request.
  • specifically, before voice collection is performed, interface configuration needs to be carried out in advance on the configuration (config) interface of the server to obtain the preset configuration information; when a recording start request is received, this configuration information is sent to the communication application platform for permission verification.
  • the interface configuration refers to configuring, through the interface, the signature information that the communication application platform needs to verify.
  • the signature information includes: appId, timestamp, nonceStr, signature, and jsApiList.
  • appId is the unique identifier of the application account of the communication application platform; timestamp is the timestamp at which the signature was generated; nonceStr is a generated random string; signature refers to the signature of the server; jsApiList refers to the list of JavaScript (JS) interfaces to be called.
  • in the embodiment of this application, the interface list to be called includes the audio interfaces and the speech recognition interface; the audio interfaces include a start-recording interface, an end-recording interface, a pause-recording interface, and the like.
  • S32: if an authorization success message sent by the communication application platform is received, an instruction to enable the preset audio interface is sent to the communication application platform, so that the user records through the recording function of the application account and a recording file is generated.
  • specifically, after the server sends the configuration information to the communication application platform for permission verification, WeChat verifies the configuration information and returns the verification result to the server.
  • if the returned information received by the server indicates that verification passed and authorization succeeded, the server sends the communication application platform an instruction to enable the start-recording interface contained in the signature information of step S31; after receiving the instruction, the communication application platform performs the recording operation and generates a recording file, as sketched below.
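  • The configuration information of step S31 must carry a valid signature before the communication application platform will authorize the audio interfaces. The server-side sketch below assumes the documented WeChat JS-SDK signing scheme (an SHA-1 digest over jsapi_ticket, noncestr, timestamp, and the page url joined in ASCII key order); the helper name buildJsSdkConfig and the way the jsapi_ticket is obtained are assumptions, not details from the patent.

```typescript
import { createHash, randomBytes } from "crypto";

// Hedged server-side sketch of the "interface configuration" of step S31,
// assuming the documented WeChat JS-SDK signature scheme.
interface JsSdkConfig {
  appId: string;       // unique identifier of the application account
  timestamp: number;   // timestamp at which the signature was generated
  nonceStr: string;    // generated random string
  signature: string;   // the server's signature
  jsApiList: string[]; // JS interfaces to be called
}

function buildJsSdkConfig(appId: string, jsapiTicket: string, pageUrl: string): JsSdkConfig {
  const nonceStr = randomBytes(8).toString("hex");
  const timestamp = Math.floor(Date.now() / 1000);
  // Fields are joined in ASCII key order: jsapi_ticket, noncestr, timestamp, url.
  const raw =
    `jsapi_ticket=${jsapiTicket}&noncestr=${nonceStr}` +
    `&timestamp=${timestamp}&url=${pageUrl}`;
  const signature = createHash("sha1").update(raw).digest("hex");
  return {
    appId,
    timestamp,
    nonceStr,
    signature,
    // The interfaces named in the embodiment: audio interfaces and the voice recognition interface.
    jsApiList: ["startRecord", "stopRecord", "translateVoice"],
  };
}
```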
  • in this embodiment, when a recording start request is received, preset configuration information is sent to the communication application platform, and the communication application platform performs permission verification on the received configuration information.
  • after the verification passes, the communication application platform receives and executes the instruction sent by the server to enable the preset audio interface, and collects the voice signal produced by the user of the application account to obtain the recording file, so that voice collection can be completed quickly by the user simply using the application account of the communication application platform, without professional equipment, which improves the efficiency of voice collection.
  • because voice collection through the application account of the communication application platform is carried out over the network, there is no need to collect on site, which saves collection cost.
  • in an embodiment, as shown in FIG. 4, step S5, in which text similarity matching is performed between the target text and the initial corpus to obtain the text similarity between them, specifically includes the following steps:
  • S51 Determine punctuation marks in the initial corpus according to a preset regular expression.
  • the initial corpus contains punctuation marks, but the user does not read punctuation aloud, so the punctuation marks in the initial corpus need to be deleted; in this embodiment, a preset regular expression is used to perform regular matching on the initial corpus to determine the punctuation marks it contains.
  • the preset regular expression may specifically be "\p{P|M|Z|S|C}", where the lowercase p stands for property, denotes a Unicode property, and serves as the prefix of a Unicode regular expression.
  • the Unicode character set contains 7 property classes: P (punctuation characters), L (letters), M (mark symbols), Z (separators, such as space characters and line breaks), S (symbols, such as mathematical symbols and currency signs), N (numbers, such as Arabic and Roman numerals), and C (other characters); {P|M|Z|S|C} means that when a character is matched, the match succeeds if the character has any of the properties P, M, Z, S, or C.
  • Unicode is also called universal code or single code, which is an industry standard in the field of computer science, including character sets and encoding schemes. Unicode was created to address the limitations of traditional character encoding schemes. It sets a unified and unique binary encoding for each character in each language to meet the requirements of cross-language and cross-platform text conversion processing.
  • taking the corpus obtained in step S2 as an example, after the corpus "Most people give this father and son a thumbs-up and agree with this way of educating the son by tempering him through hiking" (大多数人为这对父子点赞,并认同这种通过徒步磨练儿子的教育方式) is matched against the preset regular expression, the punctuation mark "," is found.
  • S52: the punctuation marks found in step S51 are deleted, and the resulting text content is used as the target corpus.
  • taking the initial corpus above as an example, after the punctuation marks in the initial corpus are deleted, the target corpus obtained is the same sentence without the comma (大多数人为这对父子点赞并认同这种通过徒步磨练儿子的教育方式); a regular-expression sketch of steps S51 and S52 follows.
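  • Steps S51 and S52 can be sketched compactly. Modern JavaScript regular expressions support Unicode property classes with the u flag, so the patent's \p{P|M|Z|S|C} pattern can be approximated by the character class below; this is an illustrative equivalent rather than the exact expression used in the patent.

```typescript
// Steps S51/S52: find punctuation (and mark/separator/symbol/other characters) with a
// Unicode-property regular expression, then delete them to obtain the target corpus.
const PUNCTUATION_LIKE = /[\p{P}\p{M}\p{Z}\p{S}\p{C}]/gu;

function toTargetCorpus(initialCorpus: string): string {
  return initialCorpus.replace(PUNCTUATION_LIKE, "");
}

// Example from the text: the Chinese comma in the displayed corpus is removed.
// toTargetCorpus("大多数人为这对父子点赞,并认同这种通过徒步磨练儿子的教育方式")
//   === "大多数人为这对父子点赞并认同这种通过徒步磨练儿子的教育方式"
```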
  • S53 Use a dynamic programming algorithm to perform similarity calculation on the target text and target corpus to obtain the text similarity.
  • specifically, a dynamic programming algorithm is used to obtain the lengths of all common substrings of the target text and the target corpus, the length with the largest value is selected as the target length, and the ratio of the target length to the number of characters in the target corpus is used as the text similarity.
  • the dynamic programming algorithm works by splitting a problem and defining the relationship between problem states, so that the problem can be solved recursively.
  • the dynamic programming algorithm decomposes the problem to be solved into several sub-problems and solves the sub-problems in order, with the solution of each earlier sub-problem providing useful information for the solution of the later one.
  • when solving any sub-problem, all possible partial solutions are listed, those partial solutions that may lead to the optimum are retained through decision-making, and the other partial solutions are discarded; the sub-problems are solved in turn, and the solution of the last sub-problem is the solution of the original problem.
  • taking the target text obtained in step S4, "Most people give this father and son a thumbs-up and agree with this way of tempering the son through hiking" (大多人为这对父子点赞并认同这种通过徒步磨练儿子的方式), as an example, the corresponding target corpus is the one obtained in step S52 (大多数人为这对父子点赞并认同这种通过徒步磨练儿子的教育方式).
  • the dynamic programming algorithm finds that the longest common substring of the target text and the target corpus is "人为这对父子点赞并认同这种通过徒步磨练儿子的" (roughly, "give this father and son a thumbs-up and agree with this way of tempering the son through hiking"), whose length is 22 characters, while the target corpus contains 29 characters.
  • the text similarity between the target text and the target corpus is therefore 22/29, approximately 75.86%.
  • in this embodiment, the punctuation marks in the initial corpus are determined according to the preset regular expression and deleted, and the resulting text content is used as the target corpus, which avoids interference from punctuation in the similarity calculation and improves the accuracy of the text similarity computation.
  • a dynamic programming algorithm is then used to compute the similarity between the target text and the target corpus to obtain the text similarity; at the same time, the dynamic programming algorithm allows the similarity to be computed quickly, which improves the efficiency of the text similarity calculation.
  • in an embodiment, as shown in FIG. 5, step S53, in which a dynamic programming algorithm is used to compute the similarity between the target text and the target corpus to obtain the text similarity, specifically includes the following steps:
  • S531: the characters of the target text are stored in order in a one-dimensional array X_a, and the characters of the target corpus are stored in order in a one-dimensional array Y_b, where a is the number of characters in the target text and b is the number of characters in the target corpus.
  • specifically, the number of characters a in the target text and the number of characters b in the target corpus are obtained, the characters of the target text are stored in the one-dimensional array X_a in order from front to back, and the characters of the target corpus are stored in the one-dimensional array Y_b in order from front to back.
  • S532: the length L(i,j) of the common subsequence of the first i characters of X_a and the first j characters of Y_b is calculated by formula (1): L(i,j) = max{L(i-1,j-1) + same(X_i,Y_j), L(i-1,j), L(i,j-1)}, where same(X_i,Y_j) takes the value 1 if X_i and Y_j are identical and 0 otherwise, i and j are positive integers, i ≤ a, and j ≤ b.
  • a common subsequence refers to a subsequence contained in both X_a and Y_b.
  • max{ } denotes taking the maximum of the expressions inside the braces as the value of the expression.
  • S533: recursive backtracking is performed on the common subsequence lengths to obtain a set of common subsequence lengths, and the common subsequence length l with the largest value is taken from the set as the target length, where l is a positive integer less than or equal to b.
  • specifically, all the common subsequence lengths obtained in step S532 are stored in a matrix with a rows and b columns, the matrix is traversed by recursive backtracking in a preset manner, each length is updated according to formula (1) to obtain all the common subsequence lengths, and the common subsequence length with the largest value is selected as the target length.
  • the preset manner may be from right to left and then from bottom to top, or from bottom to top and then from right to left, and may be chosen according to the actual situation, which is not specifically limited here.
  • recursion refers to a method in which a procedure or function calls itself, directly or indirectly, in its own definition or description; it usually transforms a large, complex problem layer by layer into a smaller problem similar to the original one, so that a small amount of code can describe the many repeated computations needed to solve the problem, greatly reducing the amount of program code.
  • for example, in this embodiment, each L(i,j) only needs to be computed from L(i-1,j), L(i,j-1), L(i-1,j-1), and same(X_i,Y_j); the resulting L(i,j) can then be used as a factor in the subsequent computation of L(i+1,j), L(i,j+1), or L(i+1,j+1).
  • backtracking, also called the trial-and-error method, is a search process similar to enumeration that looks for a solution during the attempt; when the conditions for a solution are found not to be met, it "backtracks" and tries another path. Backtracking is also a search method for selecting an optimum: it searches forward according to the selection criteria to reach the goal, but when it finds at some step that the earlier choice is not good or cannot reach the goal, it steps back and chooses again.
  • S534: the similarity is calculated by formula (2): θ = l / b, where θ is the text similarity and θ ∈ [0,1]; that is, the quotient of the largest common subsequence length taken from the set and the number of characters in the target corpus is used as the text similarity between the target text and the target corpus.
  • in this embodiment, the characters of the target text are stored in one array and the characters of the target corpus are stored in another array; the two arrays are traversed to compute the common subsequence lengths, the largest common subsequence length is taken as the target length, and the ratio of the target length to the number of characters in the target corpus is used as the text similarity.
  • this method quickly and accurately obtains the text similarity between the target text and the target corpus and improves the efficiency of the similarity calculation; a sketch of the computation follows.
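  • Taken together, steps S531 to S534 are the classic longest-common-subsequence dynamic program followed by normalization against the length of the target corpus. The sketch below implements formula (1) and formula (2) as stated in the description; the variable names are illustrative.

```typescript
// Steps S531-S534: common subsequence length via formula (1), text similarity via formula (2).
function textSimilarity(targetText: string, targetCorpus: string): number {
  const X = Array.from(targetText);   // X_a, a = number of characters in the target text
  const Y = Array.from(targetCorpus); // Y_b, b = number of characters in the target corpus
  const a = X.length;
  const b = Y.length;

  // L[i][j]: common subsequence length up to the i-th character of X and the j-th of Y.
  const L: number[][] = Array.from({ length: a + 1 }, () => new Array<number>(b + 1).fill(0));

  for (let i = 1; i <= a; i++) {
    for (let j = 1; j <= b; j++) {
      const same = X[i - 1] === Y[j - 1] ? 1 : 0; // same(X_i, Y_j)
      // Formula (1): L(i,j) = max{ L(i-1,j-1) + same(X_i,Y_j), L(i-1,j), L(i,j-1) }
      L[i][j] = Math.max(L[i - 1][j - 1] + same, L[i - 1][j], L[i][j - 1]);
    }
  }

  const l = L[a][b]; // the largest common subsequence length, i.e. the target length
  return b === 0 ? 0 : l / b; // Formula (2): θ = l / b, θ ∈ [0, 1]
}
```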
  • in an embodiment, as shown in FIG. 6, after step S5 the voice collection method further includes the following step:
  • S7: if the text similarity is less than the preset similarity threshold, a voice collection failure is displayed on the interface of the application account of the communication application platform, and the process returns to step S2 to continue.
  • specifically, after the text similarity is obtained in step S5, it is compared with the preset similarity threshold; if the text similarity is less than the threshold, the recording file corresponding to the target text is determined not to meet the collection requirements. In that case the process jumps to step S2, that is, a corpus entry is again randomly selected from the basic corpus database corresponding to the basic corpus as the initial corpus and displayed on the interface of the application account, and the voice collection process starts again, until the text similarity obtained is greater than or equal to the preset similarity threshold, at which point step S6 is performed and the collection process ends.
  • in this embodiment, when the text similarity is less than the preset similarity threshold, a voice collection failure is displayed on the interface of the application account and the process returns to step S2 until the obtained text similarity is greater than or equal to the threshold, so that the collected voice data is of high quality; this avoids discovering only when the voice data is used that its quality does not meet requirements and having to collect it again, improving the quality of the collected voice data. An outline of this acceptance decision follows.
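  • The acceptance decision of steps S6 and S7 reduces to a small server-side branch: persist the recording together with the identification information when the similarity clears the threshold, otherwise report failure and re-run step S2. The sketch below is a schematic outline only; the repository interface and the 0.8 threshold value are assumptions, since the patent does not fix a concrete threshold.

```typescript
// Hedged outline of steps S6/S7. The repository interface and the threshold value are assumptions.
interface VoiceSampleRepository {
  save(sample: { recordingFileId: string; identityInfo: string }): Promise<void>;
}

const SIMILARITY_THRESHOLD = 0.8; // preset similarity threshold; the patent does not give a value

async function acceptOrRetry(
  similarity: number,
  recordingFileId: string,
  identityInfo: string,
  repo: VoiceSampleRepository,
  retryFromStepS2: () => Promise<void>, // re-selects an initial corpus and displays it again
): Promise<void> {
  if (similarity >= SIMILARITY_THRESHOLD) {
    // S6: persist the recording file, the identification information, and their mapping.
    await repo.save({ recordingFileId, identityInfo });
  } else {
    // S7: show "voice collection failed" on the application-account interface and retry from S2.
    await retryFromStepS2();
  }
}
```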
  • a voice acquisition device corresponds to the voice acquisition method in the above embodiment.
  • the voice acquisition device includes a domain determination module 10, a corpus selection module 20, a recording generation module 30, a speech recognition module 40, a text matching module 50, and a file storage module 60.
  • the detailed description of each function module is as follows:
  • the domain determining module 10 is configured to, if a corpus acquisition request sent by a user through an application account of a communication application platform is received, acquire a basic corpus included in the corpus acquisition request and identification information of the user;
  • the corpus selection module 20 is configured to randomly select a corpus entry from the basic corpus database corresponding to the basic corpus as the initial corpus and display it on the interface of the application account of the communication application platform;
  • the recording generation module 30 is configured to, if a recording start request sent by a user through an application account of the communication application platform is received, use a preset recording calling framework to call the recording function of the application account of the communication application platform to record and generate a recording file;
  • the voice recognition module 40 is configured to, if a recording termination request sent by a user through the application account of the communication application platform is received, use a preset speech recognition calling framework to call the offline speech recognition function of the application account to convert the recording file into target text;
  • a text matching module 50 configured to match text similarity between the target text and the initial corpus to obtain the text similarity between the target text and the initial corpus;
  • the file storage module 60 is configured to establish a mapping relationship between a recording file and identity information if the text similarity is greater than or equal to a preset similarity threshold, and correspondingly save the recording file, identity information, and mapping relationship.
  • the recording generation module 30 includes:
  • the authority verification unit 31 is configured to send preset configuration information to the communication application platform if a recording start request is received, where the configuration information is used to perform permission verification on the recording start request;
  • the recording driving unit 32 is configured to send an instruction to enable a preset audio interface to the communication application platform if the authorization success message sent by the communication application platform is received, so that the user uses the recording function of the application account of the communication application platform to record and generate Recording files.
  • the text matching module 50 includes:
  • a regular matching unit 51 configured to determine punctuation marks in an initial corpus according to a preset regular expression
  • a symbol processing unit 52 configured to delete punctuation marks, and use the obtained text content as a target corpus
  • the similarity calculation unit 53 is configured to use a dynamic programming algorithm to perform similarity calculation on the target text and the target corpus to obtain the text similarity.
  • the similarity calculation unit 53 includes:
  • Array construction subunit 531 is used to sequentially store characters in the target text into the one-dimensional array X a , and sequentially store characters in the target corpus into the one-dimensional array Y b , where a is a character of the target text Number, b is the number of characters in the target corpus;
  • the length calculation sub-unit 532 is configured to calculate the common subsequence length L(i,j) of the first i characters of X_a and the first j characters of Y_b using formula (1): L(i,j) = max{L(i-1,j-1) + same(X_i,Y_j), L(i-1,j), L(i,j-1)};
  • the target determination subunit 533 is used to recursively trace the length of the common subsequence to obtain a set of the length of the common subsequence, and obtain the common subsequence length l with the largest value from the set as the target length, where l is less than or equal to b Positive integer
  • the similarity calculation sub-unit 534 is configured to calculate the text similarity using formula (2): θ = l / b, where θ ∈ [0,1].
  • the voice acquisition device further includes:
  • the loop collection module 70 is configured to display the voice collection failure on the interface of the application account of the communication application platform if the text similarity is less than a preset similarity threshold, and return to the corpus selection module 20 to continue execution.
  • Each module in the above voice collection device may be implemented in whole or in part by software, hardware, and a combination thereof.
  • the above modules may be embedded in hardware form in, or be independent of, the processor in the computer device, or may be stored in software form in the memory of the computer device, so that the processor can call and execute the operations corresponding to the modules.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure diagram may be as shown in FIG. 8.
  • the computer device includes a processor, a memory, a network interface, and a database connected through a system bus.
  • the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, computer-readable instructions, and a database.
  • the internal memory provides an environment for the operation of the operating system and computer-readable instructions in a non-volatile storage medium.
  • the database of the computer equipment is used to store the identification information and recording files in the foregoing voice collection method.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer-readable instructions are executed by a processor to implement a speech acquisition method.
  • a computer device is provided, which includes a memory, a processor, and computer-readable instructions stored on the memory and executable on the processor.
  • when the processor executes the computer-readable instructions, the steps of the voice collection method of the foregoing embodiment are implemented, for example steps S1 to S6 shown in FIG. 2.
  • alternatively, when the processor executes the computer-readable instructions, the functions of the modules/units of the voice collection device of the foregoing embodiment are implemented, for example modules 10 to 60 shown in FIG. 7; to avoid repetition, they are not described again here.
  • one or more non-volatile readable storage media are provided, on which computer-readable instructions are stored.
  • when the computer-readable instructions are executed by one or more processors, the steps of the voice collection method of the foregoing embodiment are implemented, or the functions of the modules/units of the voice collection device of the foregoing embodiment are implemented; to avoid repetition, details are not described again here.
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (RAM) or external cache memory.
  • RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), dual data rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), synchronous chain (Synchlink) DRAM (SLDRAM), memory bus (Rambus) direct RAM (RDRAM), direct memory bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Telephonic Communication Services (AREA)

Abstract

A voice collection method, device, computer equipment, and storage medium. The method includes: when a corpus acquisition request sent by a user through an application account of a communication application platform is received, selecting a corresponding initial corpus and acquiring the user's identification information; after a recording start request is received, calling the recording function of the application account of the communication application platform to record and obtain a recording file, and calling the offline speech recognition function of the application account to convert the recording file into target text; matching the target text against the initial corpus with a text matching algorithm to obtain a text similarity; and, if the text similarity is greater than or equal to a preset similarity threshold, storing the recording file, the identification information, and the mapping relationship between them in a database, thereby achieving fast collection of voice data through the application account of the communication application platform and improving the efficiency of voice data collection.

Description

语音采集方法、装置、计算机设备及存储介质
本申请以2018年5月31日提交的申请号为201810550137.2,名称为“语音采集方法、装置、终端设备及存储介质”的中国发明专利申请为基础,并要求其优先权。
技术领域
本申请涉及计算机技术领域,尤其涉及一种语音采集方法、装置、计算机设备及存储介质。
背景技术
随着科技的进步和计算机网络技术的飞速发展,声纹识别技术和语音识别技术越来越受人们的青睐,其中声纹识别技术用于方便快捷地确定说话人身份,语音识别技术用于在将自然语音进行识别转化成文字,声纹识别技术和语音识别技术都需要采集大量的说话人信息和说话人对应的语音信息,用来进行模型的训练。
当前,语音采集主要通过使用专业录音设备进行语音录取,得到录音文件,然后人工标记录音文件对应的说话人,这种人工方式不能及时清除掉不合格的录音,且操作麻烦,使得采集效率低,同时,这种人工采集方式不适用于对距离较远的人员进行语音采集,若需要对不同地区的人员同时进行语音采集,只能通过购置多台录音设备,浪费了大量采集成本。
发明内容
基于此,有必要针对上述技术问题,提供一种基于通讯应用平台的应用账号提高语音采集效率和节约采集成本的语音采集方法、装置、计算机设备及存储介质。
一种语音采集方法,包括:
若接收到用户通过通讯应用平台的应用账号发送的语料获取请求,则获取所述语料获取请求中包含的基础语料和所述用户的身份标识信息;
从所述基础语料对应的基础语料库中随机选取一条语料,作为初始语料,并显示在所述通讯应用平台的应用账号的界面上;
若接收到所述用户通过所述通讯应用平台的应用账号发送的开始录音请求,则使用预设的录音调用框架,调用所述通讯应用平台的应用账号的录音功能进行录音并生成录音文件;
若接收到所述用户通过所述通讯应用平台的应用账号发送的终止录音请求,则使用预设的语音识别调用框架,调用所述通讯应用平台的应用账号的离线语音识别功能将所述录音文件转化为目标文本;
对所述目标文本与所述初始语料进行文本相似度匹配,得到所述目标文本与所述初始语料之间的文本相似度;
若所述文本相似度大于或等于预设相似度阈值,则建立所述录音文件与所述身份标识信息之间的映射关系,并对应保存所述录音文件、所述身份标识信息和所述映射关系。
一种语音采集装置,包括:
领域确定模块,用于若接收到用户通过通讯应用平台的应用账号发送的语料获取请求,则获取所述语料获取请求中包含的基础语料和所述用户的身份标识信息;
语料选取模块,用于从所述基础语料对应的基础语料库中随机选取一条语料,作为初始语料,并显示在所述通讯应用平台的应用账号的界面上;
录音生成模块,用于若接收到所述用户通过所述通讯应用平台的应用账号发送的开始录音请求,则使用预设的录音调用框架,调用所述通讯应用平台的应用账号的录音功能进行录音并生成录音文件;
语音识别模块,用于若接收到所述用户通过所述通讯应用平台的应用账号发送的终止录音请求,则使用预设的语音识别调用框架,调用所述通讯应用平台的应用账号的离线语音识别功能将所述录音文件转化为目标文本;
文本匹配模块,用于对所述目标文本与所述初始语料进行文本相似度匹配,得到所述目标文本与所述初始语料之间的文本相似度;
文件存储模块,用于若所述文本相似度大于或等于预设相似度阈值,则建立所述录音文件与所述身份标识信息之间的映射关系,并对应保存所述录音文件、所述身份标识信息和所述映射关系。
一种计算机设备,包括存储器、处理器以及存储在所述存储器中并可在所述处理器上运行的计算机可读指令,所述处理器执行所述计算机可读指令时实现上述语音采集方法的步骤。
一个或多个非易失性可读指令,所述非易失性可读指令存储有计算机可读指令,所述计算机可读指令被一个或多个处理器执行时,使得所述一个或多个处理器执行执行所述语音采集方法的步骤。
本申请的一个或多个实施例的细节在下面的附图和描述中提出,本申请的其他特征和优点将从说明书、附图以及权利要求变得明显。
附图说明
为了更清楚地说明本申请实施例的技术方案,下面将对本申请实施例的描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动性的前提下,还可以根据这些附图获得其他的附图。
图1是本申请实施例提供的语音采集方法的应用环境示意图;
图2是本申请实施例提供的语音采集方法的实现流程图;
图3是本申请实施例提供的语音采集方法中步骤S3的实现流程图;
图4是本申请实施例提供的语音采集方法中步骤S5的实现流程图;
图5是本申请实施例提供的语音采集方法中步骤S53的实现流程图;
图6是本申请实施例提供的语音采集方法的另一实现流程图;
图7是本申请实施例提供的语音采集装置的示意图;
图8是本申请实施例提供的计算机设备的示意图。
具体实施方式
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
请参阅图1,图1示出本申请实施例提供的语音采集方法的应用环境。该语音采集方法应用在基于通讯应用平台的应用账号的语音采集场景中。该语音采集场景包括服务端、客户端和通讯应用平台,其中,服务端、客户端和通讯应用平台之间均通过网络互相连接,用户通过客户端从服务端获取语料,并发送语音数据到通讯应用平台,服务端通过通讯应用平台进行录音和语音识别,通讯应用平台将录音文件和语音识别结果发送到服务端,通讯应用平台为即时通讯系统,该即时通讯系统可以传输语音,图片,视频等文件。所述通讯应用平台可以是微信、米聊、易信、支付宝和快乐平安等其他通讯应用平台。第三方软件开发商可以基于通讯应用平台开发各自的应用账号,即通讯应用平台的应用账号。客户端为通讯应用平台的应用账号,该账号具体可以微信公众号、支付宝公众号、微信小程序等,服务端具体可以用独立的服务器或者多个服务器组成的服务器集群实现。
请参阅图2,图2示出本申请实施例提供的一种语音采集方法,以该方法应用在图1中的服务端为例进行说明,详述如下:
S1:若接收到用户通过通讯应用平台的应用账号发送的语料获取请求,则获取语料获取请求中包含的基础语料和该用户的身份标识信息。
具体地,在接收到用户通过通讯应用平台的应用账号发送的语料获取请求时,获取该语料获取请求中包含的基础语料,同时,基于通讯应用平台的应用账号,获取该用户的身份标识信息。
其中,基础语料包括但不限于:新闻、军事、生活、经济、体育、热点和娱乐等领域的数据分类,基础语料可以根据需要来设计,此处不作具体限制。在用户点击通讯应用平台的应用账号上的某个基础语料后,即通过通讯应用平台的应用账号向服务端发送了获取该基础语料的语料获取请求。
其中,身份标识信息用于唯一标识用户,身份标识信息包括但不限于:工号、通讯应用平台身份标识((Identification,ID)和姓名等,也可以根据实际需要进行设置,此处不作具体限制。
例如,在一具体实施方式中,通讯应用平台的应用账号为微信公众号,接收到用户通过微信公众号发送的语料获取请求中包含的基础语料为“新闻”,获取到的身份标识信息为“619-Zhangsan-张三”,其中“619”是指用户的工号,“Zhangsan”是用户的微信ID,“张三”是指用户的姓名。
值得说明的是,在用户非首次使用该通讯应用平台的应用账号时,可以直接通过通讯应用平台的应用账号来获取用户的通讯应用平台身份标识,并获取服务端存储的与该通讯 应用平台身份标识对应的用户的身份标识信息,在用户首次使用该通讯应用平台的应用账号时,服务端向通讯应用平台的应用账号的用户发送身份信息完善请求,具体可以是通过弹出补全资料的页面来供用户填写,也可以根据需要使用其他方式,此处不作具体限制,在获取到用户填写的身份信息完善资料后,将该身份信息存储到服务端。
S2:从基础语料对应的基础语料库中随机选取一条语料,作为初始语料,并显示在通讯应用平台的应用账号的界面上。
具体地,每个基础语料均预设有对应的基础语料库,在获取步骤S1中语料获取请求中包含的基础语料后,确认该基础语料对应的基础语料库,并使用随机函数,从该基础语料库中随机选取一条语料,作为初始语料,显示在通讯应用平台的应用账号的界面上,供用户朗读。
其中,基础语料库是指经科学取样和加工的大规模电子文本库,基础语料库中存放的是在语言的实际使用中真实出现过的语言材料,是以电子计算机为载体承载语言知识的基础资源。
以步骤S1中获取到的基础语料“新闻”为例,从“新闻”对应的基础语料库中随机选取一条语料为“大多数人为这对父子点赞,并认同这种通过徒步磨练儿子的教育方式”,并将这条语料显示在微信公众号的界面上。
S3:若接收到用户通过通讯应用平台的应用账号发送的开始录音请求,则使用预设的录音调用框架,调用通讯应用平台的应用账号的录音功能进行录音并生成录音文件。
具体地,在将初始语料显示在通讯应用平台的应用账号的界面上之后,用户通过点击微信公共号上设置的录音按钮来向服务端发送开始录音请求,服务端在接收到用户通过通讯应用平台的应用账号发送的开始录音请求后,通过使用预设的录音调用框架调用通讯应用平台的应用账号的录音功能,对用户朗读初始语料产生的语音进行录音,并保存为录音文件。
其中,录音调用框架是通过调用通讯应用平台的JSSDK(JavaScript Software Development Kit,Java脚本语言软件开发工具包)中的音频接口实现录音,该音频接口包括但不限于:开始录音接口和停止录音接口等。
以步骤S2中获取到的语料为例,在微信公众号的界面上显示出“大多数人为这对父子点赞,并认同这种通过徒步磨练儿子的教育方式”这条语料后,用户点击录音按钮,即完成通过微信公众号向服务端发送开始录音请求,服务端在接收到该请求后,通过录音调用框架来调用微信服务器的音频接口中的开始录音接口来进行录音。
S4:若接收到用户通过通讯应用平台的应用账号发送的终止录音请求,则使用预设的语音识别调用框架,调用通讯应用平台的应用账号的离线语音识别功能将录音文件转化为目标文本。
具体地,在用户朗读完初始语料后,通过点击通讯应用平台的应用账号界面上的停止录音按钮向服务端发送终止录音请求,服务端在接收到终止录音请求后,通过使用预设的语音识别调用框架,调用通讯应用平台的应用账号提供的离线语音识别功能,对录音文件进行语音识别,得到目标文本。
以步骤S3中得到的录音文件为例,在用户按下停止录音按钮后,服务端通过微信服务器,对该录音文件进行语音识别,得到目标文本为“大多人为这对父子点赞并认同这种 通过徒步磨练儿子的方式”的目标文本。
S5:对目标文本与初始语料进行文本相似度匹配,得到目标文本与初始语料之间的文本相似度。
具体地,对得到的目标文本与初始语料进行文本相似度匹配,得到目标文本与初始语料之间的文本相似度,进而可以根据该文本相似度来确定目标文本对应的录音文件是否符合要求。
其中,文本相似度是指通过相似系数、相似距离等尺度来衡量文本之间的相似程度,这里文本相似度匹配是指通过文本相似度计算的公式、算法或模型来获取两个文本之间相似程度的过程。
其中,文本相似度算法包括但不限于:余弦相似性、最近邻(k-NearestNeighbor,kNN)分类算法、动态规划(Dynamic Programming)算法、曼哈顿距离(Manhattan Distance)和基于局部敏感哈希(SimHash)算法的汉明距离等。
S6:若文本相似度大于或等于预设相似度阈值,则建立录音文件与身份标识信息之间的映射关系,并对应保存录音文件、身份标识信息和映射关系。
具体地,将步骤S5中计算出的文本相似度与预设的相似度阈值进行比较,若文本相似度大于或等于预设的相似度阈值,则确定目标文本对应的录音文件为符合采集要求的录音文件,进而建立该录音文件与身份标识信息之间的映射关系,并对应保存录音文件、身份标识信息和映射关系,即完成了语音数据的采集。
在本实施例中,若接收到通讯应用平台的应用账号发送的语料获取请求,则从该请求中包含的基础语料中随机选取一条语料,作为初始语料,并获取使用该通讯应用平台的应用账号的用户的身份标识信息,在接收到开始录音请求后,使用预设的录音调用框架,调用通讯应用平台的应用账号的录音功能进行录音,生成录音文件,并在录音结束后,使用预设的语音识别调用框架,调用通讯应用平台的应用账号的离线语音识别功能,将该录音文件转化为目标文本,进而使用文本匹配算法,将该目标文本与初始语料进行匹配,得到相似度,并将该相似度与预设的相似度阈值进行比较,若该相似度大于或等于预设的相似度阈值,则建立录音文件与身份标识信息的映射关系,并保存录音文件、身份标识信息和映射关系到数据库,从而实现了使用通讯应用平台的应用账号对语音数据进行快速采集,利用通讯应用平台的应用账号的便捷性和普遍性,有效提高语音数据的采集效率,同时由于是基于通讯应用平台的应用账号通过网络进行语音采集,对于不同地区的人员无需添加额外购置设备,节约了采集成本。
在一实施例中,服务端与通讯应用平台进行交互,调用通讯应用平台提供的录音功能进行录音,如图3所示,步骤S3中,即若接收到用户通过通讯应用平台的应用账号发送的开始录音请求,则使用预设的录音调用框架,调用通讯应用平台的应用账号的录音功能进行录音并生成录音文件,具体包括如下步骤:
S31:若接收到开始录音请求,则向通讯应用平台发送预设的配置信息,其中,配置信息用于对开始录音请求进行权限验证。
具体地,在进行语音采集之前,需要预先在服务端的配置(config)接口上进行接口配置,得到预设的配置信息,在接收到开始录音请求时,向通讯应用平台发送该配置信息,配置信息用于提供给通讯应用平台进行权限验证。
其中,接口配置是指通过接口来配置需要通讯应用平台需要进行验证的签名信息,签名信息包括:appId、timestamp、nonceStr、signature和jsApiList,appId是该通讯应用平台的应用账号的唯一标识,timestamp是生成签名时的时间戳,nonceStr是生成的随机字符串,signature是指服务端的签名,jsApiList是指需要调用的Java脚本语言(JavaScript,JS)接口列表,在本申请实施例中,需要调用的接口列表为音频接口和语音识别接口,音频接口包括:开始录音接口、结束录音接口和暂停录音接口等。
S32:若接收到通讯应用平台发送的授权成功消息,则向通讯应用平台发送启用预设的音频接口的指令,使得用户使用通讯应用平台的应用账号的录音功能进行录音并生成录音文件。
具体地,服务端在向通讯应用平台发送配置信息进行权限验证后,微信对该配置信息进行验证,并将验证结果返回给服务端,若服务端接收到的返回信息为验证通过,提示授权成功时,则向通讯应用平台发送启用步骤S31的签名信息中的开始录音接口的指令,通讯应用平台在接收到该指令后,执行录音操作并生成录音文件。
在本实施例中,在接收到开始录音请求时,向通讯应用平台发送预设的配置信息,通讯应用平台对接收到的配置信息进行权限验证,在验证通过后,接收服务端发送的启用预设的音频接口的指令并执行,对使用通讯应用平台的应用账号的用户产生的语音信号进行采集,得到录音文件,使得语音采集仅需通过用户使用通讯应用平台的应用账号来快捷完成,而无需专业的设备,提高了语音采集的效率,同时由于通过通讯应用平台的应用账号进行语音采集是通过网络来实现,无需去实地采集,节约了采集成本。
在一实施例中,如图4所示,步骤S5中,即对目标文本与初始语料进行文本相似度匹配,得到目标文本与初始语料之间的文本相似度,具体包括如下步骤:
S51:根据预设的正则表达式,确定初始语料中的标点符号。
具体地,初始语料中包含有标点符号,但用户在朗读时不会朗读标点符号,因而需要将初始语料中的标点符号进行删除处理,在本实施例中,通过预设的正则表达式,对初始语料进行正则匹配,确定初始语料中包含的标点符号。
其中,预设的正则表达式具体可以为:“\p{P|M|Z|S|C}”,小写的p是属性(property)的意思,表示Unicode(统一码)属性,用于Unicode正则表达式的前缀,Unicode字符集包含7个属性,分别是:P(标点字符)、L(字母)、M(标记符号)、Z(分隔符,比如空格符、换行符等)、S(符号,比如数学符号、货币等)、N(数字,比如阿拉伯数字和罗马数字等)和C(其他字符),{P|M|Z|S|C}表示在对一个字符进行匹配时,若匹配到P、M、Z、S或C中的任一种属性,即确定匹配成功。
其中,Unicode又称为万国码或单一码,是计算机科学领域里的一项业界标准,包括字符集、编码方案等。Unicode是为了解决传统的字符编码方案的局限而产生的,它为每种语言中的每个字符设定了统一并且唯一的二进制编码,以满足跨语言、跨平台进行文本转换处理的要求。
以步骤S2中获取到的语料为例,该语料“大多数人为这对父子点赞,并认同这种通过徒步磨练儿子的教育方式”经过预设的正则表达式匹配后,找到标点符号“,”。
值得说明的是,若相同类型的标点符号超过一个,可以给该标点符号设置一个对应的标识,例如“douhao001”、“douhao002”,在后续需要对该标点符号进行处理时,只需 要根据标识就可以准确找到该标识对应的标点符号。
S52:对标点符号进行删除处理,将得到的文本内容作为目标语料。
具体地,对步骤S51中找到的标点符号进行删除处理,将得到的文本内容作为目标语料。可以使用replace函数进行删除处理,将找到的标点符号替换成空,比如:replace(“,”,“”),也可以使用字符串前移覆盖的方式进行删除处理,比如:针对第一个待删除的标点符号,将该待删除的标点符号之后的所有字符往前移动一个字符的位置,并覆盖保存,针对后面的标点符号也按照此方法进行删除即可。
以步骤S51中的初始语料“大多数人为这对父子点赞,并认同这种通过徒步磨练儿子的教育方式”为例,对该初始语料中的标点符号进行删除处理,获取到目标语料为“大多数人为这对父子点赞并认同这种通过徒步磨练儿子的教育方式”。
应理解,以上提供的两种删除处理方法为本申请实施例优选的两种方式,但并不限于此,实际也可以根据实际需要选取其他合适的方式进行删除处理,此处不作具体限制。
S53:使用动态规划算法,对目标文本和目标语料进行相似度计算,得到文本相似度。
具体地,通过使用动态规划算法(Dynamic Programming Algorithm),获取目标文本和目标语料中的所有公共子字符串长度,并选取值最大的长度作为目标长度,进而用目标长度和目标语料中字符个数的比值作为文本相似度。
其中,动态规划算法是通过拆分问题,定义问题状态和状态之间的关系,使得问题能够以递推的方式去解决。动态规划算法是将待求解的问题分解为若干个子问题,按顺序求解子问题,前一子问题的解,为后一子问题的求解提供了有用的信息。在求解任一子问题时,列出各种可能的局部解,通过决策保留那些有可能达到最优的局部解,丢弃其他局部解。依次解决各子问题,最后一个子问题的解就是初始问题的解。
以步骤S4中获取到的目标文本“大多人为这对父子点赞并认同这种通过徒步磨练儿子的方式”为例,其对应的目标语料为步骤S51中获取到“大多人为这对父子点赞并认同这种通过徒步磨练儿子的方式”,经过动态规划算法计算出该目标文本与目标语料中长度最大的公共子字符串为“人为这对父子点赞并认同这种通过徒步磨练儿子的”,该公共子字符串的长度为22个字符,目标语料中的字符个数为29,则目标文本与目标语料的文本相似度为22/29,约为75.86%。
在本实施例中,根据预设的正则表达式,确定初始语料中的标点符号,并对标点符号进行删除处理,将得到的文本内容作为目标语料,避免标点符号对相似度计算的干扰,提升计算文本相似度准确度,进而使用动态规划算法,对目标文本和目标语料进行相似度计算,得到文本相似度,同时,使用动态规划算法可以快速进行相似度计算,提高文本相似度计算效率。
在一实施例中,如图5所示,步骤S53中,即使用动态规划算法,对目标文本和目标语料进行相似度计算,得到文本相似度,具体包括如下步骤:
S531:将目标文本中的字符依序存入一维数组X a中,将目标语料中的字符依序存入一维数组Y b中,其中,a为目标文本的字符个数,b为目标语料的字符个数。
具体地,获取目标文本中的字符个数a和目标语料中的字符个数b,并将目标文本中 的字符按照从前往后的顺序依次存入一维数组X a中,将目标语料中的字符按照从前往后的顺序依次存入一维数组Y b中。
以步骤S4中获取的目标文本和步骤S52中获取的目标语料为例,得到目标文本字符个数为26,得到目标语料的字符个数为29,将目标文本存入一维数组,得到:X 26={大,多,人,为,这,对,父,子,点,赞,并,认,同,这,种,通,过,徒,步,磨,练,儿,子,的,方,式},将目标语料存入一维数组,得到:Y 29={大,多,数,人,为,这,对,父,子,点,赞,并认,同,这,种,通,过,徒,步,磨,练,儿,子,的,教,育,方,式}。
S532:使用如下公式计算X a的第i位与Y b的第j位之前的公共子序列长度L(i,j):
L(i,j) = max{L(i-1,j-1)+same(X_i,Y_j), L(i-1,j), L(i,j-1)}    (1)
其中,若same(X i,Y j)在X i与Y j相同时,取值为1,否则,same(X i,Y j)在X i与Y j不相同时,取值为0,max{L(i-1,j-1)+same(X i,Y j),L(i-1,j),L(i,j-1)}为取L(i-1,j-1)+same(X i,Y j)、L(i-1,j)和L(i,j-1)三个表达式中的最大值,i和j均为正整数,且i≤a,j≤b。
具体地,通过公式(1)来计算X a的第i位与Y b第j位之前中公共子序列的长度。
其中,公共子序列是指X a和Y b都包含的子序列。
其中,“max{}”表示去大括号中的最大值作为表达式的值。
以步骤S531中的两个一维数组为例,在i=8时,X 8=“子”,在j=9时,Y 9=“子”,因而L(8,9)=1,针对X 26和Y 29这两个一维数组,使用公式(1)计算可以得到L(1,1)至L(26,29)共26x29个长度。
值得说明的是,公共子序列长度的初始值均为0。
S533:对公共子序列长度进行递归回溯,得到公共子序列长度的集合,并从集合中获取值最大的公共子序列长度l,作为目标长度,其中,l为小于等于b的正整数。
具体地,将步骤S532中得到的所有公共子序列长度存入到a行b列的矩阵中,采用预设的方式,对这个矩阵进行递归回溯,并按照公式(1)对每一个长度进行计算更新,得出所有公共子序列长度,并选取值最大的公共子序列长度,作为目标长度。
其中,预设的方式可以是从右到左,再从下到上,也可以是从下到上,再从右到左,可以根据实际情形选取,此处不作具体限制。
其中,递归是指在一个过程或函数在其定义或说明中有直接或间接调用自身的一种方法,它通常把一个大型复杂的问题层层转化为一个与原问题相似的规模较小的问题来求解,递归策略只需少量的程序就可描述出解题过程所需要的多次重复计算,大大地减少了程序的代码量。例如,在本实施例中,在对L(i,j)进行计算时,每次只需根据L(i-1,j)、L(i,j-1)、L(i-1,j-1)和same(X i,Y j)的值来计算即可,然后可将得到的L(i,j)来作为L(i+1,j)、L(i,j+1)或L(i+1,j+1)的计算因子来进行后续计算。
其中,回溯是也叫试探法,它是一个类似枚举的搜索尝试过程,主要是在搜索尝试过程中寻找问题的解,当发现已不满足求解条件时,就“回溯”返回,尝试别的路径。同时,回溯法也是一种选优搜索法,按选优条件向前搜索,以达到目标。但当探索到某一步时,发现原先选择并不优或达不到目标,就退回一步重新选择,这种走不通就退回再走的技术为回溯法,而满足回溯条件的某个状态的点称为“回溯点”,在本实施例中,以针对由长度构成的26x29的矩阵为例,可以先从第一行左边即L(1,1)向第一行的右边即L(1,29)进行递归计算,并对第一行的29个长度进行更新,在更新完矩阵第一行的长度后,再第二行左边向右边进行递归计算,按照此方法继续更新,直到更新完整个矩阵为止,在更新后的矩阵中,选取值最大的长度作为目标长度。
S534:使用如下公式计算文本相似度:
θ = l / b    (2)
其中,θ为文本相似度,θ∈[0,1]。
具体地,通过公式(2)进行相似度计算,将从集合中获取值最大的公共子序列长度与目标语料的字符个数之间的商,作为目标文本与目标语料之间的文本相似度。
在本实施例中,通过将目标文本中的字符存入到一个数组,将目标语料中的字符存入到另一个数组,进而对这两个数组进行遍历计算公共子序列长度,并获取值最大的公共子序列长度,作为目标长度,进而将目标长度与目标语料字符个数的比值作为文本相似度,通过该方法可以快速准确得到目标文本与目标语料的文本相似度,提升了计算文本相似度的效率。
在一实施例中,如图6所示,在步骤S5之后,该语音采集方法还包括如下步骤:
S7:若文本相似度小于预设相似度阈值,则在通讯应用平台的应用账号的界面上显示语音采集失败,并返回步骤S2继续执行。
具体地,在步骤S5中得到文本相似度后,将该文本相似度与预设相似度阈值进行比较,若该文本相似度小于预设相似度阈值,则确定目标文本对应的录音文件为不符合采集要求的录音文件,此时,跳转到步骤S2,即从基础语料对应的基础语料库中随机选取一条语料,作为初始语料,并显示在通讯应用平台的应用账号的界面上处,重新开始执行语音采集的过程,直到得到的文本相似度大于或等于预设相似度阈值时,执行步骤S6,并结束此次采集过程。
在本实施例中,在文本相似度小于预设相似度阈值时,在通讯应用平台的应用账号的界面上显示语音采集失败,并返回步骤S2继续执行语音采集的过程,直到得到的文本相 似度大于或等于预设相似度阈值为止,使得采集到的语音数据为高质量语音数据,避免在使用语音数据时,才发现该语音数据质量达不到要求,进而重新去进行语音数据采集,提高了采集到的语音数据的质量。
应理解,上述实施例中各步骤的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请实施例的实施过程构成任何限定。
在一实施例中,提供一种语音采集装置,该语音采集装置与上述实施例中语音采集方法一一对应。如图7所示,该语音采集装置包括领域确定模块10、语料选取模块20、录音生成模块30、语音识别模块40、文本匹配模块50和文件存储模块60。各功能模块详细说明如下:
领域确定模块10,用于若接收到用户通过通讯应用平台的应用账号发送的语料获取请求,则获取语料获取请求中包含的基础语料和该用户的身份标识信息;
语料选取模块20,用于从基础语料对应的基础语料库中随机选取一条语料,作为初始语料,并显示在通讯应用平台的应用账号的界面上;
录音生成模块30,用于若接收到用户通过通讯应用平台的应用账号发送的开始录音请求,则使用预设的录音调用框架,调用通讯应用平台的应用账号的录音功能进行录音并生成录音文件;
语音识别模块40,用于若接收到用户通过通讯应用平台的应用账号发送的终止录音请求,则使用预设的语音识别调用框架,调用通讯应用平台的应用账号的离线语音识别功能将录音文件转化为目标文本;
文本匹配模块50,用于对目标文本与初始语料进行文本相似度匹配,得到目标文本与初始语料之间的文本相似度;
文件存储模块60,用于若文本相似度大于或等于预设相似度阈值,则建立录音文件与身份标识信息之间的映射关系,并对应保存录音文件、身份标识信息和映射关系。
进一步地,录音生成模块30包括:
权限验证单元31,用于若接收到开始录音请求,则向通讯应用平台发送预设的配置信息,其中,配置信息用于对开始录音请求进行权限验证;
录音驱动单元32,用于若接收到通讯应用平台发送的授权成功消息,则向通讯应用平台发送启用预设的音频接口的指令,使得用户使用通讯应用平台的应用账号的录音功能进行录音并生成录音文件。
进一步地,文本匹配模块50包括:
正则匹配单元51,用于根据预设的正则表达式,确定初始语料中的标点符号;
符号处理单元52,用于对标点符号进行删除处理,将得到的文本内容作为目标语料;
相似计算单元53,用于使用动态规划算法,对目标文本和目标语料进行相似度计算,得到文本相似度。
进一步地,相似计算单元53包括:
数组构建子单元531,用于将目标文本中的字符依序存入一维数组X a中,将目标语料中的字符依序存入一维数组Y b中,其中,a为目标文本的字符个数,b为目标语料的字符个数;
长度计算子单元532,用于使用如下公式计算X a的第i位与Y b的第j位之前的公共子序列长度L(i,j):
L(i,j) = max{L(i-1,j-1)+same(X_i,Y_j), L(i-1,j), L(i,j-1)}
其中,若same(X i,Y j)在X i与Y j相同时,取值为1,否则,same(X i,Y j)在X i与Y j不相同时,取值为0,max{L(i-1,j-1)+same(X i,Y j),L(i-1,j),L(i,j-1)}为取L(i-1,j-1)+same(X i,Y j)、L(i-1,j)和L(i,j-1)三个表达式中的最大值,i和j均为正整数,且i≤a,j≤b;
目标确定子单元533,用于对公共子序列长度进行递归回溯,得到公共子序列长度的集合,并从集合中获取值最大的公共子序列长度l,作为目标长度,其中,l为小于等于b的正整数;
相似度计算子单元534,用于使用如下公式计算文本相似度:
θ = l / b
其中,θ为文本相似度,θ∈[0,1]。
进一步地,该语音采集装置还包括:
循环采集模块70,用于若文本相似度小于预设相似度阈值,则在通讯应用平台的应用账号的界面上显示语音采集失败,并返回语料选取模块20继续执行。
关于语音采集装置的具体限定可以参见上文中对于语音采集方法的限定,在此不再赘述。上述语音采集装置中的各个模块可全部或部分通过软件、硬件及其组合来实现。上述各模块可以硬件形式内嵌于或独立于计算机设备中的处理器中,也可以以软件形式存储于计算机设备中的存储器中,以便于处理器调用执行以上各个模块对应的操作。
在一个实施例中,提供了一种计算机设备,该计算机设备可以是服务器,其内部结构图可以如图8所示。该计算机设备包括通过系统总线连接的处理器、存储器、网络接口和数据库。其中,该计算机设备的处理器用于提供计算和控制能力。该计算机设备的存储器包括非易失性存储介质、内存储器。该非易失性存储介质存储有操作系统、计算机可读指令和数据库。该内存储器为非易失性存储介质中的操作系统和计算机可读指令的运行提供环境。该计算机设备的数据库用于存储前述语音采集方法中的身份标识信息和录音文件。该计算机设备的网络接口用于与外部的终端通过网络连接通信。该计算机可读指令被处理器执行时以实现一种语音采集方法。
在一个实施例中,提供了一种计算机设备,包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机可读指令,处理器执行计算机可读指令时实现上述实施例语音采集方法的步骤,例如图2所示的步骤S1至步骤S6。或者,处理器执行计算机可读指令时实现上述实施例语音采集装置的各模块/单元的功能,例如图7所示的模块10至模块60。为避免重复,这里不再赘述。
在一个实施例中,提供了一个或多个非易失性可读指令,其上存储有计算机可读指令,计算机可读指令被一个或多个处理器执行时实现上述实施例语音采集方法的步骤,或者, 计算机可读指令被一个或多个处理器执行时实现上述实施例语音采集装置的各模块/单元的功能,为避免重复,这里不再赘述。
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机可读指令来指令相关的硬件来完成,所述的计算机可读指令可存储于一非易失性计算机可读取存储介质中,该计算机可读指令在执行时,可包括如上述各方法的实施例的流程。其中,本申请所提供的各实施例中所使用的对存储器、存储、数据库或其它介质的任何引用,均可包括非易失性和/或易失性存储器。非易失性存储器可包括只读存储器(ROM)、可编程ROM(PROM)、电可编程ROM(EPROM)、电可擦除可编程ROM(EEPROM)或闪存。易失性存储器可包括随机存取存储器(RAM)或者外部高速缓冲存储器。作为说明而非局限,RAM以多种形式可得,诸如静态RAM(SRAM)、动态RAM(DRAM)、同步DRAM(SDRAM)、双数据率SDRAM(DDRSDRAM)、增强型SDRAM(ESDRAM)、同步链路(Synchlink)DRAM(SLDRAM)、存储器总线(Rambus)直接RAM(RDRAM)、直接存储器总线动态RAM(DRDRAM)、以及存储器总线动态RAM(RDRAM)等。
所属领域的技术人员可以清楚地了解到,为了描述的方便和简洁,仅以上述各功能单元、模块的划分进行举例说明,实际应用中,可以根据需要而将上述功能分配由不同的功能单元、模块完成,即将所述装置的内部结构划分成不同的功能单元或模块,以完成以上描述的全部或者部分功能。
以上所述实施例仅用以说明本申请的技术方案,而非对其限制;尽管参照前述实施例对本申请进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本申请各实施例技术方案的精神和范围,均应包含在本申请的保护范围之内。

Claims (20)

  1. 一种语音采集方法,其特征在于,所述语音采集方法包括:
    若接收到用户通过通讯应用平台的应用账号发送的语料获取请求,则获取所述语料获取请求中包含的基础语料和所述用户的身份标识信息;
    从所述基础语料对应的基础语料库中随机选取一条语料,作为初始语料,并显示在所述通讯应用平台的应用账号的界面上;
    若接收到所述用户通过所述通讯应用平台的应用账号发送的开始录音请求,则使用预设的录音调用框架,调用所述通讯应用平台的应用账号的录音功能进行录音并生成录音文件;
    若接收到所述用户通过所述通讯应用平台的应用账号发送的终止录音请求,则使用预设的语音识别调用框架,调用所述通讯应用平台的应用账号的离线语音识别功能将所述录音文件转化为目标文本;
    对所述目标文本与所述初始语料进行文本相似度匹配,得到所述目标文本与所述初始语料之间的文本相似度;
    若所述文本相似度大于或等于预设相似度阈值,则建立所述录音文件与所述身份标识信息之间的映射关系,并对应保存所述录音文件、所述身份标识信息和所述映射关系。
  2. 如权利要求1所述的语音采集方法,其特征在于,所述若接收到所述用户通过所述通讯应用平台的应用账号发送的开始录音请求,则使用预设的录音调用框架,调用所述通讯应用平台的应用账号的录音功能进行录音并生成录音文件包括:
    若接收到所述开始录音请求,则向所述通讯应用平台发送预设的配置信息,其中,所述配置信息用于对开始录音请求进行权限验证;
    若接收到所述通讯应用平台发送的授权成功消息,则向所述通讯应用平台发送启用预设的音频接口的指令,使得所述用户使用所述通讯应用平台的应用账号的录音功能进行录音并生成录音文件。
  3. 如权利要求1或2所述的语音采集方法,其特征在于,所述对所述目标文本与所述初始语料进行文本相似度匹配,得到所述目标文本与所述初始语料之间的文本相似度包括:
    根据预设的正则表达式,确定所述初始语料中的标点符号;
    对所述标点符号进行删除处理,将得到的文本内容作为目标语料;
    使用动态规划算法,对所述目标文本和所述目标语料进行相似度计算,得到所述文本相似度。
  4. 如权利要求3所述的语音采集方法,其特征在于,所述使用动态规划算法,对所述目标文本和所述目标语料进行相似度计算,得到所述文本相似度包括:
    将所述目标文本中的字符依序存入一维数组X a中,将所述目标语料中的字符依序存入一维数组Y b中,其中,a为所述目标文本的字符个数,b为所述目标语料的字符个数;
    使用如下公式计算X a的第i位与Y b的第j位之前的公共子序列长度L(i,j):
    L(i,j) = max{L(i-1,j-1)+same(X_i,Y_j), L(i-1,j), L(i,j-1)}
    其中,若same(X i,Y j)在X i与Y j相同时,取值为1,否则,same(X i,Y j)在X i与Y j不相同时,取值为0,max{L(i-1,j-1)+same(X i,Y j),L(i-1,j),L(i,j-1)}为取L(i-1,j-1)+same(X i,Y j)、L(i-1,j)和L(i,j-1)三个表达式中的最大值,i和j均为正整数,且i≤a,j≤b;
    对所述公共子序列长度进行递归回溯,得到所述公共子序列长度的集合,并从所述集合中获取值最大的公共子序列长度l,作为目标长度,其中,l为小于等于b的正整数;
    使用如下公式计算所述文本相似度:
    θ = l / b
    其中,θ为所述文本相似度,θ∈[0,1]。
  5. 如权利要求1所述的语音采集方法,其特征在于,在所述对所述目标文本与所述初始语料进行文本相似度匹配,得到所述目标文本与所述初始语料之间的文本相似度之后,所述语音采集方法还包括:
    若所述文本相似度小于预设相似度阈值,则在所述通讯应用平台的应用账号的界面上显示语音采集失败,并返回从所述基础语料对应的基础语料库中随机选取一条语料,作为初始语料,并显示在所述通讯应用平台的应用账号的界面上的步骤继续执行。
  6. 一种语音采集装置,其特征在于,所述语音采集装置包括:
    领域确定模块,用于若接收到用户通过通讯应用平台的应用账号发送的语料获取请求,则获取所述语料获取请求中包含的基础语料和所述用户的身份标识信息;
    语料选取模块,用于从所述基础语料对应的基础语料库中随机选取一条语料,作为初始语料,并显示在所述通讯应用平台的应用账号的界面上;
    录音生成模块,用于若接收到所述用户通过所述通讯应用平台的应用账号发送的开始录音请求,则使用预设的录音调用框架,调用所述通讯应用平台的应用账号的录音功能进行录音并生成录音文件;
    语音识别模块,用于若接收到所述用户通过所述通讯应用平台的应用账号发送的终止录音请求,则使用预设的语音识别调用框架,调用所述通讯应用平台的应用账号的离线语音识别功能将所述录音文件转化为目标文本;
    文本匹配模块,用于对所述目标文本与所述初始语料进行文本相似度匹配,得到所述目标文本与所述初始语料之间的文本相似度;
    文件存储模块,用于若所述文本相似度大于或等于预设相似度阈值,则建立所述录音文件与所述身份标识信息之间的映射关系,并对应保存所述录音文件、所述身份标识信息和所述映射关系。
  7. 如权利要求6所述的语音采集装置,其特征在于,所述录音生成模块包括:
    权限验证单元,用于若接收到所述开始录音请求,则向所述通讯应用平台发送预设的配置信息,其中,所述配置信息用于对开始录音请求进行权限验证;
    录音驱动单元,用于若接收到所述通讯应用平台发送的授权成功消息,则向所述通讯应用平台发送启用预设的音频接口的指令,使得所述用户使用所述通讯应用平台的应用账号的录音功能进行录音并生成录音文件。
  8. 如权利要求6或7所述的语音采集装置,其特征在于,所述文本匹配模块包括:
    正则匹配单元,用于根据预设的正则表达式,确定所述初始语料中的标点符号;
    符号处理单元,用于对所述标点符号进行删除处理,将得到的文本内容作为目标语料;
    相似计算单元,用于使用动态规划算法,对所述目标文本和所述目标语料进行相似度计算,得到所述文本相似度。
9. The speech collection device according to claim 8, wherein the similarity calculation unit comprises:
    an array construction subunit, configured to store the characters of the target text in sequence in a one-dimensional array X_a and store the characters of the target corpus in sequence in a one-dimensional array Y_b, where a is the number of characters of the target text and b is the number of characters of the target corpus;
    a length calculation subunit, configured to calculate the length L(i, j) of the common subsequence up to the i-th character of X_a and the j-th character of Y_b using the following formula:
    L(i, j) = 0 when i = 0 or j = 0, and L(i, j) = max{L(i-1, j-1) + same(X_i, Y_j), L(i-1, j), L(i, j-1)} otherwise;
    where same(X_i, Y_j) takes the value 1 when X_i and Y_j are identical and takes the value 0 when X_i and Y_j are not identical, max{L(i-1, j-1) + same(X_i, Y_j), L(i-1, j), L(i, j-1)} denotes the maximum of the three expressions L(i-1, j-1) + same(X_i, Y_j), L(i-1, j) and L(i, j-1), and i and j are both positive integers with i ≤ a and j ≤ b;
    a target determination subunit, configured to perform recursive backtracking on the common subsequence lengths to obtain a set of common subsequence lengths, and take from the set the common subsequence length l with the largest value as a target length, where l is a positive integer less than or equal to b;
    a similarity calculation subunit, configured to calculate the text similarity using the following formula:
    θ = l / b;
    where θ is the text similarity and θ ∈ [0, 1].
10. The speech collection device according to claim 6, wherein the speech collection device further comprises:
    a loop collection module, configured to, if the text similarity is less than the preset similarity threshold, display a speech collection failure prompt on the interface of the application account of the communication application platform, and return to the step of randomly selecting one corpus from the basic corpus library corresponding to the basic corpus as the initial corpus and displaying it on the interface of the application account of the communication application platform, to continue execution.
11. A computer device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor, when executing the computer-readable instructions, implements the following speech collection method:
    if a corpus acquisition request sent by a user through an application account of a communication application platform is received, acquiring a basic corpus included in the corpus acquisition request and identification information of the user;
    randomly selecting one corpus from a basic corpus library corresponding to the basic corpus as an initial corpus, and displaying the initial corpus on an interface of the application account of the communication application platform;
    if a recording start request sent by the user through the application account of the communication application platform is received, using a preset recording calling framework to call a recording function of the application account of the communication application platform to perform recording and generate a recording file;
    if a recording termination request sent by the user through the application account of the communication application platform is received, using a preset voice recognition calling framework to call an offline speech recognition function of the application account of the communication application platform to convert the recording file into a target text;
    performing text similarity matching between the target text and the initial corpus to obtain a text similarity between the target text and the initial corpus;
    if the text similarity is greater than or equal to a preset similarity threshold, establishing a mapping relationship between the recording file and the identification information, and correspondingly saving the recording file, the identification information, and the mapping relationship.
12. The computer device according to claim 11, wherein said, if a recording start request sent by the user through the application account of the communication application platform is received, using a preset recording calling framework to call a recording function of the application account of the communication application platform to perform recording and generate a recording file comprises:
    if the recording start request is received, sending preset configuration information to the communication application platform, wherein the configuration information is used for permission verification of the recording start request;
    if an authorization success message sent by the communication application platform is received, sending an instruction to the communication application platform to enable a preset audio interface, so that the user uses the recording function of the application account of the communication application platform to perform recording and generate the recording file.
13. The computer device according to claim 11 or 12, wherein said performing text similarity matching between the target text and the initial corpus to obtain a text similarity between the target text and the initial corpus comprises:
    determining punctuation marks in the initial corpus according to a preset regular expression;
    deleting the punctuation marks, and taking the resulting text content as a target corpus;
    using a dynamic programming algorithm to perform similarity calculation on the target text and the target corpus to obtain the text similarity.
14. The computer device according to claim 13, wherein said using a dynamic programming algorithm to perform similarity calculation on the target text and the target corpus to obtain the text similarity comprises:
    storing the characters of the target text in sequence in a one-dimensional array X_a, and storing the characters of the target corpus in sequence in a one-dimensional array Y_b, where a is the number of characters of the target text and b is the number of characters of the target corpus;
    calculating the length L(i, j) of the common subsequence up to the i-th character of X_a and the j-th character of Y_b using the following formula:
    L(i, j) = 0 when i = 0 or j = 0, and L(i, j) = max{L(i-1, j-1) + same(X_i, Y_j), L(i-1, j), L(i, j-1)} otherwise;
    where same(X_i, Y_j) takes the value 1 when X_i and Y_j are identical and takes the value 0 when X_i and Y_j are not identical, max{L(i-1, j-1) + same(X_i, Y_j), L(i-1, j), L(i, j-1)} denotes the maximum of the three expressions L(i-1, j-1) + same(X_i, Y_j), L(i-1, j) and L(i, j-1), and i and j are both positive integers with i ≤ a and j ≤ b;
    performing recursive backtracking on the common subsequence lengths to obtain a set of common subsequence lengths, and taking from the set the common subsequence length l with the largest value as a target length, where l is a positive integer less than or equal to b;
    calculating the text similarity using the following formula:
    θ = l / b;
    where θ is the text similarity and θ ∈ [0, 1].
15. The computer device according to claim 11, wherein after said performing text similarity matching between the target text and the initial corpus to obtain a text similarity between the target text and the initial corpus, the processor, when executing the computer-readable instructions, further implements the following step:
    if the text similarity is less than the preset similarity threshold, displaying a speech collection failure prompt on the interface of the application account of the communication application platform, and returning to the step of randomly selecting one corpus from the basic corpus library corresponding to the basic corpus as the initial corpus and displaying it on the interface of the application account of the communication application platform, to continue execution.
16. One or more non-volatile readable storage media storing computer-readable instructions, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
    if a corpus acquisition request sent by a user through an application account of a communication application platform is received, acquiring a basic corpus included in the corpus acquisition request and identification information of the user;
    randomly selecting one corpus from a basic corpus library corresponding to the basic corpus as an initial corpus, and displaying the initial corpus on an interface of the application account of the communication application platform;
    if a recording start request sent by the user through the application account of the communication application platform is received, using a preset recording calling framework to call a recording function of the application account of the communication application platform to perform recording and generate a recording file;
    if a recording termination request sent by the user through the application account of the communication application platform is received, using a preset voice recognition calling framework to call an offline speech recognition function of the application account of the communication application platform to convert the recording file into a target text;
    performing text similarity matching between the target text and the initial corpus to obtain a text similarity between the target text and the initial corpus;
    if the text similarity is greater than or equal to a preset similarity threshold, establishing a mapping relationship between the recording file and the identification information, and correspondingly saving the recording file, the identification information, and the mapping relationship.
17. The non-volatile readable storage medium according to claim 16, wherein said, if a recording start request sent by the user through the application account of the communication application platform is received, using a preset recording calling framework to call a recording function of the application account of the communication application platform to perform recording and generate a recording file comprises:
    if the recording start request is received, sending preset configuration information to the communication application platform, wherein the configuration information is used for permission verification of the recording start request;
    if an authorization success message sent by the communication application platform is received, sending an instruction to the communication application platform to enable a preset audio interface, so that the user uses the recording function of the application account of the communication application platform to perform recording and generate the recording file.
18. The non-volatile readable storage medium according to claim 16 or 17, wherein said performing text similarity matching between the target text and the initial corpus to obtain a text similarity between the target text and the initial corpus comprises:
    determining punctuation marks in the initial corpus according to a preset regular expression;
    deleting the punctuation marks, and taking the resulting text content as a target corpus;
    using a dynamic programming algorithm to perform similarity calculation on the target text and the target corpus to obtain the text similarity.
19. The non-volatile readable storage medium according to claim 18, wherein said using a dynamic programming algorithm to perform similarity calculation on the target text and the target corpus to obtain the text similarity comprises:
    storing the characters of the target text in sequence in a one-dimensional array X_a, and storing the characters of the target corpus in sequence in a one-dimensional array Y_b, where a is the number of characters of the target text and b is the number of characters of the target corpus;
    calculating the length L(i, j) of the common subsequence up to the i-th character of X_a and the j-th character of Y_b using the following formula:
    L(i, j) = 0 when i = 0 or j = 0, and L(i, j) = max{L(i-1, j-1) + same(X_i, Y_j), L(i-1, j), L(i, j-1)} otherwise;
    where same(X_i, Y_j) takes the value 1 when X_i and Y_j are identical and takes the value 0 when X_i and Y_j are not identical, max{L(i-1, j-1) + same(X_i, Y_j), L(i-1, j), L(i, j-1)} denotes the maximum of the three expressions L(i-1, j-1) + same(X_i, Y_j), L(i-1, j) and L(i, j-1), and i and j are both positive integers with i ≤ a and j ≤ b;
    performing recursive backtracking on the common subsequence lengths to obtain a set of common subsequence lengths, and taking from the set the common subsequence length l with the largest value as a target length, where l is a positive integer less than or equal to b;
    calculating the text similarity using the following formula:
    θ = l / b;
    where θ is the text similarity and θ ∈ [0, 1].
20. The non-volatile readable storage medium according to claim 16, wherein after said performing text similarity matching between the target text and the initial corpus to obtain a text similarity between the target text and the initial corpus, the computer-readable instructions, when executed by the one or more processors, further cause the one or more processors to perform the following step:
    if the text similarity is less than the preset similarity threshold, displaying a speech collection failure prompt on the interface of the application account of the communication application platform, and returning to the step of randomly selecting one corpus from the basic corpus library corresponding to the basic corpus as the initial corpus and displaying it on the interface of the application account of the communication application platform, to continue execution.
PCT/CN2018/094364 2018-05-31 2018-07-03 语音采集方法、装置、计算机设备及存储介质 WO2019227578A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810550137.2A CN108831476A (zh) 2018-05-31 2018-05-31 语音采集方法、装置、计算机设备及存储介质
CN201810550137.2 2018-05-31

Publications (1)

Publication Number Publication Date
WO2019227578A1 true WO2019227578A1 (zh) 2019-12-05

Family

ID=64145434

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/094364 WO2019227578A1 (zh) 2018-05-31 2018-07-03 语音采集方法、装置、计算机设备及存储介质

Country Status (2)

Country Link
CN (1) CN108831476A (zh)
WO (1) WO2019227578A1 (zh)

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109471931A (zh) * 2018-11-22 2019-03-15 平安科技(深圳)有限公司 语料收集方法、装置、计算机设备及存储介质
CN109801628B (zh) * 2019-02-11 2020-02-21 龙马智芯(珠海横琴)科技有限公司 一种语料收集方法、装置及系统
CN109858702B (zh) * 2019-02-14 2021-02-19 中国联合网络通信集团有限公司 客户升级投诉的预测方法、装置、设备及可读存储介质
CN109785856A (zh) * 2019-03-01 2019-05-21 深圳市伟文无线通讯技术有限公司 一种多通道远近场语料采集方法及装置
CN112185351B (zh) * 2019-07-05 2024-05-24 北京猎户星空科技有限公司 语音信号处理方法、装置、电子设备及存储介质
CN110858819A (zh) * 2019-08-16 2020-03-03 杭州智芯科微电子科技有限公司 基于微信小程序的语料收集方法、装置和计算机设备
CN113112997A (zh) * 2019-12-25 2021-07-13 华为技术有限公司 数据采集的方法及装置
CN111161732B (zh) * 2019-12-30 2022-12-13 秒针信息技术有限公司 语音采集方法、装置、电子设备及存储介质
CN111261167B (zh) * 2020-01-16 2023-05-30 广州荔支网络技术有限公司 一种音频热点内容自动化标签生成方法
CN111554307A (zh) * 2020-05-20 2020-08-18 浩云科技股份有限公司 一种声纹采集注册方法及装置
CN111768789B (zh) * 2020-08-03 2024-02-23 上海依图信息技术有限公司 电子设备及其语音发出者身份确定方法、装置和介质
CN112613506A (zh) * 2020-12-23 2021-04-06 金蝶软件(中国)有限公司 图像中的文本识别方法、装置、计算机设备和存储介质
CN112954695A (zh) * 2021-01-26 2021-06-11 国光电器股份有限公司 一种音箱配网的方法、装置、计算机设备和存储介质
CN112911003B (zh) * 2021-02-03 2022-06-07 广州市高奈特网络科技有限公司 电子数据提取方法、计算机设备和存储介质
CN113393848A (zh) * 2021-06-11 2021-09-14 上海明略人工智能(集团)有限公司 用于训练说话人识别模型的方法、装置、电子设备和可读存储介质
CN113808616A (zh) * 2021-09-16 2021-12-17 平安银行股份有限公司 语音合规检测方法、装置、设备及存储介质
CN114171065A (zh) * 2021-11-29 2022-03-11 重庆长安汽车股份有限公司 音频采集和对比方法、系统及车辆
CN115757862A (zh) * 2023-01-09 2023-03-07 百融至信(北京)科技有限公司 批量录音匹配话术文本的方法和装置、电子设备及存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20010039492A1 (en) * 2000-05-02 2001-11-08 International Business Machines Corporation Method, system, and apparatus for speech recognition
CN104281275A (zh) * 2014-09-17 2015-01-14 北京搜狗科技发展有限公司 一种英文的输入方法和装置
US20160093297A1 (en) * 2014-09-26 2016-03-31 Michael E. Deisher Method and apparatus for efficient, low power finite state transducer decoding
CN106506524A (zh) * 2016-11-30 2017-03-15 百度在线网络技术(北京)有限公司 用于验证用户的方法和装置
CN107273359A (zh) * 2017-06-20 2017-10-20 北京四海心通科技有限公司 一种文本相似度确定方法

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102637180B (zh) * 2011-02-14 2014-06-18 汉王科技股份有限公司 基于正则表达式的文字后处理方法和装置
US9548052B2 (en) * 2013-12-17 2017-01-17 Google Inc. Ebook interaction using speech recognition
US20170318013A1 (en) * 2016-04-29 2017-11-02 Yen4Ken, Inc. Method and system for voice-based user authentication and content evaluation
CN107395352B (zh) * 2016-05-16 2019-05-07 腾讯科技(深圳)有限公司 基于声纹的身份识别方法及装置
CN106961418A (zh) * 2017-02-08 2017-07-18 北京捷通华声科技股份有限公司 身份认证方法和身份认证系统
CN106971009B (zh) * 2017-05-11 2020-05-22 网易(杭州)网络有限公司 语音数据库生成方法及装置、存储介质、电子设备
CN107403623A (zh) * 2017-07-31 2017-11-28 努比亚技术有限公司 录音内容的保存方法、终端、云服务器及可读存储介质

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20010039492A1 (en) * 2000-05-02 2001-11-08 International Business Machines Corporation Method, system, and apparatus for speech recognition
CN104281275A (zh) * 2014-09-17 2015-01-14 北京搜狗科技发展有限公司 一种英文的输入方法和装置
US20160093297A1 (en) * 2014-09-26 2016-03-31 Michael E. Deisher Method and apparatus for efficient, low power finite state transducer decoding
CN106506524A (zh) * 2016-11-30 2017-03-15 百度在线网络技术(北京)有限公司 用于验证用户的方法和装置
CN107273359A (zh) * 2017-06-20 2017-10-20 北京四海心通科技有限公司 一种文本相似度确定方法

Also Published As

Publication number Publication date
CN108831476A (zh) 2018-11-16

Similar Documents

Publication Publication Date Title
WO2019227578A1 (zh) 语音采集方法、装置、计算机设备及存储介质
WO2021135910A1 (zh) 基于机器阅读理解的信息抽取方法、及其相关设备
US10839790B2 (en) Sequence-to-sequence convolutional architecture
CN109522557B (zh) 文本关系抽取模型的训练方法、装置及可读存储介质
US11886998B2 (en) Attention-based decoder-only sequence transduction neural networks
US11308189B2 (en) Remote usage of locally stored biometric authentication data
US11055527B2 (en) System and method for information extraction with character level features
CN110457431B (zh) 基于知识图谱的问答方法、装置、计算机设备和存储介质
US11972201B2 (en) Facilitating auto-completion of electronic forms with hierarchical entity data models
CN111241304A (zh) 基于深度学习的答案生成方法、电子装置及可读存储介质
CN111026319B (zh) 一种智能文本处理方法、装置、电子设备及存储介质
WO2019196302A1 (zh) 基于声纹识别的身份验证方法、服务器及存储介质
US20210110015A1 (en) Biometric Challenge-Response Authentication
CN109766072B (zh) 信息校验输入方法、装置、计算机设备和存储介质
BRPI0807415A2 (pt) Controlar o acesso a sistemas de computador e anotar arquivos de mídia.
US20240004703A1 (en) Method, apparatus, and system for multi-modal multi-task processing
CN112016274B (zh) 医学文本结构化方法、装置、计算机设备及存储介质
WO2022078168A1 (zh) 基于人工智能的身份验证方法、装置、计算机设备和存储介质
WO2019201024A1 (zh) 用于更新模型参数的方法、装置、设备和存储介质
WO2021159669A1 (zh) 系统安全登录方法、装置、计算机设备和存储介质
CN111444905B (zh) 基于人工智能的图像识别方法和相关装置
CN112035611A (zh) 目标用户推荐方法、装置、计算机设备和存储介质
WO2020233381A1 (zh) 基于语音识别的服务请求方法、装置及计算机设备
WO2023029513A1 (zh) 基于人工智能的搜索意图识别方法、装置、设备及介质
CN113961768A (zh) 敏感词检测方法、装置、计算机设备和存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18920388

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 12.03.2021)

122 Ep: pct application non-entry in european phase

Ref document number: 18920388

Country of ref document: EP

Kind code of ref document: A1