CN112837690A - Audio data generation method, audio data transcription method and device - Google Patents


Info

Publication number
CN112837690A
CN112837690A (application CN202011622002.6A)
Authority
CN
China
Prior art keywords: audio data, transcribed, identity information, processed, audio
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011622002.6A
Other languages
Chinese (zh)
Other versions
CN112837690B (en)
Inventor
许凌 (Xu Ling)
李明 (Li Ming)
陶飞 (Tao Fei)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by iFlytek Co Ltd
Priority to CN202011622002.6A
Publication of CN112837690A
Application granted
Publication of CN112837690B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/28: Constructional details of speech recognition systems
    • G10L 15/30: Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 63/00: Network architectures or network communication protocols for network security
    • H04L 63/08: Network architectures or network communication protocols for network security for authentication of entities

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Signal Processing For Digital Recording And Reproducing (AREA)

Abstract

The application discloses an audio data generation method, an audio data transcription method, and corresponding devices. The generation method includes: first obtaining audio data to be processed and identity information of the audio data to be processed, and then generating target audio data according to the two, so that the target audio data carries both the audio data to be processed and its identity information. Because the target audio data carries identity information, a subsequent transcription device can determine from that information that the target audio data is legal audio data, so legality screening of audio data can be realized on the transcription device. The transcription device then only needs to transcribe legal audio data and skips illegal audio data, allowing legal audio data to be transcribed in time and improving the real-time transcription performance of the transcription device on legal audio data.

Description

Audio data generation method, audio data transcription method and device
Technical Field
The present application relates to the field of computer technologies, and in particular, to an audio data generation method, an audio data transcription method, and an apparatus thereof.
Background
Currently, a recording device (e.g., a recording pen) may be used to record audio data in recording application scenarios such as conferences, interviews, and course training. The audio data is subsequently converted into text by a transcription device, so that a user can learn, through the text, the audio information recorded in the audio data. Audio transcription is the conversion of the audio information recorded in audio into text.
However, since the amount of audio data awaiting transcription on a transcription device is generally large, a transcription device with limited transcription capability is overloaded, and its real-time transcription performance is low.
Disclosure of Invention
The embodiments of the present application mainly aim to provide an audio data generation method, an audio data transcription method, and corresponding devices, which can realize legality screening of audio data and thereby improve the real-time transcription performance of a transcription device.
The embodiment of the application provides an audio data generation method, which comprises the following steps:
acquiring audio data to be processed and identity information of the audio data to be processed;
and generating target audio data according to the audio data to be processed and the identity information of the audio data to be processed, so that the target audio data carries the audio data to be processed and the identity information of the audio data to be processed.
Optionally, when the audio data to be processed includes N pieces of first audio data, and the identity information of the audio data to be processed includes the identity information of the N pieces of first audio data, generating the target audio data according to the audio data to be processed and the identity information of the audio data to be processed includes:
generating ith second audio data according to the ith first audio data and the identity information of the ith first audio data, so that the ith second audio data carries the ith first audio data and the identity information of the ith first audio data; wherein i is a positive integer, i is not more than N, and N is a positive integer;
and obtaining target audio data according to the 1 st second audio data to the Nth second audio data.
Optionally, the generating the ith second audio data according to the ith first audio data and the identity information of the ith first audio data includes:
and adding the identity information of the ith first audio data to a preset position of the ith first audio data to obtain the ith second audio data.
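The preset-position embedding above can be sketched in Python. This is a minimal illustration under assumptions the text does not fix: the identity information is treated as a 16-byte tag inserted at byte offset 4 of each first-audio chunk, and both the offset and the tag length are hypothetical.

```python
PRESET_OFFSET = 4      # assumed: tag inserted after a short chunk header
TAG_LENGTH = 16        # assumed fixed identity-tag size in bytes

def embed_identity(first_audio: bytes, identity: bytes) -> bytes:
    """Produce 'second audio data' carrying the chunk plus its identity tag."""
    if len(identity) != TAG_LENGTH:
        raise ValueError("identity tag must be exactly TAG_LENGTH bytes")
    return (first_audio[:PRESET_OFFSET]
            + identity
            + first_audio[PRESET_OFFSET:])

def extract_identity(second_audio: bytes) -> tuple[bytes, bytes]:
    """Inverse operation used on the transcription side: split the tag back out."""
    tag = second_audio[PRESET_OFFSET:PRESET_OFFSET + TAG_LENGTH]
    audio = second_audio[:PRESET_OFFSET] + second_audio[PRESET_OFFSET + TAG_LENGTH:]
    return audio, tag
```

Because the offset and length are fixed, both sides can embed and extract without any extra framing metadata.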
Optionally, if the number of the audio data to be processed is T, the method further includes:
generating audio basic identity information corresponding to the T audio data to be processed according to user identity identifications corresponding to the T audio data to be processed and product serial numbers corresponding to the T audio data to be processed;
the acquiring identity information of the audio data to be processed includes:
determining identity information of the 1 st audio data to be processed according to the audio basic identity information corresponding to the T audio data to be processed;
generating identity information of the (t+1)-th audio data to be processed according to the t-th audio data to be processed and the identity information of the t-th audio data to be processed; the recording time corresponding to the t-th audio data to be processed is earlier than the recording time corresponding to the (t+1)-th audio data to be processed; t is a positive integer, t is less than or equal to T-1, and T is a positive integer.
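The chained generation above (a basic identity derived from the user identity identification and product serial number, then each subsequent identity derived from the previous audio data and identity) can be illustrated with a hash-based sketch. The SHA-256 construction, the "|" separator, and the 16-byte identity length are all assumptions for illustration, not the patent's concrete rule.

```python
import hashlib

def base_identity(user_id: str, serial_number: str, length: int = 16) -> bytes:
    # Assumed derivation: hash the user identity identification and the
    # product serial number into a fixed-length audio basic identity. The
    # text only states the basic identity comes from these two inputs.
    return hashlib.sha256(f"{user_id}|{serial_number}".encode()).digest()[:length]

def identity_chain(pending_audio: list[bytes], user_id: str,
                   serial_number: str) -> list[bytes]:
    # Identity of the 1st audio data to be processed comes from the basic
    # identity; the (t+1)-th identity is derived from the t-th audio data
    # and the t-th identity, so each depends on all earlier recorded audio.
    ids = [base_identity(user_id, serial_number)]
    for chunk in pending_audio[:-1]:
        nxt = hashlib.sha256(ids[-1] + hashlib.sha256(chunk).digest()).digest()[:16]
        ids.append(nxt)
    return ids
```

A transcription side that knows the same inputs can regenerate the chain independently, which is what makes the later matching step possible.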
Optionally, the generating the identity information of the (t+1)-th to-be-processed audio data according to the t-th to-be-processed audio data and the identity information of the t-th to-be-processed audio data includes:
generating a first updating rule according to the t-th audio data to be processed;
and updating the identity information of the tth audio data to be processed according to the first updating rule to obtain the identity information of the t +1 th audio data to be processed.
Optionally, when the t-th audio data to be processed includes N_t pieces of first audio data, the identity information of the t-th audio data to be processed includes the identity information of the N_t pieces of first audio data, and the first update rule includes a first sorting target, the updating the identity information of the t-th audio data to be processed according to the first update rule to obtain the identity information of the (t+1)-th audio data to be processed includes:
sorting and adjusting the identity information of the N_t pieces of first audio data according to the first sorting target to obtain the identity information of the (t+1)-th audio data to be processed; wherein N_t is a positive integer.
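The sorting-based first update rule can be sketched as reordering the N_t identity parts by a permutation derived from the t-th audio data. Deriving that permutation from a SHA-256 digest is a hypothetical choice; the text only requires that the generation side and the transcription side, which see the same audio, derive the same "first sorting target".

```python
import hashlib

def derive_sorting_target(audio_chunk: bytes, n: int) -> list[int]:
    # Hypothetical sorting target: a permutation of range(n) derived from a
    # digest of the t-th audio data, identically reproducible on both sides.
    digest = hashlib.sha256(audio_chunk).digest()
    keys = [digest[i % len(digest)] for i in range(n)]
    return sorted(range(n), key=lambda i: (keys[i], i))

def update_identity(identity_parts: list[bytes], audio_chunk: bytes) -> list[bytes]:
    # Apply the first update rule: reorder the N_t identity parts according
    # to the sorting target to obtain the (t+1)-th identity information.
    permutation = derive_sorting_target(audio_chunk, len(identity_parts))
    return [identity_parts[i] for i in permutation]
```

The update is a pure reordering: no identity part is created or lost, only the arrangement changes from segment to segment.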
Optionally, the acquiring the audio data to be processed includes:
acquiring original audio data;
and encrypting the original audio data to obtain the audio data to be processed.
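A toy sketch of the encryption step follows. The text does not name a cipher, so this uses an XOR keystream built from SHA-256 purely to show the shape of the step; a real implementation should use a vetted authenticated cipher instead of this placeholder.

```python
import hashlib

def _keystream(key: bytes, n: int) -> bytes:
    # Counter-style keystream from SHA-256; placeholder construction only.
    out = bytearray()
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:n])

def encrypt(original_audio: bytes, key: bytes) -> bytes:
    # XOR with the keystream; applying the same function again decrypts,
    # matching the decryption step later performed on the transcription side.
    ks = _keystream(key, len(original_audio))
    return bytes(a ^ b for a, b in zip(original_audio, ks))
```

Since XOR is its own inverse, the transcription side's decryption step is the same call with the same key.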
The embodiment of the application also provides an audio data transcription method, which comprises the following steps:
acquiring audio data to be transcribed; the audio data to be transcribed is target audio data generated by any implementation mode of the audio data generation method provided by the embodiment of the application;
extracting actual identity information corresponding to the audio data to be transcribed from the audio data to be transcribed;
determining whether the audio data to be transcribed is legal audio data or not according to the actual identity information corresponding to the audio data to be transcribed;
and when the audio data to be transcribed is determined to be legal audio data, performing transcription processing on the audio data to be transcribed to obtain characters corresponding to the audio data to be transcribed.
In a possible implementation manner, if the audio data to be transcribed includes N second audio data, the extracting, from the audio data to be transcribed, actual identity information corresponding to the audio data to be transcribed includes:
extracting actual identity information corresponding to the kth second audio data from the kth second audio data; wherein k is a positive integer, k is less than or equal to N, and N is a positive integer;
and generating actual identity information corresponding to the audio data to be transcribed according to the actual identity information corresponding to the 1 st second audio data to the actual identity information corresponding to the Nth second audio data.
In one possible embodiment, the method further comprises:
acquiring theoretical identity information corresponding to the audio data to be transcribed;
the determining whether the audio data to be transcribed is legal audio data according to the actual identity information corresponding to the audio data to be transcribed comprises:
matching the actual identity information corresponding to the audio data to be transcribed with the theoretical identity information corresponding to the audio data to be transcribed to obtain an identity matching result corresponding to the audio data to be transcribed;
and determining whether the audio data to be transcribed is legal audio data or not according to the identity matching result corresponding to the audio data to be transcribed.
In a possible implementation manner, if the number of the audio data to be transcribed is M, the obtaining theoretical identity information corresponding to the audio data to be transcribed includes:
generating theoretical identity information corresponding to the (m+1)-th audio data to be transcribed according to the m-th audio data to be transcribed and the theoretical identity information corresponding to the m-th audio data to be transcribed; wherein m is a positive integer, m is not more than M-1, and M is a positive integer; the recording time corresponding to the m-th audio data to be transcribed is earlier than the recording time corresponding to the (m+1)-th audio data to be transcribed; the theoretical identity information corresponding to the 1st audio data to be transcribed is determined according to the audio basic identity information corresponding to the M audio data to be transcribed.
In a possible implementation manner, the generating theoretical identity information corresponding to the m +1 th audio data to be transcribed according to the mth audio data to be transcribed and the theoretical identity information corresponding to the mth audio data to be transcribed includes:
generating a second updating rule according to the mth audio data to be transcribed;
and updating the theoretical identity information corresponding to the mth audio data to be transcribed according to the second updating rule to obtain the theoretical identity information corresponding to the (m + 1) th audio data to be transcribed.
In one possible implementation, when the m-th audio data to be transcribed includes N_m pieces of second audio data, the theoretical identity information corresponding to the m-th audio data to be transcribed includes the theoretical identity information corresponding to the N_m pieces of second audio data, and the second update rule includes a second sorting target, the updating the theoretical identity information corresponding to the m-th audio data to be transcribed according to the second update rule to obtain the theoretical identity information corresponding to the (m+1)-th audio data to be transcribed includes:
sorting and adjusting the theoretical identity information corresponding to the N_m pieces of second audio data according to the second sorting target to obtain the theoretical identity information corresponding to the (m+1)-th audio data to be transcribed; wherein N_m is a positive integer.
In a possible implementation manner, if the number of the audio data to be transcribed is M, matching the actual identity information corresponding to the audio data to be transcribed with the theoretical identity information corresponding to the audio data to be transcribed to obtain an identity matching result corresponding to the audio data to be transcribed, includes:
matching actual identity information corresponding to the r-th audio data to be transcribed with theoretical identity information corresponding to the r-th audio data to be transcribed to obtain an identity matching result corresponding to the r-th audio data to be transcribed; wherein r is a positive integer, r is less than or equal to M, and M is a positive integer;
determining whether the audio data to be transcribed is legal audio data according to the identity matching result corresponding to the audio data to be transcribed comprises the following steps:
if the identity matching results corresponding to the M audio data to be transcribed all represent that the matching is successful, determining that the M audio data to be transcribed are legal audio data;
and if at least one identity matching result corresponding to the M audio data to be transcribed indicates that the matching fails, determining that the M audio data to be transcribed is illegal audio data.
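The all-or-nothing matching decision above reduces to an element-wise comparison of the M actual identities against the M theoretical identities; a minimal sketch:

```python
def is_legal(actual_ids: list[bytes], theoretical_ids: list[bytes]) -> bool:
    # The M audio data to be transcribed are legal only if every one of the
    # M per-item identity matches succeeds; a single failed match marks the
    # whole batch as illegal audio data.
    if len(actual_ids) != len(theoretical_ids):
        return False
    return all(a == t for a, t in zip(actual_ids, theoretical_ids))
```

In a production authentication path, a constant-time comparison such as `hmac.compare_digest` would be preferable to `==` to avoid timing side channels.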
In a possible implementation manner, the performing the transcription processing on the audio data to be transcribed to obtain the text corresponding to the audio data to be transcribed includes:
extracting audio data to be decrypted corresponding to the audio data to be transcribed from the audio data to be transcribed;
decrypting the audio data to be decrypted corresponding to the audio data to be transcribed to obtain decrypted audio data corresponding to the audio data to be transcribed;
and transcribing the decrypted audio data corresponding to the audio data to be transcribed to obtain characters corresponding to the audio data to be transcribed.
An embodiment of the present application further provides an audio data generating apparatus, where the apparatus includes:
the device comprises a first acquisition unit, a second acquisition unit and a processing unit, wherein the first acquisition unit is used for acquiring audio data to be processed and identity information of the audio data to be processed;
and the data generation unit is used for generating target audio data according to the audio data to be processed and the identity information of the audio data to be processed, so that the target audio data carries the audio data to be processed and the identity information of the audio data to be processed.
An embodiment of the present application further provides an audio data transcription apparatus, the apparatus includes:
a second acquisition unit configured to acquire audio data to be transcribed; the audio data to be transcribed is target audio data generated by any implementation mode of the audio data generation method provided by the embodiment of the application;
the information extraction unit is used for extracting the actual identity information corresponding to the audio data to be transcribed from the audio data to be transcribed;
the legality determining unit is used for determining whether the audio data to be transcribed is legal audio data according to the actual identity information corresponding to the audio data to be transcribed;
and the audio transcription unit is used for transcribing the audio data to be transcribed to obtain characters corresponding to the audio data to be transcribed when the audio data to be transcribed is determined to be legal audio data.
Based on the above technical solutions, the present application has the following beneficial effects:
In the audio data generation method provided by the application, the audio data to be processed and its identity information are obtained first, and target audio data is then generated according to the two, so that the target audio data carries both the audio data to be processed and its identity information. Because the target audio data carries identity information, that information can represent that the audio information carried by the target audio data is legal. A subsequent transcription device can therefore determine from the carried identity information whether the target audio data is legal audio data, and legality screening of audio data can be realized on the transcription device.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present application, and those skilled in the art can obtain other drawings based on them without creative effort.
Fig. 1 is a schematic view of an application scenario of an audio data generation method applied to a terminal device according to an embodiment of the present application;
fig. 2 is a schematic application scenario diagram of an audio data generation method applied to a server according to an embodiment of the present application;
fig. 3 is a flowchart of an audio data generating method according to an embodiment of the present application;
fig. 4 is a schematic diagram of audio to be stored and audio data to be processed according to an embodiment of the present application;
fig. 5 is a schematic diagram illustrating generation of identity information of T pieces of audio data to be processed according to an embodiment of the present application;
fig. 6 is a schematic diagram of a preset position of first audio data according to an embodiment of the present disclosure;
FIG. 7 is a diagram illustrating second audio data provided by an embodiment of the present application;
FIG. 8 is a schematic diagram of target audio data provided by an embodiment of the present application;
fig. 9 is a flowchart of an audio data transcription method provided in an embodiment of the present application;
fig. 10 is a schematic diagram illustrating generation of actual identity information corresponding to audio data to be transcribed according to an embodiment of the present application;
fig. 11 is a schematic view of an application scenario provided in an embodiment of the present application;
fig. 12 is a flowchart illustrating the generation and storage of audio data in a recording pen according to an embodiment of the present application;
fig. 13 is a schematic structural diagram of an audio data generating apparatus according to an embodiment of the present application;
fig. 14 is a schematic structural diagram of an audio data transcription apparatus according to an embodiment of the present application.
Detailed Description
In order to facilitate understanding of the technical solutions provided in the embodiments of the present application, some technical terms are described below.
The recording device refers to a terminal device with an audio recording function. In addition, the embodiment of the present application is not limited to the sound recording device, and for example, the sound recording device may be a smart phone, a computer, a Personal Digital Assistant (PDA), a tablet computer, a sound recording pen, or the like.
The transcription apparatus refers to an apparatus having a function of transcribing audio into text. In addition, the embodiment of the present application does not limit the transcription device; for example, the transcription device may be a server or a terminal device. The terminal device may be a smart phone, a computer, a Personal Digital Assistant (PDA), a tablet computer, a recording pen, or the like.
The audio sampling points refer to a frame of audio directly recorded by a recording device in the recording process.
The audio information refers to sound information recorded in audio. For example, the audio information of an audio sampling point may refer to sound information recorded in the audio sampling point.
The legal audio data refers to audio data carrying audio information recorded by legal recording equipment; and the legal recording device has the transcription right granted by the transcription device.
The illegal audio data refers to audio data carrying audio information recorded by an illegal recording device; and the illegal recording device does not have the transcription right granted by the transcription device.
In research on audio transcription, the inventors found that a transcription device (such as a transcription server or a transcription terminal) does not distinguish whether received audio data is legal; it transcribes the audio data directly. The transcription device therefore has to transcribe illegal audio data as well as legal audio data, and a large amount of time is wasted on transcribing the illegal audio data. As a result, a transcription device with limited transcription capability cannot transcribe the legal audio data in time, and its real-time performance on legal audio data is low.
In order to solve the above technical problem, an embodiment of the present application provides an audio data generating method, including: the method comprises the steps of firstly obtaining audio data to be processed and identity information of the audio data to be processed, and then generating target audio data according to the audio data to be processed and the identity information of the audio data to be processed, so that the target audio data carries the audio data to be processed and the identity information of the audio data to be processed.
Therefore, because the target audio data carries identity information, that information can represent that the audio information carried by the target audio data is legal, and the subsequent transcription device can determine from it that the target audio data is legal. Legality screening of audio data can thus be realized on the transcription device: the device transcribes only legal audio data and skips illegal audio data. This saves the time otherwise spent transcribing illegal audio data, allows legal audio data to be transcribed in time, and improves the real-time transcription performance of the transcription device, particularly on legal audio data.
In addition, the embodiment of the present application does not limit the execution subject of the audio data generation method; for example, the method provided by the embodiment of the present application may be applied to a data processing device such as a terminal device or a server. The terminal device may be a smart phone, a computer, a Personal Digital Assistant (PDA), a tablet computer, a recording pen, or the like. The server may be a stand-alone server, a cluster server, or a cloud server.
In order to facilitate understanding of the technical solutions provided in the embodiments of the present application, an application scenario of the audio data generation method provided in the embodiments of the present application is exemplarily described below with reference to fig. 1 and fig. 2, respectively. Fig. 1 is an application scene schematic diagram of an audio data generation method applied to a terminal device according to an embodiment of the present application; fig. 2 is a schematic application scenario diagram of an audio data generation method applied to a server according to an embodiment of the present application.
In the application scenario shown in fig. 1, when a user 101 triggers an audio recording request on a terminal device 102, the terminal device 102 receives the audio recording request, records an audio to be stored, obtains audio data to be processed and identity information of the audio data to be processed according to the audio to be stored, generates target audio data according to the audio data to be processed and the identity information of the audio data to be processed, so that the target audio data carries the audio data to be processed and the identity information of the audio data to be processed, and finally stores and displays the target audio data as audio storage data corresponding to the audio to be stored.
In the application scenario shown in fig. 2, when a user 201 triggers an audio recording request on a terminal device 202, the terminal device 202 receives the audio recording request, records the audio to be stored, and sends it to a server 203. The server 203 obtains the audio data to be processed and its identity information from the audio to be stored, and then generates target audio data according to the two, so that the target audio data carries both the audio data to be processed and its identity information. Finally, the target audio data can be sent to the terminal device 202 as the storage data corresponding to the audio to be stored, so that the terminal device 202 stores and displays that storage data.
It should be noted that the audio data generation method provided in the embodiment of the present application can be applied to not only the application scenarios shown in fig. 1 or fig. 2, but also other application scenarios that need to generate audio data, and the embodiment of the present application is not particularly limited to this.
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Method embodiment one
Referring to fig. 3, the figure is a flowchart of an audio data generation method provided in an embodiment of the present application.
The audio data generation method provided by the embodiment of the application comprises the following steps of S301-S303:
s301: and acquiring audio data to be processed.
The audio data to be processed carries audio information recorded by the recording device.
In some cases, the audio data to be processed may include at least one piece of first audio data, and each piece of first audio data may include at least one piece of audio sample data. It should be noted that the embodiment of the present application does not limit the audio sample data; for example, one piece of audio sample data may refer to one audio sampling point, or to one audio sampling point after encryption processing. The number of audio sample data in the first audio data is likewise not limited; for example, the first audio data may include 128 pieces of audio sample data. Further, the "encryption processing" mentioned here is the same "encryption processing" referred to in S3012 below.
In some cases, the audio data to be processed may be generated from audio to be stored that is recorded by the recording device. The audio to be stored refers to the audio directly recorded by the recording device after the user triggers the recording request on the recording device.
In addition, the embodiment of the present application does not limit the generation manner of the audio data to be processed (i.e., the embodiment of S301), and for ease of understanding, the following description is made with reference to two possible embodiments.
In a first possible implementation manner, S301 may specifically be: at least one piece of audio data to be processed is extracted from the audio to be stored.
It should be noted that, the embodiment of the present application does not limit the extraction process of the audio data to be processed, and for ease of understanding, the following description is made in conjunction with two cases.
In case 1, the extraction of the audio data to be processed may be performed in the recording process of the audio to be stored, and the extraction may specifically be: in the recording process of the audio to be stored, when a preset number of audio sampling points are collected, the audio data to be processed can be generated directly according to the preset number of audio sampling points.
The preset number is preset, and the preset number is not limited in the embodiments of the present application. For example, the preset number may be 1280.
As an example, suppose the audio to be stored includes 1280 × T audio sampling points, the acquisition time of the w-th audio sampling point is earlier than that of the (w+1)-th audio sampling point, w is a positive integer, and w + 1 is less than or equal to 1280 × T. When the preset number is 1280, S301 may specifically be: in the process of recording the audio to be stored, when the 1280th audio sampling point is recorded, generating the 1st audio data to be processed according to the 1st to 1280th audio sampling points; when the 2560th audio sampling point is recorded, generating the 2nd audio data to be processed according to the 1281st to 2560th audio sampling points; … (and so on); and when the 1280 × T-th audio sampling point is recorded, generating the T-th audio data to be processed according to the 1280 × (T-1) + 1-th to 1280 × T-th audio sampling points.
It should be noted that the embodiment of the present application does not limit the generation manner of the j-th audio data to be processed. For example, the set of the 1280 × (j-1) + 1-th to 1280 × j-th audio sampling points may be directly determined as the j-th audio data to be processed. For another example, the 1280 × (j-1) + 1-th to 1280 × j-th audio sampling points may be equally divided into 10 groups to obtain 10 first audio data, so that each first audio data includes 128 audio sampling points adjacent in recording time; then, the set of these 10 first audio data is determined as the j-th audio data to be processed (e.g., the 1st to T-th audio data to be processed shown in fig. 4). Wherein j is a positive integer and j is less than or equal to T.
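The grouping above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name, the use of Python lists for sampling points, and the sizes (128 sampling points per first audio data, 10 first audio data per to-be-processed unit) are assumptions taken from the example.

```python
def frames_to_pending(samples, frame=128, group=10):
    """Group a stream of audio sampling points into to-be-processed
    audio data: each unit holds `group` first audio data, and each
    first audio data holds `frame` adjacent sampling points
    (128 x 10 = 1280 points per unit, as in the example)."""
    unit = frame * group
    pending = []
    for start in range(0, len(samples) - unit + 1, unit):
        chunk = samples[start:start + unit]
        pending.append([chunk[k:k + frame] for k in range(0, unit, frame)])
    return pending

# 2560 sampling points -> 2 to-be-processed audio data of 10 blocks each
pending = frames_to_pending(list(range(2560)))
```

In case 1, this grouping would run incrementally, each time another 1280 sampling points have been collected.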
Based on the related content in the above case 1, in some cases, the audio data to be processed may be extracted while the audio to be stored is being received and recorded. This enables parallel processing of the recording process and the data processing process of the audio to be stored, and is thus beneficial to improving the generation efficiency of the audio data.
In case 2, after the audio to be stored is recorded, the audio data to be processed may be extracted from the audio to be stored, and the extracting may specifically be: after the audio to be stored is obtained, the audio to be stored is segmented according to a first division rule to obtain at least one piece of audio data to be processed.
The first division rule is a preset audio segmentation rule; furthermore, the embodiment of the present application does not limit the first division rule. To facilitate understanding of the first division rule, the following description is made in conjunction with an example.
As an example, suppose the audio to be stored includes 1280 × T audio sampling points, the acquisition time of the w-th audio sampling point is earlier than that of the (w+1)-th audio sampling point, w is a positive integer, and w + 1 is less than or equal to 1280 × T. When each audio data to be processed includes 10 first audio data and each first audio data includes 128 audio sampling points, S301 may specifically be: first, performing a primary division on the audio to be stored by taking 128 audio sampling points as a dividing unit to obtain 10 × T first audio data sorted by recording time (for example, in ascending or descending order); then dividing the 10 × T first audio data again by taking 10 first audio data as a dividing unit, so as to obtain T pieces of audio data to be processed (as shown in fig. 4).
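The two-pass division of case 2 can be sketched similarly; again a hedged illustration with assumed names and sizes, operating on a fully recorded list of sampling points.

```python
def divide_stored_audio(samples, frame=128, group=10):
    """Case 2 sketch of the first division rule: first divide the
    recorded audio into first audio data of `frame` sampling points,
    then re-divide every `group` consecutive first audio data into
    one to-be-processed audio data."""
    first_audio = [samples[k:k + frame] for k in range(0, len(samples), frame)]
    return [first_audio[k:k + group] for k in range(0, len(first_audio), group)]

pending = divide_stored_audio(list(range(1280)))  # T = 1 in this toy input
```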
Based on the related content in the above case 2, in some cases, the audio data to be processed may be extracted, by audio division, from audio to be stored that has already been recorded. This separates the recording process of the audio to be stored from its processing process, so that data processing may be performed at any time after the recording finishes, which is beneficial to improving the convenience of the data processing process of the audio to be stored.
Based on the related contents of the first possible implementation manner of the above S301, in some cases, the audio data to be processed may be directly extracted from the audio to be stored, so that the extracted audio data to be processed can accurately carry the audio information recorded in the audio to be stored.
In some cases, to improve the storage security of the audio to be stored, the audio information carried by the audio to be stored may be encrypted. Based on this, the present application provides a second possible implementation manner of S301, which specifically includes S3011-S3012:
s3011: raw audio data is acquired.
The original audio data refers to audio data directly extracted from audio to be stored.
It should be noted that the extraction process of the original audio data is similar to the extraction process in the first possible implementation of S301 provided above, and please refer to the first possible implementation of S301 above for related contents.
S3012: and encrypting the original audio data to obtain audio data to be processed.
According to the embodiment of the application, the original audio data can be encrypted by using the preset encryption algorithm, and the encrypted original audio data is determined as the audio data to be processed, so that the storage safety of the audio to be stored is improved.
In addition, in some cases, the preset encryption algorithm may be chosen so that the data volume of the encrypted original audio data is far smaller than that of the original audio data. After the encrypted original audio data is determined as the audio data to be processed, the data volume of the audio data to be processed is therefore far smaller than that of the original audio data, which is beneficial to reducing the storage space required for the audio to be stored.
It should be noted that, the preset encryption algorithm is not limited in the embodiment of the present application, and the preset encryption algorithm may be set according to an application scenario. In one possible implementation, the preset encryption algorithm may be Sub-band Coding (sbc), Advanced Audio Coding (AAC), or the like.
As an example, when the original audio data includes 128 audio samples and the preset encryption algorithm is sbc encoding, the S3012 may specifically be: and encoding the 128 audio sampling points by using sbc codes to obtain 128 bytes of encoded data, and determining the 128 bytes of encoded data as the audio data to be processed corresponding to the original audio data.
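A runnable sbc or AAC codec is out of scope here, so the sketch below uses zlib compression plus XOR obfuscation purely as a stand-in for the "preset encryption algorithm"; the key, function names, and 16-bit sample format are all assumptions, chosen only to illustrate the encode-and-shrink idea of S3012.

```python
import zlib

KEY = 0x5A  # illustrative obfuscation key, not part of the patent


def encrypt_original(samples):
    """Stand-in for the 'preset encryption algorithm' of S3012: pack
    16-bit sampling points, compress them (shrinking the data volume,
    as sbc/AAC coding would), then XOR-obfuscate the result."""
    raw = b"".join(s.to_bytes(2, "little", signed=True) for s in samples)
    return bytes(b ^ KEY for b in zlib.compress(raw))


def decrypt_pending(data):
    """Inverse transform, used here only to check the round trip."""
    raw = zlib.decompress(bytes(b ^ KEY for b in data))
    return [int.from_bytes(raw[i:i + 2], "little", signed=True)
            for i in range(0, len(raw), 2)]
```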
Based on the relevant contents of the above S3011 to S3012, in some cases, at least one piece of original audio data may be extracted from the audio to be stored, so that the original audio data includes a plurality of audio sampling points; and then encrypting each original audio data according to a preset encryption algorithm, and determining each encrypted original audio data as each audio data to be processed respectively, so that each audio data to be processed can carry audio information recorded in the audio to be stored in a safe manner, thereby being beneficial to improving the storage safety of the audio to be stored.
Based on the related content of S301, to-be-processed audio data corresponding to the to-be-stored audio may be generated according to the to-be-stored audio recorded by the recording device, so that the identity information adding process and the data storing process of the to-be-stored audio may be subsequently implemented based on the to-be-processed audio data.
It should be noted that, the number of the audio data to be processed is not limited in the embodiments of the present application, for example, the number of the audio data to be processed is T; wherein T is a positive integer. In addition, if the audio data to be processed is generated according to the audio to be stored, the number of the audio data to be processed may be determined according to the total number of the audio sampling points in the audio to be stored.
S302: and acquiring the identity information of the audio data to be processed.
The identity information of the audio data to be processed is used for describing the identity of the audio information carried by the audio data to be processed (such as recording equipment information, recording equipment user information, transcription authorization related information and the like); moreover, the identity information of the audio data to be processed can be used for proving the legality of the audio information carried by the audio data to be processed.
In addition, the embodiment of the present application does not limit the process of acquiring the identity information of the audio data to be processed, and for facilitating understanding, a possible implementation manner of S302 is described as an example below.
In a possible implementation manner, when the number of the audio data to be processed is T, the recording time corresponding to the t-th audio data to be processed is earlier than the recording time corresponding to the (t+1)-th audio data to be processed, t is a positive integer, t is not greater than T-1, and T is a positive integer, as shown in fig. 5, the process of generating the identity information of the T pieces of audio data to be processed may specifically be: generating the identity information of the 1st audio data to be processed according to the audio basic identity information corresponding to the T pieces of audio data to be processed; generating the identity information of the 2nd audio data to be processed according to the 1st audio data to be processed and the identity information of the 1st audio data to be processed; generating the identity information of the 3rd audio data to be processed according to the 2nd audio data to be processed and the identity information of the 2nd audio data to be processed; … (and so on); and generating the identity information of the T-th audio data to be processed according to the (T-1)-th audio data to be processed and the identity information of the (T-1)-th audio data to be processed.
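The chaining of fig. 5 can be sketched as below. The derivation function (SHA-256 over the data plus the previous identity, truncated to 10 characters) is an assumption standing in for the update rule described later; only the chain shape is taken from the text.

```python
import hashlib


def identity_chain(pending_list, base_identity):
    """Chained generation of fig. 5: identity 1 comes from the audio
    basic identity information; identity t+1 is derived from the t-th
    to-be-processed audio data together with identity t."""
    ids = [base_identity]
    for data in pending_list[:-1]:
        digest = hashlib.sha256(repr(data).encode() + ids[-1].encode())
        ids.append(digest.hexdigest()[:10])
    return ids


ids = identity_chain([[1, 2], [3, 4], [5, 6]], "abc123456f")
```

Because each identity depends on the preceding audio data, tampering with any to-be-processed audio data breaks the identities of all later pieces, which is what lets the transcription device check legality later.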
The recording time corresponding to the t-th audio data to be processed is used to describe the recording time of the audio information carried by the t-th audio data to be processed (that is, of the audio sampling points corresponding to the t-th audio data to be processed). For example, as shown in fig. 4, if the t-th audio data to be processed is generated according to the 1280 × (t-1) + 1-th to 1280 × t-th audio sampling points in the audio to be stored, the recording time corresponding to the t-th audio data to be processed may be used to describe the acquisition time of the 1280 × (t-1) + 1-th to 1280 × t-th audio sampling points in the audio to be stored.
The audio basic identity information corresponding to the T audio data to be processed may be generated in advance or set in advance according to an application scenario.
In some cases, the audio basic identity information corresponding to the T pieces of audio data to be processed may be used to represent a correspondence (i.e., a binding relationship) between the recording apparatus and the user thereof, so that the subsequent transcription apparatus can determine the audio basic identity information according to the correspondence between the recording apparatus and the user thereof. Based on this, an embodiment of the present application further provides an implementation process for obtaining audio basic identity information corresponding to T pieces of audio data to be processed, where the implementation process specifically may be: and generating audio basic identity information corresponding to the T audio data to be processed according to the user identity identifications corresponding to the T audio data to be processed and the product serial numbers corresponding to the T audio data to be processed.
The user identity identifiers corresponding to the T pieces of audio data to be processed are used for uniquely identifying the identity of a user who uses the recording equipment to record the audio information carried by the T pieces of audio data to be processed. In addition, the user id is not limited in the embodiment of the application, for example, the user id may be an equipment login account corresponding to the recording equipment, user id information (e.g., an id card number, an id card copy, etc.), user body identification information (e.g., a face, a voiceprint, a fingerprint, etc.), and personalized identity information set by the user (e.g., a question and answer conversation, etc.).
The product serial numbers corresponding to the T pieces of audio data to be processed refer to device identifiers of the recording devices used for receiving and recording the audio information carried by the T pieces of audio data to be processed. For example, the product serial number corresponding to the T audio data to be processed may be a1234Y 20201207.
In addition, the embodiment of the present application does not limit the generation process of the audio basic identity information corresponding to the T pieces of audio data to be processed, for example, the generation process may specifically be: and fusing the user identity identifications corresponding to the T audio data to be processed and the product serial numbers corresponding to the T audio data to be processed by using a preset fusion algorithm to obtain a fusion character string with a preset length, and determining the fusion character string as audio basic identity information corresponding to the T audio data to be processed. The preset length may be set according to an application scenario, for example, the preset length is 10.
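One way to realize such a preset fusion algorithm — purely an assumption, since the patent leaves the algorithm open — is to hash the concatenated identifiers and truncate to the preset length:

```python
import hashlib


def fuse_identity(user_id, serial_number, preset_length=10):
    """Fuse the user identity identifier and the product serial number
    into a fusion character string of `preset_length` characters.
    SHA-256 truncation is one deterministic choice among many."""
    digest = hashlib.sha256(f"{user_id}|{serial_number}".encode()).hexdigest()
    return digest[:preset_length]


base = fuse_identity("device_login_account_01", "A1234Y20201207")
```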
It should be noted that, the preset fusion algorithm is not limited in the embodiment of the present application, and any method capable of fusing the user identifiers corresponding to the T pieces of audio data to be processed and the product serial numbers corresponding to the T pieces of audio data to be processed into a fusion character string with a preset length may be used for implementation.
Based on the relevant content of the audio basic identity information, in the embodiment of the present application, the user identity and the product serial number corresponding to the T pieces of audio data to be processed may be fused to obtain the audio basic identity information corresponding to the T pieces of audio data to be processed, so that the audio basic identity information may indicate the corresponding relationship between the recording device and the recording device user for the T pieces of audio data to be processed, and a subsequent transcription device may determine whether the T pieces of audio data to be processed belong to legal audio data based on this corresponding relationship. It should be noted that the embodiment of the present application does not limit the execution subject of the generation process of the audio basic identity information corresponding to the T pieces of audio data to be processed; for example, the audio basic identity information may be generated by the transcription device.
In addition, the embodiment of the present application does not limit the generation process of the identity information of the 1 st audio data to be processed. For example, the audio basic identity information corresponding to the T pieces of audio data to be processed may be directly determined as the identity information of the 1 st piece of audio data to be processed.
Furthermore, as can be seen from fig. 5 and the related contents thereof, for the identity information of the T +1 th to-be-processed audio data (i.e., the non-first to-be-processed audio data of the T to-be-processed audio data), the determination may be made according to the T-th to-be-processed audio data and the identity information of the T-th to-be-processed audio data. Wherein T is a positive integer, T is less than or equal to T-1, and T is a positive integer.
In order to facilitate understanding of the generation process of the identity information of the t +1 th audio data to be processed, a possible implementation manner is described as an example.
In a possible implementation, the generation process of the identity information of the t +1 th audio data to be processed includes the following steps 11 to 12:
step 11: and generating a first updating rule according to the t-th audio data to be processed.
The first updating rule is used for describing an adjusting rule according to which identity information of the tth audio data to be processed needs to be adjusted.
The embodiment of the present application does not limit the first update rule. For example, when the t-th audio data to be processed includes N_t first audio data and the identity information of the t-th audio data to be processed includes the identity information of the N_t first audio data, the first update rule may include a first sorting target, and the first sorting target is used to describe the target sorting sequence number of the identity information of each first audio data in the t-th audio data to be processed (i.e., the sequence number occupied after the identity information of each first audio data in the t-th audio data to be processed is adjusted by the first update rule). Wherein N_t is a positive integer.
That is, the first sorting target may be [Q_1, Q_2, …, Q_N], where Q_h relates the identity information of the h-th first audio data in the t-th audio data to be processed to its position after adjustment by the first update rule; h is a positive integer, and h is less than or equal to N. For example, when the 10 pieces of identity information are arranged in the order "abc123456f" and the first sorting target is [5, 6, 8, 9, 0, 1, 2, 3, 4, 7], the 10 pieces of identity information after sorting adjustment according to the first sorting target are "346fabc125" (the h-th position of the adjusted sequence takes the identity information at position Q_h of the original sequence, with positions counted from 0).
In addition, the embodiment of the present application does not limit the generation process of the first update rule. In a possible implementation, when the t-th audio data to be processed includes N_t first audio data and the first update rule includes the first sorting target, a preset hash algorithm may first be used to perform a hash operation on the N_t first audio data to obtain N_t hash values corresponding to the N_t first audio data, such that each hash value is a positive integer between 1 and N_t; then the set of the N_t hash values is determined as the first sorting target, and the first update rule is determined according to the first sorting target (e.g., the first sorting target is directly determined as the first update rule), so that the first update rule can describe the adjustment rule according to which the order of the identity information of the N_t first audio data needs to be adjusted.
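A hash function does not by itself guarantee N_t distinct values between 1 and N_t, so the sketch below — an illustrative assumption, not the patent's algorithm — ranks the first audio data by their hash values, which always yields a valid first sorting target (a permutation of positions):

```python
import hashlib


def first_sorting_target(first_audio_blocks):
    """Derive a first sorting target from the N_t first audio data:
    hash each block, then use each block's rank among the hash values
    as its target position (0-based), guaranteeing a permutation."""
    hashes = [hashlib.sha256(bytes(b)).digest() for b in first_audio_blocks]
    order = sorted(range(len(hashes)), key=lambda k: hashes[k])
    target = [0] * len(order)
    for rank, idx in enumerate(order):
        target[idx] = rank
    return target
```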
Based on the related content of step 11, after the tth to-be-processed audio data is acquired, a first update rule may be generated according to the tth to-be-processed audio data, so that the first update rule may accurately describe an adjustment rule according to which identity information of the tth to-be-processed audio data needs to be adjusted, so that identity information of the tth to-be-processed audio data may be adjusted by using the first update rule in the following.
Step 12: and updating the identity information of the t-th audio data to be processed according to a first updating rule to obtain the identity information of the t + 1-th audio data to be processed.
In the embodiment of the application, after the first update rule is acquired, the identity information of the t-th audio data to be processed can be adjusted according to the first update rule to obtain the identity information of the (t+1)-th audio data to be processed. The adjusting process may specifically be: when the t-th audio data to be processed includes N_t first audio data, the identity information of the t-th audio data to be processed includes the identity information of the N_t first audio data, and the first update rule includes the first sorting target, the identity information of the N_t first audio data may be sorted and adjusted according to the first sorting target to obtain the identity information of the (t+1)-th audio data to be processed. For example, when the identity information of the t-th audio data to be processed is "abc123456f" and the first sorting target is [5, 6, 8, 9, 0, 1, 2, 3, 4, 7], the identity information of the (t+1)-th audio data to be processed is "346fabc125".
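Applying a first sorting target is then a plain reordering. The sketch below reproduces the worked example above, under the reading that position h of the adjusted identity takes the character at position Q_h of the original (positions counted from 0):

```python
def apply_sorting_target(identity, target):
    """Reorder the identity information of the t-th to-be-processed
    audio data according to the first sorting target."""
    return "".join(identity[q] for q in target)


new_id = apply_sorting_target("abc123456f", [5, 6, 8, 9, 0, 1, 2, 3, 4, 7])
# new_id == "346fabc125", reproducing the example in the text
```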
Based on the related content of the above steps 11 to 12, the identity information of the t +1 th to-be-processed audio data may be generated according to the t th to-be-processed audio data and the identity information of the t th to-be-processed audio data, so that the identity information of the t +1 th to-be-processed audio data may simultaneously carry the audio information carried by the t th to-be-processed audio data itself and the identity information corresponding thereto.
Based on the related content of S302, when the number of the audio data to be processed is T, the identity information of the 1 st audio data to be processed may be determined according to the audio basic identity information corresponding to the T audio data to be processed, so that the identity information of the 1 st audio data to be processed carries the corresponding relationship between the recording device and the user of the recording device; generating identity information of the t +1 th audio data to be processed according to the t-th audio data to be processed and the identity information of the t-th audio data to be processed, so that the identity information of the t +1 th audio data to be processed carries the audio information of the t-th audio data to be processed and the identity information corresponding to the audio information; t is a positive integer, T is less than or equal to T-1, and T is a positive integer.
S303: and generating target audio data according to the audio data to be processed and the identity information of the audio data to be processed, so that the target audio data carries the audio data to be processed and the identity information of the audio data to be processed.
The target audio data refers to stored data corresponding to the audio data to be processed; and the target audio data carries the audio data to be processed and the identity information of the audio data to be processed at the same time.
In addition, the generation process of the target audio data is not limited in the embodiment of the application, for example, the audio data to be processed and the identity information of the audio data to be processed may be directly spliced to obtain the target audio data.
In some cases, if the identity information carried by the target audio data is added as one whole block, an illegal entity can easily determine the structure of the target audio data through simple inspection and analysis. The illegal entity could then use that structure to package illegal audio data, disguising it as legal audio data, and attack the transcription device with the disguised data; the transcription device would then spend a large amount of time transcribing the disguised data and be unable to transcribe real legal audio data in time.
Based on this, in order to further improve the structural security of the target audio data, the identity information may be added in a hashed (i.e., scatter-inserted) manner during the generation of the target audio data. Based on this, the embodiment of the present application further provides a possible implementation manner of generating the target audio data (i.e., S303), in this implementation manner, when the to-be-processed audio data includes N pieces of first audio data, and the identity information of the to-be-processed audio data includes the identity information of the N pieces of first audio data, S303 may specifically include S3031 to S3032:
s3031: generating ith second audio data according to the ith first audio data and the identity information of the ith first audio data, so that the ith second audio data carries the ith first audio data and the identity information of the ith first audio data; wherein i is a positive integer, i is not more than N, and N is a positive integer.
In the embodiment of the application, after the identity information of the ith first audio data and the ith first audio data is acquired, information integration may be performed on the identity information of the ith first audio data and the ith first audio data to obtain the ith second audio data, so that the ith second audio data may include the identity information of the ith first audio data and the ith first audio data.
In addition, the embodiment of the present application does not limit the information integration manner, and may be implemented by any existing or future information integration method for information integration. In addition, for ease of understanding, one possible embodiment is described below as an example.
In one possible implementation, S3031 may specifically be: and adding the identity information of the ith first audio data to a preset position of the ith first audio data to obtain the ith second audio data.
The preset position can be preset according to an application scene. For example, the preset position may be a front position of the ith first audio data, a rear position of the ith first audio data, or a position between any two adjacent characters in the ith first audio data.
The front position of the i-th first audio data is the character position before the position of the first character in the i-th first audio data. For example, as shown in fig. 6, when the i-th first audio data is "C_1C_2……C_W" and the identity information of the i-th first audio data includes 1 character, the front position of the i-th first audio data is the position marked "C_B" in fig. 6.
The rear position of the i-th first audio data is the character position after the position of the last character in the i-th first audio data. For example, as shown in fig. 6, when the i-th first audio data is "C_1C_2……C_W" and the identity information of the i-th first audio data includes 1 character, the rear position of the i-th first audio data is the position marked "C_A" in fig. 6.
Based on the related content of S3031, after the identity information of the ith first audio data and the identity information of the ith first audio data are acquired, the identity information of the ith first audio data may be added to the preset position of the ith first audio data to obtain the ith second audio data, so that the ith second audio data carries the identity information of the ith first audio data and the ith first audio data. For example, as shown in fig. 7, when the identity information of the ith first audio data is "3", the ith first audio data is "128B _ a 0", and the preset position is a leading position, the ith second audio data may be "3128B _ a 0". Wherein i is a positive integer, i is not more than N, and N is a positive integer.
S3032: and obtaining target audio data according to the 1 st second audio data to the Nth second audio data.
In the embodiment of the application, if the audio data to be processed includes N first audio data, after obtaining 1 st second audio data corresponding to the 1 st first audio data, 2 nd second audio data corresponding to the 2 nd first audio data, … …, and N th second audio data corresponding to the N th first audio data, the 1 st second audio data to the N th second audio data are directly spliced to obtain the target audio data. For example, as shown in fig. 8, when the 1 st second audio data is "3128B _ A0", the 2 nd second audio data is "4128B _ a 2", … …, and the nth second audio data is "5128B _ a 9", the target audio data is "3128B _ a04128B _ a2 … … 5128B _ a 9".
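S3031 to S3032 can be sketched in a few lines; string data and the front (leading) preset position are assumptions taken from the figures.

```python
def build_target_audio(first_audio_list, identity_list):
    """Add each first audio data's identity information at the preset
    (here: front) position to form the second audio data, then splice
    the second audio data into the target audio data."""
    second = [idc + data for idc, data in zip(identity_list, first_audio_list)]
    return "".join(second)


target = build_target_audio(["128B_A0", "128B_A2", "128B_A9"], ["3", "4", "5"])
# target == "3128B_A04128B_A25128B_A9", matching the pattern of fig. 8
```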
Based on the related contents of S3031 to S3032, after the audio data to be processed and the identity information of the audio data to be processed are obtained, the identity information of the audio data to be processed may be hashed into the audio data to be processed to obtain the target audio data. Because the identity information carried by the target audio data is added in a hashed manner, the structure of the target audio data is complex, so that lawbreakers cannot analyze the structure of the target audio data by brute force and therefore cannot disguise illegal audio data as legal audio data. This can effectively avoid illegal audio attacks on the transcription device and ensure that the transcription device can transcribe legal audio data in time.
In addition, after the target audio data is generated, the target audio data may be stored, or the target audio data may also be used (for example, the target audio data is played or transcribed), which is not specifically limited in this embodiment of the application.
Based on the related contents of S301 to S303, in the audio data generation method provided in the embodiment of the present application, to-be-processed audio data and identity information of the to-be-processed audio data are obtained, and then target audio data is generated according to the to-be-processed audio data and the identity information of the to-be-processed audio data, so that the target audio data carries the to-be-processed audio data and the identity information of the to-be-processed audio data. The target audio data carries identity information, so that the identity information can represent that the audio information carried by the target audio data is legal, subsequent transcription equipment can determine that the target audio data is legal audio data according to the identity information carried by the target audio data, and the legality screening of the audio data can be realized in the transcription equipment.
Based on the audio data generation method provided by the above method embodiment, the embodiment of the present application further provides an audio data transfer method, which is explained and explained below with reference to the accompanying drawings.
Method embodiment two
Referring to fig. 9, this figure is a flowchart of an audio data transcription method provided in this embodiment of the present application.
The audio data transcription method provided by the embodiment of the application comprises the following steps of S901-S904:
s901: and acquiring audio data to be transcribed.
Wherein the audio data to be transcribed is the target audio data generated by any embodiment of the audio data generation method in Method Embodiment One above.
The audio data to be transcribed carries the actual identity information corresponding to the audio data to be transcribed. The actual identity information corresponding to the audio data to be transcribed is used to describe the identity of the audio information carried by the audio data to be transcribed (e.g., recording device information, recording device user information, transcription authorization related information, etc.).
The audio data to be transcribed may include at least one second audio data, wherein the second audio data includes at least one audio sample data and the identity information of the at least one audio sample data. It should be noted that, for the relevant content of "audio sample data", please refer to the relevant content of "audio sample data" in S301 above; further, for the relevant content of the second audio data, please refer to the relevant content of "second audio data" in S303 above.
In addition, the number of audio data to be transcribed is not limited in the embodiments of the present application, and for example, the number of audio data to be transcribed is M. Wherein M is a positive integer.
In addition, the embodiment of the present application does not limit the manner of obtaining the audio data to be transcribed, for example, the transcription device may receive the audio data to be transcribed sent by other devices, and may also read the audio data to be transcribed from a specified storage space.
In some cases, the target user may select one stored audio data including a plurality of audio data to be transcribed for audio transcription, so that the audio data to be transcribed may be determined based on the stored audio data. Based on this, the embodiment of the present application further provides a possible implementation manner of obtaining audio data to be transcribed (i.e., S901), which may specifically include S9011 to S9012:
s9011: stored audio data is retrieved from the target storage location.
The target storage location is an actual storage location of the audio data specified by the user and to be transcribed. In addition, the target storage location is not limited in the embodiments of the present application, for example, the target storage location may be located in a storage space in the terminal device that triggers the audio transcription request.
The stored audio data refers to the storage data obtained by processing the audio to be stored using any embodiment of the audio data generation method in Method Embodiment One above; the stored audio data may include the target audio data corresponding to the audio to be stored.
Based on the relevant content of S9011, in the embodiment of the present application, when a user wants to transcribe one piece of stored audio data, an audio transcription request may be triggered first, so that the audio transcription request is used to request to transcribe the stored audio data; and then the stored audio data is acquired from the target storage position according to the audio transcription request so as to be capable of carrying out the transcription processing on the stored audio data in the following.
S9012: and dividing the stored audio data according to a second division rule to obtain at least one audio data to be transcribed.
The second division rule may be preset, or may be generated according to the structure of the target audio data. For example, if a target audio data includes D characters, the second division rule may be set to have D characters as a division unit.
Based on the related content of S9012, after the stored audio data is acquired, the stored audio data may be divided according to a second division rule to obtain at least one audio data to be transcribed. For example, when the stored audio data includes D × M characters and the second division rule is that D characters are used as a division unit, the stored audio data may be divided according to the second division rule to obtain M audio data to be transcribed, so that each audio data to be transcribed includes D characters. Wherein D is a positive integer and M is a positive integer.
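The division described above can be sketched as follows; this is a minimal illustration assuming the stored audio data is a character string and D is known in advance (the function and variable names are hypothetical, not from the patent):

```python
# Hypothetical sketch of S9012: divide stored audio data into
# to-be-transcribed units of D characters each (the second division rule).
def divide_stored_audio(stored: str, d: int) -> list:
    """Split the stored audio data into chunks of d characters."""
    if d <= 0 or len(stored) % d != 0:
        raise ValueError("stored data length must be a positive multiple of d")
    return [stored[i:i + d] for i in range(0, len(stored), d)]
```

Dividing D × M characters this way yields exactly M audio data to be transcribed, each of D characters.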
Based on the relevant contents of S9011 to S9012, when a user wants to transcribe one piece of stored audio data, the stored audio data may first be read from the target storage location, and then at least one piece of audio data to be transcribed may be extracted from the stored audio data, so that the transcription processing of the stored audio data can subsequently be implemented based on the at least one piece of audio data to be transcribed.
S902: and extracting the actual identity information corresponding to the audio data to be transcribed from the audio data to be transcribed.
The actual identity information corresponding to the audio data to be transcribed is used for describing the identity of the audio information carried by the audio data to be transcribed.
The extraction process of the actual identity information is not limited in the embodiment of the application, and the extraction process of the actual identity information is only required to be ensured to correspond to the insertion process of the identity information in the target audio data. In order to facilitate understanding of the extraction process of the actual identity information (i.e., S902), the following description is made with reference to an example.
As an example, when the identity information carried by the target audio data is added in a hashed manner and the audio data to be transcribed includes N second audio data, S902 may specifically include S9021-S9022:
s9021: extracting actual identity information corresponding to the kth second audio data from the kth second audio data; wherein k is a positive integer, k is less than or equal to N, and N is a positive integer.
In the embodiment of the application, for the kth second audio data in the audio data to be transcribed, the actual identity information corresponding to the kth second audio data may be directly extracted from the kth second audio data, so that the actual identity information corresponding to the audio data to be transcribed can be generated based on the actual identity information corresponding to the kth second audio data in the following.
It should be noted that, in the embodiment of the present application, the position of the actual identity information corresponding to the kth second audio data in the kth second audio data is not limited, for example, when the actual identity information corresponding to the kth second audio data is a character, the actual identity information corresponding to the kth second audio data may be located at a first character position, a middle character position, or a last character position in the kth second audio data.
S9022: and generating actual identity information corresponding to the audio data to be transcribed according to the actual identity information corresponding to the 1 st second audio data to the actual identity information corresponding to the Nth second audio data.
In the embodiment of the application, after the actual identity information corresponding to each second audio data in the audio data to be transcribed is acquired, the actual identity information corresponding to the 1 st second audio data can be directly spliced to the actual identity information corresponding to the nth second audio data, so that the actual identity information corresponding to the audio data to be transcribed is acquired.
Based on the relevant contents of S9021 to S9022, in this embodiment of the application, as shown in fig. 10, after audio data to be transcribed, which includes N second audio data, is acquired, the actual identity information corresponding to the 1st second audio data may be extracted from the 1st second audio data, the actual identity information corresponding to the 2nd second audio data may be extracted from the 2nd second audio data, and so on, up to the actual identity information corresponding to the Nth second audio data being extracted from the Nth second audio data; the actual identity information corresponding to the 1st second audio data through the Nth second audio data is then spliced to obtain the actual identity information corresponding to the audio data to be transcribed.
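The extraction-and-splicing of S9021 to S9022 can be sketched as follows; for illustration it is assumed that each second audio data is a string whose first character is its identity character — the patent leaves the actual position open, requiring only that extraction mirror the insertion performed by the generator:

```python
# Hypothetical sketch of S9021-S9022, assuming the identity character of
# each "second audio data" sits at the first character position; the real
# position depends on how the generation side inserted the identity info.
def extract_actual_identity(second_audio_list: list) -> str:
    """Splice the identity characters of the N second audio data in order."""
    return "".join(chunk[0] for chunk in second_audio_list)
```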
Based on the related content of S902, in this embodiment of the application, after the audio data to be transcribed is obtained, the actual identity information corresponding to the audio data to be transcribed may be directly extracted from the audio data to be transcribed, so that the validity of the audio information carried by the audio data to be transcribed can be determined based on the actual identity information in the following.
S903: and determining whether the audio data to be transcribed is legal audio data or not according to the actual identity information corresponding to the audio data to be transcribed.
The determining process of whether the audio data to be transcribed is legal audio data is not limited in the embodiment of the application, and for example, whether the audio data to be transcribed is legal audio data can be determined in an identity information comparison mode. Based on this, the embodiment of the present application further provides a process for determining valid audio data, which may specifically include steps 21 to 23:
step 21: and acquiring theoretical identity information corresponding to the audio data to be transcribed.
The theoretical identity information corresponding to the audio data to be transcribed refers to standard identity information of the audio information carried by the audio data to be transcribed.
In addition, the generation process of the theoretical identity information corresponding to the audio data to be transcribed is similar to the generation process of the identity information of the audio data to be processed, so that any implementation mode for generating the identity information of the audio data to be processed can be adopted for implementation. To facilitate understanding of the theoretical identity information generation process, the following description is made with reference to an example.
As an example, when the number of the audio data to be transcribed is M, the recording time corresponding to the mth audio data to be transcribed is earlier than the recording time corresponding to the (m+1)th audio data to be transcribed, m is a positive integer, m is less than or equal to M-1, and M is a positive integer, the theoretical identity information corresponding to the (m+1)th audio data to be transcribed may be generated according to the mth audio data to be transcribed and the theoretical identity information corresponding to the mth audio data to be transcribed. Wherein the theoretical identity information corresponding to the 1st audio data to be transcribed is determined according to the audio basic identity information corresponding to the M audio data to be transcribed.
The method for determining the audio basic identity information corresponding to the M audio data to be transcribed is not limited in the embodiment of the application. For ease of understanding, the following description will take one possible embodiment as an example.
In one possible implementation, the process of determining the audio basic identity information corresponding to the M audio data to be transcribed comprises steps 31 to 32:
step 31: and acquiring at least one candidate identity information corresponding to the M audio data to be transcribed.
The embodiment of the present application does not limit the obtaining manner of step 31, for example, if the audio transcription requests corresponding to the M audio data to be transcribed are triggered by the target user, at least one candidate audio basic identity information corresponding to the user identity of the target user may be searched in the preset mapping relationship, and the candidate identity information corresponding to the M audio data to be transcribed is determined.
The preset mapping relation comprises a corresponding relation between a user identity of a target user and at least one candidate audio basic identity information. In addition, each candidate audio basic identity information can be generated according to the user identity of the target user and the product serial number of each recording device owned by the target user; moreover, a corresponding relationship exists between the user identification of the target user and the product serial number of the sound recording equipment owned by the target user.
For example, if user A owns recording device A, recording device B, and recording device C, the user identity of user A may correspond to the product serial numbers of recording device A, recording device B, and recording device C. In this case, first candidate audio basic identity information may be generated according to the user identity of user A and the product serial number of recording device A, second candidate audio basic identity information according to the user identity of user A and the product serial number of recording device B, and third candidate audio basic identity information according to the user identity of user A and the product serial number of recording device C; correspondences are then established between the user identity of user A and each of the first, second, and third candidate audio basic identity information; and finally, the preset mapping relation is constructed from these three correspondences.
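The construction of the preset mapping relation can be sketched as follows. The SHA-256 derivation is purely illustrative — the patent does not specify how candidate audio basic identity information is computed from the user identity and device serial number — and all names are hypothetical:

```python
import hashlib

# Hypothetical sketch of the preset mapping relation: one candidate audio
# basic identity information per recording device owned by the user.
# The SHA-256 derivation below is an assumption for illustration only.
def candidate_identity(user_id: str, serial: str) -> str:
    """Derive candidate audio basic identity info from user id + serial."""
    return hashlib.sha256((user_id + serial).encode()).hexdigest()[:8]

def build_preset_mapping(user_id: str, serials: list) -> dict:
    """Map the user identity to its candidate audio basic identity infos."""
    return {user_id: [candidate_identity(user_id, s) for s in serials]}
```

With three devices, the mapping holds three distinct candidates for the one user identity, matching the user A example above.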
Based on the related content of step 31, in some cases, since the target user may have a plurality of recording devices, the target user corresponds to a plurality of candidate audio basic identity information, so that after the target user triggers an audio transcription request corresponding to M audio data to be transcribed, the candidate audio basic identity information corresponding to the target user may be determined as a plurality of candidate identity information corresponding to M audio data to be transcribed, so that the audio basic identity information corresponding to M audio data to be transcribed can be subsequently determined from the candidate identity information.
Step 32: and determining the audio basic identity information corresponding to the M audio data to be transcribed according to the actual identity information corresponding to the 1 st audio data to be transcribed and at least one candidate identity information corresponding to the M audio data to be transcribed.
As an example, step 32 may specifically be: and respectively matching the actual identity information corresponding to the 1 st audio data to be transcribed with each candidate identity information corresponding to the M audio data to be transcribed, and determining the candidate identity information which is successfully matched as the audio basic identity information corresponding to the M audio data to be transcribed.
Based on the related contents in the above steps 31 to 32, in some cases, since the target user may have a plurality of sound recording devices, at least one candidate identity information corresponding to the user identity may be determined according to the user identity of the target user; and screening out the audio basic identity information corresponding to the M audio data to be transcribed from the at least one candidate identity information according to the actual identity information corresponding to the 1 st audio data to be transcribed.
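The screening in step 32 can be sketched as a simple match over the candidate list; the function name is hypothetical, and returning None stands in for the match-failure case discussed later (illegal audio data):

```python
# Hypothetical sketch of step 32: match the actual identity information of
# the 1st audio data to be transcribed against each candidate; a successful
# match yields the audio basic identity information, None signals that no
# candidate matched (the data may be illegal).
def find_base_identity(actual_first_identity, candidates):
    for cand in candidates:
        if cand == actual_first_identity:
            return cand
    return None
```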
In addition, the embodiment of the present application also does not limit the determination manner of the theoretical identity information corresponding to the 1 st audio data to be transcribed, for example, the audio basic identity information corresponding to the M audio data to be transcribed may be directly determined as the theoretical identity information corresponding to the 1 st audio data to be transcribed.
In addition, in the embodiment of the present application, a generation process of theoretical identity information corresponding to the m +1 th audio data to be transcribed is similar to the generation process of the identity information of the t +1 th audio data to be processed. For ease of understanding, the following description is made with reference to examples.
As an example, the generation process of the theoretical identity information corresponding to the m +1 th audio data to be transcribed may include steps 41 to 42:
step 41: and generating a second updating rule according to the mth audio data to be transcribed.
The second updating rule is used for describing an adjusting rule according to which theoretical identity information corresponding to the mth audio data to be transcribed needs to be adjusted. It should be noted that the second update rule is similar to the first update rule described above, and please refer to the first update rule described above for related content.
In addition, the generation process of the second update rule is similar to the generation process of the first update rule, so the second update rule can be generated according to the audio information carried by the mth audio data to be transcribed. Based on this, the present application provides a possible implementation manner of step 41, which specifically includes: the audio information corresponding to the mth audio data to be transcribed is extracted from the mth audio data to be transcribed, and then a second updating rule is generated according to the audio information corresponding to the mth audio data to be transcribed.
The audio information corresponding to the mth audio data to be transcribed is used to describe the audio sampling data carried by the mth audio data to be transcribed. As can be seen, the audio information corresponding to the mth audio data to be transcribed may include the information remaining after the identity information in the mth audio data to be transcribed is removed. For example, when the mth audio data to be transcribed is "3128B_A0" and the identity information corresponding to the mth audio data to be transcribed is "3", the audio information corresponding to the mth audio data to be transcribed may be "128B_A0".
In addition, since the audio information corresponding to the mth audio data to be transcribed is similar to the tth audio data to be processed above, and the second update rule is similar to the first update rule above, the implementation of the step "generating the second update rule according to the audio information corresponding to the mth audio data to be transcribed" is similar to the implementation of the step 11 above, and please refer to the step 11 above for related contents.
Based on the related content of step 41, for the mth audio data to be transcribed, the audio information corresponding to the mth audio data to be transcribed may be extracted from the mth audio data to be transcribed, and the second update rule may then be generated according to that audio information, so that the second update rule describes the adjustment rule according to which the theoretical identity information corresponding to the mth audio data to be transcribed needs to be adjusted; the theoretical identity information corresponding to the (m+1)th audio data to be transcribed can thus be determined subsequently according to the second update rule.
Step 42: and updating the theoretical identity information corresponding to the mth audio data to be transcribed according to a second updating rule to obtain the theoretical identity information corresponding to the (m + 1) th audio data to be transcribed.
In fact, since the second update rule is similar to the first update rule above, and the theoretical identity information corresponding to the mth audio data to be transcribed is similar to the identity information of the tth audio data to be processed above, the implementation of step 42 is similar to the implementation of step 12 above, and please refer to step 12 above for related contents. For ease of understanding, the following description is made with reference to examples.
As an example, when the mth audio data to be transcribed includes N_m second audio data, the theoretical identity information corresponding to the mth audio data to be transcribed includes the theoretical identity information corresponding to the N_m second audio data, and the second update rule includes the second ordering target, step 42 may specifically be: sorting and adjusting the theoretical identity information corresponding to the N_m second audio data according to the second ordering target to obtain the theoretical identity information corresponding to the (m+1)th audio data to be transcribed.
It should be noted that the second ordering objective is similar to the first ordering objective, and the related content please refer to the first ordering objective.
Based on the related content of the above steps 41 to 42, the theoretical identity information corresponding to the (m+1)th audio data to be transcribed may be generated according to the audio information corresponding to the mth audio data to be transcribed and the theoretical identity information thereof, so that the theoretical identity information corresponding to the (m+1)th audio data to be transcribed reflects both the audio information carried by the mth audio data to be transcribed and the identity information thereof.
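The chained generation of steps 41 to 42 can be sketched as follows. The rotation-based reordering is an illustrative stand-in for the unspecified "second ordering target"; the patent only requires that each update be a deterministic function of the previous chunk's audio payload:

```python
# Hypothetical sketch of steps 41-42: each theoretical identity is derived
# by reordering the previous identity's characters according to an update
# rule computed from the previous chunk's audio payload. The rotation used
# here is an assumed example of a "second ordering target".
def next_theoretical_identity(identity: str, audio_payload: str) -> str:
    """Rotate identity characters by an offset derived from the payload."""
    offset = sum(audio_payload.encode()) % len(identity)
    return identity[offset:] + identity[:offset]

def theoretical_chain(base_identity: str, payloads: list) -> list:
    """Identity for chunk 1 is the base; chunk m+1 is derived from chunk m."""
    chain = [base_identity]
    for payload in payloads[:-1]:
        chain.append(next_theoretical_identity(chain[-1], payload))
    return chain
```

Because every element of the chain depends on all earlier payloads, tampering with any chunk invalidates every theoretical identity after it, which is what allows the legality check below to cover the whole stored audio data.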
Based on the related content of the step 21, in the embodiment of the present application, for the M audio data to be transcribed, theoretical identity information corresponding to the 1 st audio data to be transcribed may be determined according to the audio basic identity information corresponding to the M audio data to be transcribed; generating theoretical identity information corresponding to the 2 nd audio data to be transcribed according to the 1 st audio data to be transcribed and the theoretical identity information corresponding to the 1 st audio data to be transcribed; generating theoretical identity information corresponding to the 3 rd audio data to be transcribed according to the 2 nd audio data to be transcribed and the theoretical identity information corresponding to the 2 nd audio data to be transcribed; … … (and so on); and generating theoretical identity information corresponding to the Mth audio data to be transcribed according to the M-1 th audio data to be transcribed and the theoretical identity information corresponding to the M-1 th audio data to be transcribed, so that the legality of the M audio data to be transcribed can be analyzed subsequently according to the theoretical identity information corresponding to the M audio data to be transcribed.
Step 22: and matching the actual identity information corresponding to the audio data to be transcribed with the theoretical identity information corresponding to the audio data to be transcribed to obtain an identity matching result corresponding to the audio data to be transcribed.
As an example, when the number of the audio data to be transcribed is M, the step 22 may specifically be: matching actual identity information corresponding to the r-th audio data to be transcribed with theoretical identity information corresponding to the r-th audio data to be transcribed to obtain an identity matching result corresponding to the r-th audio data to be transcribed; wherein r is a positive integer, r is less than or equal to M, and M is a positive integer.
In this embodiment of the application, if the number of the audio data to be transcribed is M, the actual identity information corresponding to the 1 st audio data to be transcribed and the theoretical identity information thereof may be matched to obtain an identity matching result corresponding to the 1 st audio data to be transcribed; matching the actual identity information corresponding to the 2 nd audio data to be transcribed with the theoretical identity information thereof to obtain an identity matching result corresponding to the 2 nd audio data to be transcribed; … … (and so on); and matching the actual identity information corresponding to the Mth audio data to be transcribed and the theoretical identity information thereof to obtain an identity matching result corresponding to the Mth audio data to be transcribed, so that the legality of the M audio data to be transcribed can be judged by using the identity matching result corresponding to the M audio data to be transcribed in the following.
Step 23: and determining whether the audio data to be transcribed is legal audio data or not according to the identity matching result corresponding to the audio data to be transcribed.
In some cases (for example, M audio data to be transcribed are determined based on the stored audio data), the validity of the M audio data to be transcribed should be collectively determined. Based on this, the present application provides a possible implementation manner of step 23, which may specifically be: if the identity matching results corresponding to the M audio data to be transcribed all represent that the matching is successful, determining that the M audio data to be transcribed are legal audio data; and if at least one identity matching result corresponding to the M audio data to be transcribed indicates that the matching fails, determining that the M audio data to be transcribed is illegal audio data.
As can be seen, for the stored audio data including M audio data to be transcribed, when the identity matching result corresponding to the 1st audio data to be transcribed represents a successful match, the identity matching result corresponding to the 2nd audio data to be transcribed represents a successful match, and so on, up to the identity matching result corresponding to the Mth audio data to be transcribed representing a successful match, the M audio data to be transcribed are determined to be legal audio data, so that the stored audio data including the M audio data to be transcribed can be determined to be legal audio data; otherwise, the stored audio data including the M audio data to be transcribed may be determined to be illegal audio data.
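The all-or-nothing decision of step 23 can be sketched in a few lines (function and parameter names are hypothetical):

```python
# Hypothetical sketch of step 23: the stored audio data is legal only if
# every one of the M identity matching results represents success.
def is_legal_audio(actual_identities: list, theoretical_identities: list) -> bool:
    """Return True iff all M actual/theoretical identity pairs match."""
    return (len(actual_identities) == len(theoretical_identities)
            and all(a == t for a, t in zip(actual_identities,
                                           theoretical_identities)))
```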
Based on the related contents of the above steps 21 to 23, in the embodiment of the present application, it can be determined whether the audio data to be transcribed is legal or not according to the theoretical identity information corresponding to the audio data to be transcribed and the matching result of the actual identity information thereof.
It should be noted that, in some cases (for example, the audio basic identity information may represent a corresponding relationship between the recording apparatus and the user), since the illegal audio data is collected by the illegal recording apparatus without the transcription authorization, the corresponding relationship corresponding to the illegal recording apparatus does not exist in the transcription apparatus, so that the transcription apparatus cannot query the audio basic identity information, which is matched with the actual identity information of the 1 st audio data to be transcribed, extracted from the illegal audio data, from the stored audio basic identity information. As can be seen, when it is determined that the actual identity information corresponding to the 1 st audio data to be transcribed and the at least one candidate identity information corresponding to the M audio data to be transcribed fail to match, it may be determined that the stored audio data including the 1 st audio data to be transcribed is illegal audio data.
S904: and when the audio data to be transcribed is determined to be legal audio data, performing transcription processing on the audio data to be transcribed to obtain characters corresponding to the audio data to be transcribed.
The transcription processing refers to transcribing the audio information carried in the audio data to be transcribed into text.
In some cases, if the audio data to be transcribed includes the audio sampling points after the encryption processing, the audio information in the audio data to be transcribed may be decrypted first, and then the decrypted audio information may be transcribed. Based on this, the embodiment of the present application further provides a possible implementation manner of S904, which may specifically include S9041-S9043:
s9041: and extracting the audio data to be decrypted corresponding to the audio data to be transcribed from the audio data to be transcribed.
The audio data to be decrypted corresponding to the audio data to be transcribed refers to the audio information carried in the audio data to be transcribed. In addition, the extraction process of the audio data to be decrypted corresponding to the audio data to be transcribed is similar to the extraction process of the "audio information corresponding to the mth audio data to be transcribed" in step 41, and the relevant content please refer to the above.
S9042: and decrypting the audio data to be decrypted corresponding to the audio data to be transcribed to obtain decrypted audio data corresponding to the audio data to be transcribed.
And the decrypted audio data corresponding to the audio data to be transcribed comprises at least one audio sampling point.
In some cases, if the audio data to be decrypted corresponding to the audio data to be transcribed is obtained by encrypting using a preset encryption algorithm, S9042 may specifically be: and decrypting the audio data to be decrypted corresponding to the audio data to be transcribed by utilizing the decryption algorithm corresponding to the preset encryption algorithm to obtain the decrypted audio data corresponding to the audio data to be transcribed.
S9043: and decrypting the audio data corresponding to the audio data to be transcribed to obtain characters corresponding to the audio data to be transcribed.
The embodiment of the present application does not limit the transcription manner; it may be implemented by any method, currently available or developed in the future, that can transcribe audio into text.
Based on the relevant contents of S9041 to S9043, if the audio data to be transcribed includes encrypted audio information, the encrypted audio information may be extracted from the audio data to be transcribed, the encrypted audio information is decrypted to obtain decrypted audio information, and finally, the decrypted audio information is transcribed to obtain a text corresponding to the audio data to be transcribed.
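The extract–decrypt–transcribe pipeline of S9041 to S9043 can be sketched end to end. The XOR cipher stands in for the unspecified "preset encryption algorithm", and transcribe() is a placeholder for a real speech-to-text engine; both are assumptions for illustration:

```python
# Hypothetical sketch of S9041-S9043.
def extract_payload(chunk: str) -> str:
    """S9041: drop the leading identity character, keeping the payload."""
    return chunk[1:]

def xor_decrypt(payload: bytes, key: int = 0x5A) -> bytes:
    """S9042: symmetric XOR decryption matching an assumed XOR encryption."""
    return bytes(b ^ key for b in payload)

def transcribe(decrypted_audio: bytes) -> str:
    """S9043 placeholder: a real system would run ASR on the samples."""
    return decrypted_audio.decode("utf-8", errors="replace")
```

Because XOR is its own inverse, applying xor_decrypt twice restores the input, which is what makes the stand-in cipher usable on both the generation and transcription sides of the sketch.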
Based on the related contents of the above S901 to S904, in this embodiment of the application, if a target user wants to transcribe a piece of stored audio data, the target user may trigger an audio transcription request for requesting transcription of the stored audio data, so that the transcription device can transcribe the audio information carried by the stored audio data based on the audio transcription request by using the above S901 to S904, obtaining the text corresponding to the stored audio data. Because the stored audio data carries the identity information of the audio information, the transcription device can determine whether the stored audio data is legal audio data based on that identity information, so that legality screening of the audio data can be realized. The transcription device then only needs to transcribe legal audio data, without transcribing illegal audio data; this saves the time the transcription device would otherwise spend on illegal audio data and allows legal audio data to be transcribed in time, improving the real-time performance of the transcription device, especially for legal audio data.
In order to facilitate understanding of the above audio data generation method and audio data transcription method, the following description is made in conjunction with a scene embodiment.
Scene embodiment
In some cases, the recording device may record and store audio, but the recording device cannot perform other complicated operations (e.g., audio transfer operation), and at this time, the other complicated operations may be performed by means of a user terminal device (e.g., a terminal device such as a mobile phone or a computer). For ease of understanding, the following description is made in conjunction with the application scenario illustrated in fig. 11. Fig. 11 is a schematic view of an application scenario provided in the embodiment of the present application.
In the application scenario shown in fig. 11, the recording pen 1101 can record and store audio; the recording pen 1101 can be mounted to the user terminal device 1102 through a preset connection mode (e.g., a USB interface connection mode), so that the user terminal device 1102 can directly read audio data stored in the recording pen 1101; the recording pen 1101 can also communicate with the user terminal device 1102 through a first communication method (e.g., a wireless communication method) so that the recording pen 1101 can acquire information sent by the transcription server 1103 via the user terminal device 1102. The transcription server 1103 can communicate with the user terminal device 1102 through a second communication method.
The embodiment of the present application does not limit the type of the user terminal device 1102; for example, the user terminal device 1102 may be a smart phone, a computer, a Personal Digital Assistant (PDA), a tablet computer, or the like.
In addition, in order to enable the transcription server 1103 to distinguish between legitimate audio data and illegitimate audio data, the transcription server 1103 may send a transcription authorization code to the recording pen 1101. The transcription authorization code is determined according to the correspondence between the recording pen 1101 and the user thereof, and the generation process of the transcription authorization code may specifically include steps 51 to 53:
step 51: when the user terminal device 1102 and the recording pen 1101 are successfully connected through the first communication mode and/or the preset connection mode, the user may trigger an audio transcription authorization request on the user terminal device 1102, so that the user terminal device 1102 sends the audio transcription authorization request to the transcription server 1103. The audio transcription authorization request carries the user identity of the user and the product serial number of the recording pen 1101.
Step 52: the transcription server 1103 generates audio basic identity information corresponding to the user identity according to the user identity of the user and the product serial number of the recording pen 1101 carried by the audio transcription authorization request, and records the correspondence between the user identity and the audio basic identity information corresponding to the user identity.
Step 53: the transcription server 1103 feeds back the audio basic identity information corresponding to the user identity to the user terminal device 1102, and the user terminal device 1102 forwards the audio basic identity information to the recording pen 1101 for storage, so that the recording pen 1101 can subsequently generate audio data carrying identity information based on the audio basic identity information.
Based on the related contents of steps 51 to 53 above, because the audio basic identity information is generated according to the user identity of the user and the product serial number of the recording pen 1101, the correspondence between the two is implied in the audio basic identity information. Therefore, after the recording pen 1101 receives the audio basic identity information, the recording pen 1101 can determine the correspondence, authorized by the transcription server 1103, between the user identity of the user and the product serial number of the recording pen 1101, and the transcription server 1103 can later recognize audio data carrying identity information generated from this audio basic identity information as legal audio data.
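As an illustration of steps 51 to 53, the generation of the audio basic identity information can be sketched as follows. This is a minimal, hypothetical construction: the embodiment does not prescribe a particular algorithm, and the SHA-256 hash, the field separator, and all names used here are assumptions.

```python
import hashlib

def make_basic_identity(user_id: str, product_serial: str) -> str:
    """Derive audio basic identity information that binds a user identity
    to a recording pen's product serial number (hypothetical scheme)."""
    payload = f"{user_id}:{product_serial}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

# The transcription server records the mapping so that it can later look up
# candidate identity information by user identity (cf. step 63).
mapping = {"user-001": make_basic_identity("user-001", "PEN-SN-12345")}
```

Because the hash input contains both the user identity and the product serial number, the resulting value implicitly binds the two, which is what allows the transcription server 1103 to later treat audio data generated from it as legal.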
In addition, the recording pen 1101 may generate and store audio data based on the audio basic identity information; as shown in fig. 12, the process may specifically include S1201 to S1209:
s1201: after the recording pen 1101 receives a recording request, the recording pen 1101 enters a recording state.
The recording request is used to request the recording pen 1101 to record sound, and the recording state is a state in which the recording pen 1101 collects sound.
S1202: the recording pen 1101 records audio sampling points according to a preset recording frequency, and determines the recorded audio sampling points as unprocessed sample data. The preset recording frequency can be set in advance according to the application scenario.
S1203: judging whether the quantity of unprocessed sampling data reaches a first numerical value or not; if yes, go to S1204; if not, the process returns to step S1203.
The first value may be predetermined, for example, the first value may be 128.
S1204: and encrypting the unprocessed sample data of the first numerical value to obtain encrypted sample data of the first numerical value, deleting the unprocessed sample data of the first numerical value, and determining a set of the encrypted sample data of the first numerical value as first audio data.
It should be noted that the encryption process in S1204 is similar to the encryption process in S3012, and please refer to S3012 above for related contents.
S1205: judging whether the quantity of the first audio data reaches a second value or not; if yes, go to S1206; if not, the process returns to step S1203.
The second value may be predetermined, for example, the second value may be N (e.g., 10).
S1206: and determining the set of the first audio data with the second numerical value as the current audio data to be processed, and deleting the first audio data with the second numerical value.
S1207: and acquiring the identity information of the current audio data to be processed.
It is noted that S1207 may be implemented by using the above specific implementation of S302. For example, if the current audio data to be processed is the 1st audio data to be processed, the audio basic identity information stored in the recording pen 1101 may be determined as the identity information of the current audio data to be processed; and if the current audio data to be processed is the (t+1)-th audio data to be processed, the identity information of the current audio data to be processed may be generated according to the t-th audio data to be processed and the identity information of the t-th audio data to be processed. Wherein t is a positive integer, t is less than or equal to T-1, and T is the total number of the audio data to be processed generated in the audio recording process.
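The chained derivation described in the note above (the identity of the (t+1)-th unit is computed from the t-th unit and the t-th unit's identity) can be sketched as below. The use of HMAC-SHA256 as the derivation function is purely an assumption made for illustration; the embodiment only requires that each identity be generated from the previous unit and its identity.

```python
import hashlib
import hmac

def next_identity(prev_identity: bytes, audio_unit: bytes) -> bytes:
    """Derive the identity information of the (t+1)-th audio data to be
    processed from the t-th unit and the t-th unit's identity information."""
    return hmac.new(prev_identity, audio_unit, hashlib.sha256).digest()

# The 1st identity comes from the audio basic identity information
# (a hypothetical value below); later identities chain off their predecessor.
basic_identity = hashlib.sha256(b"audio-basic-identity").digest()
id_2 = next_identity(basic_identity, b"bytes-of-1st-unit")
id_3 = next_identity(id_2, b"bytes-of-2nd-unit")
```

Chaining each identity to the previous unit means that any tampering with a stored unit desynchronizes every later identity, which is what the transcription side exploits when it regenerates the theoretical chain.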
S1208: and generating current target audio data according to the current audio data to be processed and the identity information of the current audio data to be processed, so that the current target audio data carries the current audio data to be processed and the identity information of the current audio data to be processed, and storing the current target audio data.
It is noted that S1208 can be implemented by the above specific implementation of S303. For example, the identity information of the current audio data to be processed is added to the preset position of each first audio data in the current audio data to be processed in a hash manner to obtain current target audio data, and the current target audio data is stored.
S1209: judging whether a stop condition is reached, if so, ending the generation process of the audio data; if not, the process returns to step S1203.
The stop condition may be preset, for example, the stop condition may be that there is no unprocessed sample data.
Based on the related contents of S1201 to S1209 above, when the user wants to record audio with the recording pen 1101, the user may trigger a recording request. After receiving the recording request, the recording pen 1101 records audio sampling points and, during recording, processes and stores them, so that every piece of stored audio data in the recording pen 1101 carries identity information, based on which the transcription server 1103 can subsequently determine whether the stored audio data is legal audio data.
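A compact sketch of the batching portion of S1201 to S1209 follows: raw samples are grouped into first-value blocks, each block is encrypted, and every second-value blocks form one audio data to be processed. The XOR cipher is a stand-in placeholder (the actual encryption of S1204/S3012 is not specified here), and the constants merely echo the example values 128 and N = 10.

```python
FIRST_VALUE = 128   # unprocessed samples per encrypted block (example of S1203)
SECOND_VALUE = 10   # first-audio-data blocks per unit (the example N of S1205)

def encrypt_block(samples: bytes, key: int = 0x5A) -> bytes:
    # Placeholder cipher: S1204 only states that the block is encrypted;
    # a byte-wise XOR stands in for the real, unspecified scheme.
    return bytes(b ^ key for b in samples)

def record_to_units(stream: bytes) -> list:
    """Group raw samples into encrypted first audio data, then gather every
    SECOND_VALUE blocks into one audio data to be processed (S1203-S1206)."""
    blocks, units = [], []
    usable = len(stream) - len(stream) % FIRST_VALUE
    for i in range(0, usable, FIRST_VALUE):
        blocks.append(encrypt_block(stream[i:i + FIRST_VALUE]))
        if len(blocks) == SECOND_VALUE:
            units.append(blocks)
            blocks = []
    return units
```

In the real device this grouping happens incrementally while recording, so each unit can be stamped with identity information and stored as soon as it is complete, rather than after recording ends.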
In addition, when a user wants to transcribe the stored audio data in the recording pen 1101, the user may mount the recording pen 1101 on the user terminal device 1102, trigger an audio transcription request in the user terminal device 1102, and send the audio transcription request and the stored audio data selected by the user to the transcription server 1103 by the user terminal device 1102, so that the transcription server 1103 may perform audio transcription by using any one of the embodiments of the audio data transcription method provided in this embodiment of the present application. The audio transcription request carries the user identification of the user.
As an example, the audio transcription performed by the transcription server 1103 may specifically include steps 61 to 68:
step 61: the transcription server 1101 divides the received stored audio data according to a second division rule to obtain at least one audio data to be transcribed.
Step 61 may be performed using any of the embodiments of S9012 described above.
Step 62: and extracting the actual identity information corresponding to each audio data to be transcribed from each audio data to be transcribed.
It should be noted that the extraction process of the actual identity information corresponding to each audio data to be transcribed is similar to the extraction process of S902 above, and the relevant content refers to S902 above.
Step 63: the transcription server 1103 searches the stored preset mapping relationship for at least one piece of candidate audio basic identity information corresponding to the user identity carried by the audio transcription request, and determines the found audio basic identity information as at least one piece of candidate identity information.
It should be noted that the process of determining "at least one candidate identity information" is similar to the process of determining "at least one candidate identity information corresponding to M audio data to be transcribed", and please refer to the above for related contents.
Step 64: judging whether candidate identity information which is successfully matched with the actual identity information corresponding to the 1 st audio data to be transcribed exists in the at least one candidate identity information, if so, executing a step 65; if not, determining that the stored audio data is illegal audio data, and adopting processing operation corresponding to the illegal audio data.
For example, the processing operation corresponding to the illegal audio data may include an operation related to audio transcription payment, and may also include an operation of ending the transcription flow of the stored audio data.
Step 65: and acquiring theoretical identity information corresponding to each audio data to be transcribed.
Step 65 may be implemented by the implementation of step 21 above. For example, the candidate identity information successfully matched with the actual identity information corresponding to the 1 st audio data to be transcribed is determined as the theoretical identity information corresponding to the 1 st audio data to be transcribed; and generating theoretical identity information corresponding to the (m + 1) th audio data to be transcribed according to the mth audio data to be transcribed and the theoretical identity information corresponding to the mth audio data to be transcribed. Wherein M is a positive integer, M is less than or equal to M-1, M is a positive integer, and M is the total number of the audio data to be transcribed extracted from the stored audio data.
Step 66: matching the theoretical identity information corresponding to each audio data to be transcribed with the actual identity information corresponding to that audio data to be transcribed, to obtain an identity matching result corresponding to each audio data to be transcribed.
It should be noted that step 66 may be implemented using the embodiment of step 22 above.
Step 67: if the identity matching results corresponding to all the audio data to be transcribed are successful, determining that all the audio data to be transcribed are legal audio data (that is, the stored audio data are legal audio data), and performing transcription processing on all the audio data to be transcribed to obtain characters corresponding to all the audio data to be transcribed (that is, characters corresponding to the stored audio data).
Note that the "transcription processing" in step 67 may be implemented by the above embodiment of S904.
Step 68: and if at least one matching failure exists in the identity matching results corresponding to all the audio data to be transcribed, determining that the stored audio data is illegal audio data, and adopting the processing operation corresponding to the illegal audio data.
Based on the related contents from the foregoing step 61 to step 68, after receiving the audio transcription request and the stored audio data, the transcription server 1103 may determine at least one candidate audio basic identity information corresponding to the user according to the user identity carried in the audio transcription request; and determining whether the stored audio data is legal audio data according to at least one candidate audio basic identity information corresponding to the user and the audio information and the identity information thereof carried by the stored audio data, so that when the stored audio data is determined to be legal audio data, the stored audio data is subjected to transcription processing to obtain characters corresponding to the stored audio data.
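The verification flow of steps 61 to 68 can be condensed into the following sketch. The anchor-then-chain structure mirrors steps 64 to 68; the HMAC-based derivation function is an illustrative assumption standing in for whatever update rule the recording pen actually used, and must match it for verification to succeed.

```python
import hashlib
import hmac

def next_identity(prev: bytes, unit: bytes) -> bytes:
    # Illustrative derivation rule; it must match the rule the pen used.
    return hmac.new(prev, unit, hashlib.sha256).digest()

def verify_stored_audio(units, actual_ids, candidate_ids) -> bool:
    """Steps 64-68 in miniature: anchor the 1st actual identity against the
    candidate identity information, regenerate the theoretical chain for the
    remaining units, and require every comparison to succeed."""
    if actual_ids[0] not in candidate_ids:
        return False                      # step 64 fails: illegal audio data
    theoretical = actual_ids[0]
    for m in range(1, len(units)):
        theoretical = next_identity(theoretical, units[m - 1])
        if not hmac.compare_digest(theoretical, actual_ids[m]):
            return False                  # step 68: chain broken, illegal
    return True                           # step 67: legal, may be transcribed
```

Only when every link of the chain matches does the server proceed to transcription, which is why a single tampered unit marks the entire stored audio data as illegal.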
As can be seen from the related contents of fig. 11 above, if the recording pen 1101 cannot directly communicate with the transcription server 1103, the recording pen 1101 can communicate with the transcription server 1103 via the user terminal device 1102.
However, in some cases, if the recording pen can directly communicate with the transcription server, the user terminal device is not needed: the recording pen may directly send the audio transcription request to the transcription server, and may also directly send the user identity of the user and the product serial number of the recording pen to the transcription server, so as to directly receive from the transcription server the audio basic identity information corresponding to the user identity of the user.
Based on the audio data generation method provided by the above method embodiment, an embodiment of the present application further provides an audio data generation apparatus, which is explained and explained below with reference to the accompanying drawings.
Apparatus embodiment one
The device embodiment describes an audio data generating device, and please refer to the above method embodiment for related contents.
Referring to fig. 13, this figure is a schematic structural diagram of an audio data generation apparatus according to an embodiment of the present application.
The audio data generation apparatus 1300 provided in the embodiment of the present application includes:
a first obtaining unit 1301, configured to obtain audio data to be processed and identity information of the audio data to be processed;
a data generating unit 1302, configured to generate target audio data according to the audio data to be processed and the identity information of the audio data to be processed, so that the target audio data carries the audio data to be processed and the identity information of the audio data to be processed.
In a possible implementation manner, in order to improve the real-time performance of the transcription device, the data generating unit 1302 includes:
a first generating subunit, configured to generate, when the to-be-processed audio data includes N pieces of first audio data and the identity information of the to-be-processed audio data includes identity information of the N pieces of first audio data, ith second audio data according to the ith first audio data and the identity information of the ith first audio data, so that the ith second audio data carries the ith first audio data and the identity information of the ith first audio data; wherein i is a positive integer, i is not more than N, and N is a positive integer;
and the second generating subunit is used for obtaining the target audio data according to the 1 st second audio data to the Nth second audio data.
In a possible implementation manner, in order to improve the transcription real-time performance of the transcription device, the first generating subunit is specifically configured to:
and adding the identity information of the ith first audio data to a preset position of the ith first audio data to obtain the ith second audio data.
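As a sketch of this implementation, one concrete choice of "preset position" is a fixed-length prefix of each first audio data; the 32-byte length and the prefix placement are assumptions made for illustration, not requirements of the embodiment.

```python
ID_LEN = 32  # assumed fixed identity length; "preset position" = block head

def embed_identity(first_audio: bytes, identity: bytes) -> bytes:
    """Form the i-th second audio data by adding the identity information
    at the preset position (here, a prefix) of the i-th first audio data."""
    assert len(identity) == ID_LEN
    return identity + first_audio

def extract_identity(second_audio: bytes):
    """Inverse step used on the transcription side (cf. the information
    extraction unit 1402): split the identity from the audio payload."""
    return second_audio[:ID_LEN], second_audio[ID_LEN:]
```

A fixed-length prefix keeps extraction trivial on the transcription side, since each block can be split without any framing metadata.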
In a possible implementation manner, in order to improve the real-time performance of the transcription device, the first obtaining unit 1301 includes:
the first obtaining subunit is configured to, if the number of the to-be-processed audio data is T, generate identity information of the T +1 th to-be-processed audio data according to the T-th to-be-processed audio data and the identity information of the T-th to-be-processed audio data; the recording time corresponding to the tth audio data to be processed is earlier than the recording time corresponding to the t +1 th audio data to be processed; t is a positive integer, T is less than or equal to T-1, and T is a positive integer; the identity information of the 1 st audio data to be processed is determined according to the audio basic identity information corresponding to the T audio data to be processed.
In a possible implementation manner, in order to improve the transcription real-time performance of the transcription device, the obtaining process of the audio basic identity information corresponding to the T pieces of audio data to be processed is as follows:
and generating audio basic identity information corresponding to the T audio data to be processed according to the user identity identifications corresponding to the T audio data to be processed and the product serial numbers corresponding to the T audio data to be processed.
In a possible implementation manner, to improve the real-time performance of the transcription device, the first obtaining subunit includes:
the third generating subunit is configured to generate a first update rule according to the tth audio data to be processed;
and the fourth generating subunit is configured to update the identity information of the tth audio data to be processed according to the first update rule, so as to obtain the identity information of the t +1 th audio data to be processed.
In a possible implementation manner, in order to improve the real-time performance of the transcription device, the fourth generating subunit is specifically configured to:
when the t-th audio data to be processed comprises Nt pieces of first audio data, the identity information of the t-th audio data to be processed comprises the identity information of the Nt pieces of first audio data, and the first update rule comprises a first sorting target, sort and adjust the identity information of the Nt pieces of first audio data according to the first sorting target to obtain the identity information of the (t+1)-th audio data to be processed; wherein Nt is a positive integer.
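The sorting-target update rule can be illustrated as follows. Deriving the sorting target as a rotation seeded by a byte checksum of the t-th audio data to be processed is an arbitrary illustrative choice; the embodiment only requires that the first update rule be generated from that audio data and that the Nt identity items be sort-adjusted accordingly.

```python
def derive_sort_target(num_items: int, audio_unit: bytes) -> list:
    """Generate a first update rule (a first sorting target) from the t-th
    audio data to be processed; a rotation seeded by a byte checksum is an
    arbitrary illustrative permutation."""
    seed = sum(audio_unit) % num_items
    return [(i + seed) % num_items for i in range(num_items)]

def apply_sort_target(chunk_ids: list, order: list) -> list:
    """Sort-adjust the Nt identity items according to the sorting target to
    obtain the identity information of the (t+1)-th audio data to be processed."""
    return [chunk_ids[j] for j in order]
```

Because the permutation depends on the audio bytes, the transcription server can regenerate the same sorting target from the received unit and compare the reordered identities against what the next unit actually carries.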
In a possible implementation manner, in order to improve the real-time performance of the transcription device, the first obtaining unit 1301 includes:
the second acquisition subunit is used for acquiring original audio data; and encrypting the original audio data to obtain the audio data to be processed.
Based on the audio data transcription method provided by the above method embodiment, an embodiment of the present application further provides an audio data transcription apparatus, which is explained and described below with reference to the accompanying drawings.
Apparatus embodiment two
The device embodiment describes an audio data transcription device, and please refer to the above method embodiment for related contents.
Referring to fig. 14, a schematic structural diagram of an audio data transcription apparatus according to an embodiment of the present application is shown.
The audio data transcription apparatus 1400 provided in the embodiment of the present application includes:
a second acquisition unit 1401 for acquiring audio data to be transcribed; the audio data to be transcribed is target audio data generated by any implementation mode of the audio data generation method provided by the embodiment of the application;
an information extraction unit 1402, configured to extract actual identity information corresponding to the audio data to be transcribed from the audio data to be transcribed;
a validity determining unit 1403, configured to determine whether the audio data to be transcribed is valid audio data according to the actual identity information corresponding to the audio data to be transcribed;
and an audio transcription unit 1404, configured to, when it is determined that the audio data to be transcribed is legal audio data, perform transcription processing on the audio data to be transcribed to obtain the text corresponding to the audio data to be transcribed.
In a possible implementation manner, in order to improve the transcription real-time performance of the transcription device, the information extraction unit 1402 includes:
the information extraction subunit is configured to, if the audio data to be transcribed includes N second audio data, extract actual identity information corresponding to the kth second audio data from the kth second audio data; wherein k is a positive integer, k is less than or equal to N, and N is a positive integer;
and the fifth generating subunit is configured to generate actual identity information corresponding to the audio data to be transcribed according to the actual identity information corresponding to the 1 st second audio data to the actual identity information corresponding to the nth second audio data.
In a possible implementation manner, in order to improve the transcription real-time performance of the transcription device, the audio data transcription apparatus 1400 further includes:
a third obtaining unit, configured to obtain theoretical identity information corresponding to the audio data to be transcribed;
the validity determining unit 1403 includes:
a third obtaining subunit, configured to match actual identity information corresponding to the audio data to be transcribed with theoretical identity information corresponding to the audio data to be transcribed, so as to obtain an identity matching result corresponding to the audio data to be transcribed;
and the first determining subunit is used for determining whether the audio data to be transcribed is legal audio data or not according to the identity matching result corresponding to the audio data to be transcribed.
In a possible implementation manner, in order to improve the transcription real-time performance of the transcription device, the third obtaining unit is specifically configured to:
if the number of the audio data to be transcribed is M, generating theoretical identity information corresponding to the M +1 th audio data to be transcribed according to the mth audio data to be transcribed and the theoretical identity information corresponding to the mth audio data to be transcribed; wherein M is a positive integer, M is not more than M-1, and M is a positive integer; the recording time corresponding to the mth audio data to be transcribed is earlier than the recording time corresponding to the (m + 1) th audio data to be transcribed; the theoretical identity information corresponding to the 1 st audio data to be transcribed is determined according to the audio basic identity information corresponding to the M audio data to be transcribed.
In a possible implementation manner, in order to improve the transcription real-time performance of the transcription device, the third obtaining unit includes:
a sixth generating subunit, configured to generate a second update rule according to the mth audio data to be transcribed;
and the seventh generating subunit is configured to update the theoretical identity information corresponding to the mth audio data to be transcribed according to the second update rule, so as to obtain theoretical identity information corresponding to the m +1 th audio data to be transcribed.
In a possible implementation manner, in order to improve the real-time performance of the transcription device, the seventh generating subunit is specifically configured to:
when the m-th audio data to be transcribed comprises Nm pieces of second audio data, the theoretical identity information corresponding to the m-th audio data to be transcribed comprises the theoretical identity information corresponding to the Nm pieces of second audio data, and the second update rule comprises a second sorting target, sort and adjust the theoretical identity information corresponding to the Nm pieces of second audio data according to the second sorting target to obtain the theoretical identity information corresponding to the (m+1)-th audio data to be transcribed.
In a possible implementation manner, in order to improve the transcription real-time performance of the transcription device, the third obtaining subunit is specifically configured to:
if the number of the audio data to be transcribed is M, matching actual identity information corresponding to the r audio data to be transcribed with theoretical identity information corresponding to the r audio data to be transcribed to obtain an identity matching result corresponding to the r audio data to be transcribed; wherein r is a positive integer, r is less than or equal to M, and M is a positive integer;
the first determining subunit is specifically configured to:
if the identity matching results corresponding to the M audio data to be transcribed all represent that the matching is successful, determining that the M audio data to be transcribed are legal audio data;
and if at least one identity matching result corresponding to the M audio data to be transcribed indicates that the matching fails, determining that the M audio data to be transcribed is illegal audio data.
In a possible implementation manner, in order to improve the transcription real-time performance of the transcription device, the audio transcription unit 1404 is specifically configured to:
extracting, from the audio data to be transcribed, the audio data to be decrypted corresponding to the audio data to be transcribed; decrypting the audio data to be decrypted to obtain decrypted audio data corresponding to the audio data to be transcribed; and performing transcription processing on the decrypted audio data to obtain the text corresponding to the audio data to be transcribed.
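A sketch of this extract-decrypt-transcribe flow follows, assuming purely for illustration that each second audio data carries a 32-byte identity prefix and that the recording side used a symmetric XOR placeholder cipher (the real cipher and data layout are not specified here); the actual speech recognition step is represented by returning the decrypted bytes.

```python
ID_LEN = 32  # assumed length of the embedded identity prefix

def xor_cipher(data: bytes, key: int = 0x5A) -> bytes:
    # Symmetric placeholder: the same XOR both encrypts and decrypts.
    return bytes(b ^ key for b in data)

def prepare_for_transcription(second_audio_blocks) -> bytes:
    """The audio transcription unit in miniature: strip the identity carried
    by each block, decrypt the remaining payload, and return the decrypted
    audio that would be handed to the speech recognizer."""
    decrypted = b""
    for block in second_audio_blocks:
        payload = block[ID_LEN:]          # drop the embedded identity
        decrypted += xor_cipher(payload)  # undo the placeholder encryption
    return decrypted
```

Stripping the identity before decryption matters because the identity bytes are metadata added after encryption; feeding them to the recognizer would corrupt the audio stream.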
Further, an embodiment of the present application also provides an audio data generating apparatus, including: a processor, a memory, a system bus;
the processor and the memory are connected through the system bus;
the memory is used for storing one or more programs, and the one or more programs comprise instructions which, when executed by the processor, cause the processor to execute any one of the implementation methods of the audio data generation method.
Further, an embodiment of the present application further provides an audio data transcription device, including: a processor, a memory, a system bus;
the processor and the memory are connected through the system bus;
the memory is used for storing one or more programs, and the one or more programs comprise instructions which, when executed by the processor, cause the processor to execute any one of the implementation methods of the audio data transcription method.
Further, an embodiment of the present application further provides a computer-readable storage medium, where instructions are stored in the computer-readable storage medium, and when the instructions are run on a terminal device, the terminal device is caused to execute any implementation method of the above audio data generation method or execute any implementation method of the above audio data transcription method.
Further, an embodiment of the present application further provides a computer program product, which when running on a terminal device, enables the terminal device to execute any implementation method of the above audio data generation method or execute any implementation method of the above audio data transcription method.
As can be seen from the above description of the embodiments, those skilled in the art can clearly understand that all or part of the steps in the above embodiment methods can be implemented by software plus a necessary general hardware platform. Based on such understanding, the technical solution of the present application may be essentially or partially implemented in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network communication device such as a media gateway, etc.) to execute the method according to the embodiments or some parts of the embodiments of the present application.
It should be noted that, in the present specification, the embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
It is further noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (15)

1. A method of audio data generation, the method comprising:
acquiring audio data to be processed and identity information of the audio data to be processed;
and generating target audio data according to the audio data to be processed and the identity information of the audio data to be processed, so that the target audio data carries the audio data to be processed and the identity information of the audio data to be processed.
2. The method according to claim 1, wherein when the audio data to be processed includes N pieces of first audio data and the identity information of the audio data to be processed includes the identity information of the N pieces of first audio data, the generating the target audio data according to the audio data to be processed and the identity information of the audio data to be processed includes:
generating ith second audio data according to the ith first audio data and the identity information of the ith first audio data, so that the ith second audio data carries the ith first audio data and the identity information of the ith first audio data; wherein i is a positive integer, i is not more than N, and N is a positive integer;
and obtaining target audio data according to the 1 st second audio data to the Nth second audio data.
3. The method of claim 2, wherein the generating the ith second audio data according to the ith first audio data and the identity information of the ith first audio data comprises:
and adding the identity information of the ith first audio data to a preset position of the ith first audio data to obtain the ith second audio data.
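As a concrete illustration of claims 1 to 3, the embedding step can be sketched as follows. The fixed-length header layout, the function names, and the choice of prepending the identity (one possible "preset position") are assumptions for illustration only, not the patent's concrete encoding.

```python
def embed_identity(first_audio: bytes, identity: bytes, header_len: int = 32) -> bytes:
    """Produce "second audio data" carrying both the identity information
    and the original segment; claim 3's "preset position" is assumed here
    to be a fixed-length prefix, null-padded."""
    if len(identity) > header_len:
        raise ValueError("identity does not fit in the header")
    return identity.ljust(header_len, b"\x00") + first_audio


def extract_identity(second_audio: bytes, header_len: int = 32) -> tuple:
    """Recover (identity, first_audio) from "second audio data".
    Note: identities ending in null bytes would be truncated by rstrip;
    a real format would store an explicit length field."""
    return (second_audio[:header_len].rstrip(b"\x00"),
            second_audio[header_len:])
```

The target audio data of claim 2 would then be the concatenation of the N embedded segments, so each piece remains individually verifiable.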
4. The method of claim 1, wherein if the number of audio data to be processed is T, the method further comprises:
generating audio basic identity information corresponding to the T audio data to be processed according to user identity identifications corresponding to the T audio data to be processed and product serial numbers corresponding to the T audio data to be processed;
the acquiring identity information of the audio data to be processed includes:
determining identity information of the 1 st audio data to be processed according to the audio basic identity information corresponding to the T audio data to be processed;
generating identity information of the (t+1)th audio data to be processed according to the tth audio data to be processed and the identity information of the tth audio data to be processed; wherein the recording time corresponding to the tth audio data to be processed is earlier than the recording time corresponding to the (t+1)th audio data to be processed; t is a positive integer, t is less than or equal to T-1, and T is a positive integer.
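Claim 4 describes a chained derivation: each segment's identity depends on the previous segment's content and identity, so segments cannot be reordered or replaced undetected. A minimal sketch, assuming a SHA-256 hash chain (the patent does not specify the derivation function):

```python
import hashlib

def basic_identity(user_id: str, serial_number: str) -> bytes:
    # "audio basic identity information" derived from the user identity
    # identification and the product serial number (claim 4); the join
    # format is an illustrative assumption
    return hashlib.sha256(f"{user_id}:{serial_number}".encode()).digest()


def next_identity(audio_t: bytes, identity_t: bytes) -> bytes:
    # the (t+1)th identity depends on both the tth segment's content and
    # the tth identity, forming a tamper-evident chain back to the
    # basic identity information
    return hashlib.sha256(audio_t + identity_t).digest()
```

Because the chain is anchored in the user ID and product serial number, the verifying side of claims 8 to 10 can recompute the same sequence independently.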
5. The method of claim 1, wherein the obtaining the audio data to be processed comprises:
acquiring original audio data;
and encrypting the original audio data to obtain the audio data to be processed.
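Claim 5 leaves the cipher unspecified; any symmetric scheme works, since the transcription side (claim 13) must be able to invert it. A toy keystream cipher for illustration only (a real deployment would use an authenticated cipher such as AES-GCM):

```python
import hashlib

def keystream_cipher(data: bytes, key: bytes) -> bytes:
    """Toy symmetric cipher: XOR the data with a SHA-256-derived
    keystream. Encryption and decryption are the same operation."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(b ^ k for b, k in zip(data, stream))
```

Applying the function once yields the audio data to be processed; applying it again with the same key restores the original audio data.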
6. A method for audio data transcription, the method comprising:
acquiring audio data to be transcribed; wherein the audio data to be transcribed is target audio data generated by the audio data generation method of any one of claims 1 to 5;
extracting actual identity information corresponding to the audio data to be transcribed from the audio data to be transcribed;
determining whether the audio data to be transcribed is legal audio data or not according to the actual identity information corresponding to the audio data to be transcribed;
and when the audio data to be transcribed is determined to be legal audio data, performing transcription processing on the audio data to be transcribed to obtain characters corresponding to the audio data to be transcribed.
7. The method according to claim 6, wherein if the audio data to be transcribed includes N second audio data, the extracting actual identity information corresponding to the audio data to be transcribed from the audio data to be transcribed comprises:
extracting actual identity information corresponding to the kth second audio data from the kth second audio data; wherein k is a positive integer, k is less than or equal to N, and N is a positive integer;
and generating actual identity information corresponding to the audio data to be transcribed according to the actual identity information corresponding to the 1 st second audio data to the actual identity information corresponding to the Nth second audio data.
8. The method of claim 6, further comprising:
acquiring theoretical identity information corresponding to the audio data to be transcribed;
the determining whether the audio data to be transcribed is legal audio data according to the actual identity information corresponding to the audio data to be transcribed comprises:
matching the actual identity information corresponding to the audio data to be transcribed with the theoretical identity information corresponding to the audio data to be transcribed to obtain an identity matching result corresponding to the audio data to be transcribed;
and determining whether the audio data to be transcribed is legal audio data or not according to the identity matching result corresponding to the audio data to be transcribed.
9. The method according to claim 8, wherein if the number of the audio data to be transcribed is M, the obtaining theoretical identity information corresponding to the audio data to be transcribed comprises:
generating theoretical identity information corresponding to the (m+1)th audio data to be transcribed according to the mth audio data to be transcribed and the theoretical identity information corresponding to the mth audio data to be transcribed; wherein m is a positive integer, m is not more than M-1, and M is a positive integer; the recording time corresponding to the mth audio data to be transcribed is earlier than the recording time corresponding to the (m+1)th audio data to be transcribed; and the theoretical identity information corresponding to the 1st audio data to be transcribed is determined according to the audio basic identity information corresponding to the M audio data to be transcribed.
10. The method according to claim 9, wherein the generating theoretical identity information corresponding to the m +1 th audio data to be transcribed according to the mth audio data to be transcribed and the theoretical identity information corresponding to the mth audio data to be transcribed comprises:
generating a second updating rule according to the mth audio data to be transcribed;
and updating the theoretical identity information corresponding to the mth audio data to be transcribed according to the second updating rule to obtain the theoretical identity information corresponding to the (m + 1) th audio data to be transcribed.
11. The method of claim 10, wherein, when the mth audio data to be transcribed comprises N_m second audio data, the theoretical identity information corresponding to the mth audio data to be transcribed comprises the theoretical identity information corresponding to the N_m second audio data, and the second update rule comprises a second sorting target, the updating the theoretical identity information corresponding to the mth audio data to be transcribed according to the second update rule to obtain the theoretical identity information corresponding to the (m+1)th audio data to be transcribed comprises:
reordering the theoretical identity information corresponding to the N_m second audio data according to the second sorting target to obtain the theoretical identity information corresponding to the (m+1)th audio data to be transcribed.
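Claim 11 updates the theoretical identity information by reordering the per-segment identities. One way to model the sorting target, as an assumption for illustration, is a permutation of the segment indices:

```python
def apply_sorting_update(identities: list, order: list) -> list:
    """Reorder the N_m per-segment theoretical identities according to
    the sorting target, modeled here as a permutation of indices."""
    if sorted(order) != list(range(len(identities))):
        raise ValueError("order must be a permutation of the indices")
    return [identities[j] for j in order]
```

Under this reading, both sides derive the same permutation from the mth audio data (claim 10's update rule), so their reordered identity lists stay in lockstep.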
12. The method according to claim 8, wherein if the number of the audio data to be transcribed is M, the matching the actual identity information corresponding to the audio data to be transcribed and the theoretical identity information corresponding to the audio data to be transcribed to obtain the identity matching result corresponding to the audio data to be transcribed comprises:
matching actual identity information corresponding to the r-th audio data to be transcribed with theoretical identity information corresponding to the r-th audio data to be transcribed to obtain an identity matching result corresponding to the r-th audio data to be transcribed; wherein r is a positive integer, r is less than or equal to M, and M is a positive integer;
determining whether the audio data to be transcribed is legal audio data according to the identity matching result corresponding to the audio data to be transcribed comprises the following steps:
if the identity matching results corresponding to the M audio data to be transcribed all represent that the matching is successful, determining that the M audio data to be transcribed are legal audio data;
and if at least one identity matching result corresponding to the M audio data to be transcribed indicates that the matching fails, determining that the M audio data to be transcribed is illegal audio data.
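The all-or-nothing check of claim 12, under which the M audio data are legal only if every per-segment identity match succeeds, can be sketched as:

```python
def is_legal(actual: list, theoretical: list) -> bool:
    # legal audio data only if every identity matching result indicates
    # success; a single failed match marks the whole batch illegal
    return (len(actual) == len(theoretical)
            and all(a == t for a, t in zip(actual, theoretical)))
```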
13. The method according to claim 6, wherein the performing the transcription processing on the audio data to be transcribed to obtain the text corresponding to the audio data to be transcribed comprises:
extracting audio data to be decrypted corresponding to the audio data to be transcribed from the audio data to be transcribed;
decrypting the audio data to be decrypted corresponding to the audio data to be transcribed to obtain decrypted audio data corresponding to the audio data to be transcribed;
and performing transcription processing on the decrypted audio data corresponding to the audio data to be transcribed to obtain the text corresponding to the audio data to be transcribed.
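Claim 13's transcription path, extract the encrypted payload, decrypt it, then transcribe, is illustrated below. `decrypt_fn` and `asr_fn` are placeholders for whatever cipher and speech recognizer the generation side used; both names, and the fixed-length header assumption, are hypothetical.

```python
def transcribe(to_transcribe: bytes, key: bytes, decrypt_fn, asr_fn,
               header_len: int = 32) -> str:
    # 1. strip the identity header, leaving the audio data to be decrypted
    encrypted = to_transcribe[header_len:]
    # 2. decrypt it to recover the plaintext audio
    audio = decrypt_fn(encrypted, key)
    # 3. hand the decrypted audio to the speech recognizer
    return asr_fn(audio)
```

In practice this function would only be called after the legality check of claims 6 to 12 has succeeded.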
14. An apparatus for audio data transcription, the apparatus comprising:
a second acquisition unit configured to acquire audio data to be transcribed; wherein the audio data to be transcribed is target audio data generated by the audio data generation method of any one of claims 1 to 5;
the information extraction unit is used for extracting the actual identity information corresponding to the audio data to be transcribed from the audio data to be transcribed;
the legality determining unit is used for determining whether the audio data to be transcribed is legal audio data according to the actual identity information corresponding to the audio data to be transcribed;
and the audio transcription unit is used for performing transcription processing on the audio data to be transcribed to obtain the text corresponding to the audio data to be transcribed when the audio data to be transcribed is determined to be legal audio data.
15. A computer-readable storage medium having stored therein instructions that, when run on a terminal device, cause the terminal device to execute the audio data generation method of any one of claims 1 to 5 or the audio data transcription method of any one of claims 6 to 13.
CN202011622002.6A 2020-12-30 2020-12-30 Audio data generation method, audio data transcription method and device Active CN112837690B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011622002.6A CN112837690B (en) 2020-12-30 2020-12-30 Audio data generation method, audio data transcription method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011622002.6A CN112837690B (en) 2020-12-30 2020-12-30 Audio data generation method, audio data transcription method and device

Publications (2)

Publication Number Publication Date
CN112837690A true CN112837690A (en) 2021-05-25
CN112837690B CN112837690B (en) 2024-04-16

Family

ID=75924179

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011622002.6A Active CN112837690B (en) Audio data generation method, audio data transcription method and device

Country Status (1)

Country Link
CN (1) CN112837690B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102063904A (en) * 2010-11-30 2011-05-18 广州酷狗计算机科技有限公司 Melody extraction method and melody recognition system for audio files
US20140172422A1 (en) * 2012-12-17 2014-06-19 Yaron Hefetz Secured audio channel for voice communication
CN105975569A (en) * 2016-05-03 2016-09-28 深圳市金立通信设备有限公司 Voice processing method and terminal
CN107368577A (en) * 2017-07-19 2017-11-21 维沃移动通信有限公司 A kind of audio-frequency processing method and mobile terminal
CN108389578A (en) * 2018-02-09 2018-08-10 深圳市鹰硕技术有限公司 Smart classroom speech control system
CN109979466A (en) * 2019-03-21 2019-07-05 广州国音智能科技有限公司 A kind of vocal print identity identity identification method, device and computer readable storage medium
CN110335612A (en) * 2019-07-11 2019-10-15 招商局金融科技有限公司 Minutes generation method, device and storage medium based on speech recognition
CN110491392A (en) * 2019-08-29 2019-11-22 广州国音智能科技有限公司 A kind of audio data cleaning method, device and equipment based on speaker's identity
CN110634478A (en) * 2018-06-25 2019-12-31 百度在线网络技术(北京)有限公司 Method and apparatus for processing speech signal
CN110782902A (en) * 2019-11-06 2020-02-11 北京远鉴信息技术有限公司 Audio data determination method, apparatus, device and medium
CN111221987A (en) * 2019-12-30 2020-06-02 秒针信息技术有限公司 Hybrid audio tagging method and apparatus
US20200374269A1 (en) * 2019-05-22 2020-11-26 Synaptics Incorporated Secure audio systems and methods


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Li Xinning; Zhou Zhiping; Liu Lele: "Mobile phone user authentication scheme combining multi-sensor information analysis", Computer Engineering and Applications, no. 01 *

Also Published As

Publication number Publication date
CN112837690B (en) 2024-04-16

Similar Documents

Publication Publication Date Title
US10446134B2 (en) Computer-implemented system and method for identifying special information within a voice recording
US10645081B2 (en) Method and apparatus for authenticating user
EP2685450B1 (en) Device and method for recognizing content using audio signals
CN101228770B (en) Systems and method for secure delivery of files to authorized recipients
CN108833361B (en) Identity authentication method and device based on virtual account
US20130318071A1 (en) Apparatus and Method for Recognizing Content Using Audio Signal
CN113206737A (en) Voice communication encryption method, decryption method and device
CN106789855A (en) The method and device of user login validation
EP3321846A1 (en) Systems and methods for secure biometric sample raw data storage
CN109584887B (en) Method and device for generating voiceprint information extraction model and extracting voiceprint information
CN110377762A (en) Information query method, device and computer equipment based on electronics folder
CN110324314B (en) User registration method and device, storage medium and electronic equipment
CN108449348B (en) Online authentication system and method supporting user identity privacy protection
CN111324611A (en) Asset type evidence retrieval method and device
CN117252429A (en) Risk user identification method and device, storage medium and electronic equipment
CN111027065B (en) Leucavirus identification method and device, electronic equipment and storage medium
CN112837690B (en) 2024-04-16 Audio data generation method, audio data transcription method and device
CN116843389A (en) Financial room access control system, method and storage medium
CN111080233A (en) Method, device and storage medium for generating subscription information
CN114780932A (en) Cross-block chain data interaction verification method, system and equipment for management three-mode platform
KR102129030B1 (en) Method and device for de-identifying security information of electronic document
CN110751033A (en) Offline login method and related product
CN111600901A (en) Application authentication method, device, equipment and computer readable storage medium
CN112738082B (en) Secret information storage verification method and device based on cloud storage and storage medium
Knospe Privacy-enhanced perceptual hashing of audio data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant