CN112837690B

CN112837690B - Audio data generation method, audio data transcription method and device

Info

Publication number: CN112837690B
Application number: CN202011622002.6A
Authority: CN
Inventors: 许凌; 李明; 陶飞
Original assignee: iFlytek Co Ltd
Current assignee: iFlytek Co Ltd
Priority date: 2020-12-30
Filing date: 2020-12-30
Publication date: 2024-04-16
Anticipated expiration: 2040-12-30
Also published as: CN112837690A

Abstract

The application discloses an audio data generation method, an audio data transcription method, and corresponding devices. The generation method comprises: first acquiring audio data to be processed and identity information of the audio data to be processed, and then generating target audio data according to the audio data to be processed and its identity information, so that the target audio data carries both the audio data to be processed and the identity information. Because the target audio data carries the identity information, a subsequent transcription device can determine from that identity information whether the target audio data is legal audio data. Validity screening of audio data can therefore be performed in the transcription device, so that the transcription device transcribes only legal audio data and does not transcribe illegal audio data. The transcription device can thus transcribe legal audio data in a timely manner, which improves the real-time performance of transcription for legal audio data.

Description

Audio data generation method, audio data transcription method and device

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to an audio data generation method, an audio data transcription method, and corresponding devices.

Background

Currently, audio data may be recorded in recording scenarios (e.g., conferences, interviews, training lessons, etc.) using a recording device (e.g., a recording pen), so that the audio data can subsequently be converted into text by a transcription device and a user can learn the audio information recorded in the audio data from the text. Audio transcription refers to converting the audio information recorded in audio into text.

However, since the amount of audio data waiting for transcription in the transcription device is generally large, a transcription device with limited transcription capability is overloaded, which lowers the real-time performance of its transcription.

Disclosure of Invention

The main object of the embodiments of the present application is to provide an audio data generation method, an audio data transcription method, and corresponding devices, which can implement validity screening of audio data and thereby improve the real-time transcription performance of a transcription device.

The embodiment of the application provides an audio data generation method, which comprises the following steps:

acquiring audio data to be processed and identity information of the audio data to be processed;

Generating target audio data according to the audio data to be processed and the identity information of the audio data to be processed, so that the target audio data carries the audio data to be processed and the identity information of the audio data to be processed.

Optionally, when the audio data to be processed includes N pieces of first audio data and the identity information of the audio data to be processed includes identity information of N pieces of first audio data, generating the target audio data according to the audio data to be processed and the identity information of the audio data to be processed includes:

generating an ith second audio data according to the ith first audio data and the identity information of the ith first audio data, so that the ith second audio data carries the identity information of the ith first audio data and the ith first audio data; wherein i is a positive integer, i is less than or equal to N, and N is a positive integer;

and obtaining target audio data according to the 1 st second audio data to the N th second audio data.

Optionally, the generating the ith second audio data according to the ith first audio data and the identity information of the ith first audio data includes:

adding the identity information of the i-th first audio data at a preset position of the i-th first audio data to obtain the i-th second audio data.
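The two optional steps above can be sketched as follows. This is only an illustrative sketch: it assumes that each first audio data is a byte string, that its identity information is also a byte string, and that the preset position is a byte offset (the start of the block by default). The function names `make_second_audio` and `make_target_audio` are hypothetical and do not come from the application.

```python
from typing import List, Sequence


def make_second_audio(first_audio: bytes, identity: bytes, position: int = 0) -> bytes:
    """Insert the identity information at a preset byte offset of one first audio data."""
    if not 0 <= position <= len(first_audio):
        raise ValueError("preset position lies outside the first audio data")
    return first_audio[:position] + identity + first_audio[position:]


def make_target_audio(
    first_audios: Sequence[bytes],
    identities: Sequence[bytes],
    position: int = 0,
) -> List[bytes]:
    """Build the N second audio data; their ordered collection stands for the target audio data."""
    if len(first_audios) != len(identities):
        raise ValueError("each first audio data needs exactly one identity information")
    return [
        make_second_audio(audio, identity, position)
        for audio, identity in zip(first_audios, identities)
    ]
```

With the default offset, the identity information simply prefixes each block; any other fixed offset works the same way as long as the transcription side uses the same one.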

Optionally, if the number of audio data to be processed is T, the method further includes:

generating audio basic identity information corresponding to the T pieces of audio data to be processed according to user identity identifiers corresponding to the T pieces of audio data to be processed and product serial numbers corresponding to the T pieces of audio data to be processed;

the obtaining the identity information of the audio data to be processed includes:

determining the identity information of the 1st audio data to be processed according to the audio basic identity information corresponding to the T pieces of audio data to be processed;

generating the identity information of the (t+1)-th audio data to be processed according to the t-th audio data to be processed and the identity information of the t-th audio data to be processed; wherein the recording time corresponding to the t-th audio data to be processed is earlier than the recording time corresponding to the (t+1)-th audio data to be processed; t is a positive integer, t is less than or equal to T-1, and T is a positive integer.

Optionally, the generating the identity information of the (t+1) th audio data to be processed according to the (t) th audio data to be processed and the identity information of the (t) th audio data to be processed includes:

Generating a first updating rule according to the t-th audio data to be processed;

updating the identity information of the t-th audio data to be processed according to the first updating rule to obtain the identity information of the t+1-th audio data to be processed.

Optionally, when the t-th audio data to be processed includes N_t pieces of first audio data, the identity information of the t-th audio data to be processed includes the identity information of the N_t pieces of first audio data, and the first update rule includes a first sorting target, the updating the identity information of the t-th audio data to be processed according to the first update rule to obtain the identity information of the (t+1)-th audio data to be processed includes:

performing sorting adjustment on the identity information of the N_t pieces of first audio data according to the first sorting target to obtain the identity information of the (t+1)-th audio data to be processed; wherein N_t is a positive integer.
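The chained identity generation described above can be sketched as follows. The application does not specify how the audio basic identity information is computed or how the sorting target is derived from the t-th audio data; the use of SHA-256 for both, the 4-byte identity items, and all function names below are assumptions made only for illustration.

```python
import hashlib
from typing import List, Sequence


def base_identity(user_id: str, product_serial: str, n_items: int) -> List[bytes]:
    """Derive n_items identity items from the user identifier and product serial number.

    Hashing is an assumption; the source only says both inputs are used.
    """
    seed = f"{user_id}|{product_serial}".encode("utf-8")
    return [hashlib.sha256(seed + k.to_bytes(4, "big")).digest()[:4] for k in range(n_items)]


def sorting_target(audio: bytes, n_items: int) -> List[int]:
    """Derive a permutation of range(n_items) from the t-th audio data (assumed update rule)."""
    def key(k: int) -> bytes:
        return hashlib.sha256(audio + k.to_bytes(4, "big")).digest()

    return sorted(range(n_items), key=key)


def next_identity(identity: Sequence[bytes], audio: bytes) -> List[bytes]:
    """Reorder the t-th identity items to obtain the (t+1)-th identity information."""
    order = sorting_target(audio, len(identity))
    return [identity[k] for k in order]


def identity_chain(
    user_id: str, product_serial: str, audios: Sequence[bytes], n_items: int = 10
) -> List[List[bytes]]:
    """Identity information for audio data 1..T, each derived from its predecessor."""
    chain = [base_identity(user_id, product_serial, n_items)]
    for audio in audios[:-1]:  # the T-th audio data has no successor
        chain.append(next_identity(chain[-1], audio))
    return chain
```

Because every step depends only on the earlier audio data and the basic identity information, a party that knows the user identifier and product serial number can recompute the same chain from the received audio data.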

Optionally, the acquiring the audio data to be processed includes:

acquiring original audio data;

encrypting the original audio data to obtain the audio data to be processed.

The embodiment of the application also provides an audio data transcription method, which comprises the following steps:

Acquiring audio data to be transcribed; the audio data to be transcribed is target audio data generated by any implementation mode of the audio data generation method provided by the embodiment of the application;

extracting actual identity information corresponding to the audio data to be transcribed from the audio data to be transcribed;

determining whether the audio data to be transcribed is legal audio data or not according to the actual identity information corresponding to the audio data to be transcribed;

and when the audio data to be transcribed is determined to be legal audio data, conducting transcription processing on the audio data to be transcribed to obtain characters corresponding to the audio data to be transcribed.

In one possible implementation manner, if the audio data to be transcribed includes N pieces of second audio data, the extracting, from the audio data to be transcribed, actual identity information corresponding to the audio data to be transcribed includes:

extracting actual identity information corresponding to the kth second audio data from the kth second audio data; wherein k is a positive integer, k is less than or equal to N, and N is a positive integer;

generating the actual identity information corresponding to the audio data to be transcribed according to the actual identity information corresponding to the 1st second audio data to the actual identity information corresponding to the N-th second audio data.
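A minimal sketch of this extraction step is given below. It assumes that each second audio data is a byte string in which identity information of a known fixed length `id_len` was inserted at a known byte offset `position`; both parameters and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def split_second_audio(second_audio: bytes, id_len: int, position: int = 0) -> Tuple[bytes, bytes]:
    """Return (actual identity information, remaining audio payload) for one second audio data."""
    if position < 0 or id_len < 0 or position + id_len > len(second_audio):
        raise ValueError("identity field does not fit inside the second audio data")
    identity = second_audio[position:position + id_len]
    payload = second_audio[:position] + second_audio[position + id_len:]
    return identity, payload


def extract_actual_identity(
    second_audios: Sequence[bytes], id_len: int, position: int = 0
) -> List[bytes]:
    """Collect the identity of the 1st to N-th second audio data, in order."""
    return [split_second_audio(block, id_len, position)[0] for block in second_audios]
```

The ordered list returned by `extract_actual_identity` plays the role of the actual identity information corresponding to the audio data to be transcribed.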

In one possible embodiment, the method further comprises:

acquiring theoretical identity information corresponding to the audio data to be transcribed;

the determining whether the audio data to be transcribed is legal audio data according to the actual identity information corresponding to the audio data to be transcribed comprises:

matching the actual identity information corresponding to the audio data to be transcribed with the theoretical identity information corresponding to the audio data to be transcribed to obtain an identity matching result corresponding to the audio data to be transcribed;

and determining whether the audio data to be transcribed is legal audio data or not according to an identity matching result corresponding to the audio data to be transcribed.

In a possible implementation manner, if the number of the audio data to be transcribed is M, the obtaining theoretical identity information corresponding to the audio data to be transcribed includes:

generating theoretical identity information corresponding to the (m+1)-th audio data to be transcribed according to the m-th audio data to be transcribed and the theoretical identity information corresponding to the m-th audio data to be transcribed; wherein m is a positive integer, m is less than or equal to M-1, and M is a positive integer; the recording time corresponding to the m-th audio data to be transcribed is earlier than the recording time corresponding to the (m+1)-th audio data to be transcribed; and the theoretical identity information corresponding to the 1st audio data to be transcribed is determined according to the audio basic identity information corresponding to the M pieces of audio data to be transcribed.

In a possible implementation manner, the generating theoretical identity information corresponding to the (m+1) th audio data to be transcribed according to the (m) th audio data to be transcribed and the theoretical identity information corresponding to the (m) th audio data to be transcribed includes:

generating a second updating rule according to the m-th audio data to be transcribed;

and updating the theoretical identity information corresponding to the m-th audio data to be transcribed according to the second updating rule to obtain theoretical identity information corresponding to the (m+1) -th audio data to be transcribed.

In one possible embodiment, when the m-th audio data to be transcribed includes N_m pieces of second audio data, the theoretical identity information corresponding to the m-th audio data to be transcribed includes the theoretical identity information corresponding to the N_m pieces of second audio data, and the second updating rule includes a second sorting target, the updating the theoretical identity information corresponding to the m-th audio data to be transcribed according to the second updating rule to obtain theoretical identity information corresponding to the (m+1)-th audio data to be transcribed includes:

performing sorting adjustment on the theoretical identity information corresponding to the N_m pieces of second audio data according to the second sorting target to obtain theoretical identity information corresponding to the (m+1)-th audio data to be transcribed.

In one possible implementation manner, if the number of the audio data to be transcribed is M, the matching the actual identity information corresponding to the audio data to be transcribed with the theoretical identity information corresponding to the audio data to be transcribed to obtain an identity matching result corresponding to the audio data to be transcribed includes:

matching the actual identity information corresponding to the r-th audio data to be transcribed with the theoretical identity information corresponding to the r-th audio data to be transcribed to obtain an identity matching result corresponding to the r-th audio data to be transcribed; wherein r is a positive integer, r is less than or equal to M, and M is a positive integer;

the determining whether the audio data to be transcribed is legal audio data according to the identity matching result corresponding to the audio data to be transcribed comprises:

if the identity matching results corresponding to the M pieces of audio data to be transcribed all show that the matching is successful, determining that the M pieces of audio data to be transcribed are legal audio data;

if at least one identity matching result corresponding to the M pieces of audio data to be transcribed shows that matching fails, determining that the M pieces of audio data to be transcribed are illegal audio data.
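The all-or-nothing decision above can be sketched as follows. The application does not define what counts as a successful match; the sketch assumes that matching means exact equality between actual and theoretical identity information, and the function names are hypothetical.

```python
from typing import List, Sequence


def match_results(actual: Sequence[object], theoretical: Sequence[object]) -> List[bool]:
    """Identity matching result for each of the M audio data to be transcribed.

    Equality is an assumed matching criterion.
    """
    if len(actual) != len(theoretical) or not actual:
        raise ValueError("need the same positive number M of actual and theoretical identities")
    return [a == t for a, t in zip(actual, theoretical)]


def is_legal(actual: Sequence[object], theoretical: Sequence[object]) -> bool:
    """Legal only if all M matches succeed; a single failure makes all M illegal."""
    return all(match_results(actual, theoretical))
```

Only when `is_legal` returns `True` would the transcription device go on to transcribe the M pieces of audio data.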

In a possible implementation manner, the performing a transcription process on the audio data to be transcribed to obtain a text corresponding to the audio data to be transcribed includes:

Extracting audio data to be decrypted corresponding to the audio data to be transcribed from the audio data to be transcribed;

decrypting the audio data to be decrypted corresponding to the audio data to be transcribed to obtain the decrypted audio data corresponding to the audio data to be transcribed;

and transcribing the decrypted audio data corresponding to the audio data to be transcribed to obtain the text corresponding to the audio data to be transcribed.

The embodiment of the application also provides an audio data generating device, which comprises:

the first acquisition unit is used for acquiring the audio data to be processed and the identity information of the audio data to be processed;

the data generation unit is used for generating target audio data according to the audio data to be processed and the identity information of the audio data to be processed, so that the target audio data carries the audio data to be processed and the identity information of the audio data to be processed.

The embodiment of the application also provides an audio data transcription device, which comprises:

the second acquisition unit is used for acquiring the audio data to be transcribed; the audio data to be transcribed is target audio data generated by any implementation mode of the audio data generation method provided by the embodiment of the application;

The information extraction unit is used for extracting actual identity information corresponding to the audio data to be transcribed from the audio data to be transcribed;

the validity determining unit is used for determining whether the audio data to be transcribed is legal audio data or not according to the actual identity information corresponding to the audio data to be transcribed;

and the audio transcription unit is used for carrying out transcription processing on the audio data to be transcribed when the audio data to be transcribed is determined to be legal audio data, so as to obtain characters corresponding to the audio data to be transcribed.

Based on the technical scheme, the application has the following beneficial effects:

in the audio data generation method, the audio data to be processed and the identity information of the audio data to be processed are first acquired, and the target audio data is then generated according to the audio data to be processed and its identity information, so that the target audio data carries both. The identity information can indicate that the audio information carried by the target audio data is legal, so a subsequent transcription device can determine from the identity information carried by the target audio data that the target audio data is legal. Validity screening of audio data can thus be performed in the transcription device: the transcription device transcribes only legal audio data and does not transcribe illegal audio data, which saves the time it would otherwise spend on illegal audio data. The transcription device can therefore transcribe legal audio data in a timely manner, which improves its real-time transcription performance, especially for legal audio data.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a schematic diagram of an application scenario in which the audio data generating method is applied to a terminal device according to an embodiment of the present application;

Fig. 2 is a schematic diagram of an application scenario in which the audio data generating method is applied to a server according to an embodiment of the present application;

Fig. 3 is a flowchart of an audio data generating method according to an embodiment of the present application;

Fig. 4 is a schematic diagram of audio to be stored and audio data to be processed according to an embodiment of the present application;

Fig. 5 is a schematic diagram of generating identity information of T pieces of audio data to be processed according to an embodiment of the present application;

Fig. 6 is a schematic diagram of a preset position of first audio data according to an embodiment of the present application;

Fig. 7 is a schematic diagram of second audio data according to an embodiment of the present application;

Fig. 8 is a schematic diagram of target audio data according to an embodiment of the present application;

Fig. 9 is a flowchart of an audio data transcription method according to an embodiment of the present application;

Fig. 10 is a schematic diagram of generating actual identity information corresponding to audio data to be transcribed according to an embodiment of the present application;

Fig. 11 is a schematic diagram of an application scenario according to an embodiment of the present application;

Fig. 12 is a flowchart of generating and storing audio data in a recording pen according to an embodiment of the present application;

Fig. 13 is a schematic structural diagram of an audio data generating device according to an embodiment of the present application;

Fig. 14 is a schematic structural diagram of an audio data transcription device according to an embodiment of the present application.

Detailed Description

In order to facilitate understanding of the technical solutions provided in the embodiments of the present application, some technical terms are described below.

The recording device refers to a terminal device having an audio recording function. The embodiment of the application does not limit the recording device; for example, the recording device may be a smart phone, a computer, a personal digital assistant (PDA), a tablet computer, a recording pen, or the like.

The transcription device refers to a device having a function of transcribing audio into text. The embodiment of the present application does not limit the transcription device; for example, the transcription device may be a server or a terminal device. The terminal device may be a smart phone, a computer, a personal digital assistant (PDA), a tablet computer, a recording pen, or the like.

The audio sampling point refers to a frame of audio directly recorded by the recording device in the recording process.

The audio information refers to sound information recorded in audio. For example, audio information for an audio sample point may refer to sound information recorded in the audio sample point.

Legal audio data refers to audio data carrying audio information recorded by a legal recording device; a legal recording device has transcription rights granted by the transcription device.

Illegal audio data refers to audio data carrying audio information recorded by an illegal recording device; an illegal recording device does not have transcription rights granted by the transcription device.

In the research of audio transcription, the inventor finds that, for a transcription device (such as a transcription server or a transcription terminal, etc.), after the transcription device receives audio data, the transcription device does not distinguish whether the audio data is legal audio data or not, but directly transcribes the audio data, so that the transcription device needs to transcribe not only legal audio data but also illegal audio data, thus causing the transcription device to waste a great deal of time to transcribe illegal audio data, and leading the transcription device with limited transcription capability to be unable to timely transcribe the legal audio data, thereby leading to lower transcription instantaneity of the legal audio data.

In order to solve the above technical problems, an embodiment of the present application provides an audio data generating method, which includes: firstly, acquiring to-be-processed audio data and identity information of the to-be-processed audio data, and then generating target audio data according to the to-be-processed audio data and the identity information of the to-be-processed audio data, so that the target audio data carries the to-be-processed audio data and the identity information of the to-be-processed audio data.

Therefore, the identity information can indicate that the audio information carried by the target audio data is legal, so a subsequent transcription device can determine from the identity information carried by the target audio data that the target audio data is legal. Validity screening of audio data can thus be performed in the transcription device: the transcription device transcribes only legal audio data and does not transcribe illegal audio data, which saves the time it would otherwise spend on illegal audio data. The transcription device can therefore transcribe legal audio data in a timely manner, which improves its real-time transcription performance, especially for legal audio data.

In addition, the embodiment of the present application does not limit the execution subject of the audio data generation method. For example, the audio data generation method provided in the embodiment of the present application may be applied to a data processing device such as a terminal device or a server. The terminal device may be a smart phone, a computer, a personal digital assistant (PDA), a tablet computer, a recording pen, or the like. The server may be a stand-alone server, a clustered server, or a cloud server.

In order to facilitate understanding of the technical solutions provided in the embodiments of the present application, an application scenario of the audio data generating method provided in the embodiments of the present application is described below by way of example with reference to fig. 1 and fig. 2, respectively. Fig. 1 is an application scenario schematic diagram of an audio data generating method applied to a terminal device according to an embodiment of the present application; fig. 2 is an application scenario schematic diagram of an audio data generating method applied to a server according to an embodiment of the present application.

In the application scenario shown in fig. 1, when the user 101 triggers an audio recording request on the terminal device 102, the terminal device 102 receives the audio recording request, records audio to be stored, obtains the audio data to be processed and identity information of the audio data to be processed according to the audio to be stored, and generates target audio data according to the audio data to be processed and the identity information of the audio data to be processed, so that the target audio data carries the audio data to be processed and the identity information of the audio data to be processed, and finally the target audio data can be stored and displayed as audio storage data corresponding to the audio to be stored.

In the application scenario shown in fig. 2, when the user 201 triggers an audio recording request on the terminal device 202, the terminal device 202 receives the audio recording request, records audio to be stored, and sends the audio to be stored to the server 203, so that the server 203 can obtain the audio data to be processed and identity information of the audio data to be processed according to the audio to be stored, and then generate target audio data according to the audio data to be processed and the identity information of the audio data to be processed, so that the target audio data carries the audio data to be processed and the identity information of the audio data to be processed, and finally the target audio data can be sent to the terminal device 202 as storage data corresponding to the audio to be stored, so that the terminal device 202 stores and displays the storage data corresponding to the audio to be stored.

It should be noted that, the audio data generating method provided in the embodiment of the present application may be applied not only to the application scenario shown in fig. 1 or fig. 2, but also to other application scenarios where audio data generation is required, which is not specifically limited in the embodiment of the present application.

For the purposes of making the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.

Method embodiment one

Referring to fig. 3, a flowchart of an audio data generating method according to an embodiment of the present application is shown.

The audio data generation method provided by the embodiment of the application comprises S301-S303:

S301: acquiring audio data to be processed.

The audio data to be processed carries audio information recorded by the recording device.

In some cases, the audio data to be processed may include at least one first audio data, and each first audio data may include at least one audio sampling data. The embodiment of the present application does not limit the audio sampling data; for example, one audio sampling data may be one audio sampling point, or one audio sampling point after encryption processing. The embodiment of the present application also does not limit the number of audio sampling data in the first audio data; for example, the first audio data may include 128 audio sampling data. For details of the "encryption processing", see S3012 below.

In some cases, the audio data to be processed may be generated from audio to be stored that is recorded by the recording device. The audio to be stored refers to audio directly recorded by the recording device after a user triggers a recording request on the recording device.

In addition, the embodiment of the present application does not limit the manner of generating the audio data to be processed (i.e., the implementation of S301); for ease of understanding, two possible implementations are described below.

In a first possible implementation manner, S301 may specifically be: at least one audio data to be processed is extracted from the audio to be stored.

It should be noted that, the embodiment of the present application does not limit the extraction process of the audio data to be processed, and for ease of understanding, the following description will be made with reference to two cases.

In case 1, the extraction of the audio data to be processed may be performed during the recording process of the audio to be stored, which may specifically be: in the recording process of the audio to be stored, when a preset number of audio sampling points are collected, audio data to be processed can be directly generated according to the preset number of audio sampling points.

The preset number is preset, and the embodiment of the application is not limited to the preset number. For example, the preset number may be 1280.

As an example, suppose that the audio to be stored includes 1280×T audio sampling points, that the recording time of the w-th audio sampling point is earlier than that of the (w+1)-th audio sampling point (w is a positive integer and w+1 is less than or equal to 1280×T), and that the preset number is 1280. S301 may then specifically be: during recording of the audio to be stored, when the 1280th audio sampling point is recorded, generating the 1st audio data to be processed from the 1st to the 1280th audio sampling points; when the 2560th audio sampling point is recorded, generating the 2nd audio data to be processed from the 1281st to the 2560th audio sampling points; and so on; when the (1280×T)-th audio sampling point is recorded, generating the T-th audio data to be processed from the (1280×(T-1)+1)-th to the (1280×T)-th audio sampling points.

It should be noted that the embodiment of the present application does not limit the manner of generating the j-th audio data to be processed. For example, the set of the (1280×(j-1)+1)-th to the (1280×j)-th audio sampling points may be directly determined as the j-th audio data to be processed. For another example, these 1280 audio sampling points may be divided into 10 groups to obtain 10 first audio data, each including 128 audio sampling points adjacent in recording time; the set of the 10 first audio data is then determined as the j-th audio data to be processed (see the 1st to T-th audio data to be processed shown in Fig. 4). Here j is a positive integer and j is less than or equal to T.
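The extraction during recording described in case 1 can be sketched as a generator that emits one audio data to be processed every time the preset number of audio sampling points has been collected. The representation of sampling points as integers, the function name `stream_audio_to_be_processed`, and the choice to leave a trailing incomplete batch unemitted are assumptions of this sketch, not statements from the application.

```python
from typing import Iterable, Iterator, List


def stream_audio_to_be_processed(
    samples: Iterable[int],
    preset_number: int = 1280,
    group_size: int = 128,
) -> Iterator[List[List[int]]]:
    """Yield one audio data to be processed per preset_number collected sampling points.

    Each yielded item is a list of first audio data, each holding group_size
    sampling points that are adjacent in recording time.
    """
    if preset_number <= 0 or group_size <= 0 or preset_number % group_size:
        raise ValueError("preset_number must be a positive multiple of group_size")
    buffer: List[int] = []
    for sample in samples:
        buffer.append(sample)
        if len(buffer) == preset_number:
            yield [buffer[k:k + group_size] for k in range(0, preset_number, group_size)]
            buffer = []
    # Assumption: a trailing batch shorter than preset_number is not emitted.
```

Because the generator yields as soon as a batch is full, later processing of the j-th audio data to be processed can start while the recording device is still collecting the sampling points of the (j+1)-th.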

Based on the above case 1, in some cases the audio data to be processed can be extracted while the audio to be stored is still being recorded, so that recording and data processing proceed in parallel, which helps improve the generation efficiency of the audio data.

In case 2, the audio data to be processed may be extracted after recording of the audio to be stored is complete, which may specifically be: after the audio to be stored is obtained, segmenting the audio to be stored according to a first division rule to obtain at least one audio data to be processed.

The first dividing rule is a preset audio segmentation rule; moreover, the embodiment of the present application does not limit the first division rule. To facilitate understanding of the first partitioning rule, the following description is made in connection with an example.

As an example, suppose the audio to be stored includes 1280×T audio sampling points, the recording time of the w-th audio sampling point is earlier than that of the (w+1)-th audio sampling point, w is a positive integer, w+1 ≤ 1280×T, each piece of audio data to be processed includes 10 first audio data, and each first audio data includes 128 audio sampling points. S301 may then specifically be: first, the audio to be stored is preliminarily divided with 128 audio sampling points as a division unit to obtain 10×T first audio data ordered by recording time (for example, in ascending or descending order of recording time); then the 10×T first audio data are divided again with 10 first audio data as a division unit to obtain T pieces of audio data to be processed (as shown in fig. 4).
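The two-stage division of case 2 can be sketched as follows. The helper names `split_units` and `divide_stored_audio` are hypothetical, and the sampling points are modeled as a plain Python list already ordered by recording time.

```python
from typing import List, Sequence


def split_units(items: Sequence, unit: int) -> List:
    """Cut a sequence into consecutive slices of `unit` elements each."""
    return [items[k:k + unit] for k in range(0, len(items), unit)]


def divide_stored_audio(samples: Sequence[int]) -> List[List[Sequence[int]]]:
    """Apply the example first division rule to fully recorded audio."""
    # Stage 1: 128 sampling points per first audio data (10*T in total).
    first_audio_data = split_units(samples, 128)
    # Stage 2: 10 first audio data per piece of audio data to be processed.
    return split_units(first_audio_data, 10)


pieces = divide_stored_audio(list(range(1280 * 3)))  # here T = 3
```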

Based on the above-mentioned related content of the case 2, in some cases, the audio data to be processed may be extracted from the audio to be stored that has been recorded by audio division, so that the recording process of the audio to be stored is separated from the processing process of the audio to be stored, so that the data processing can be performed on the audio to be stored at any time after the audio to be stored is recorded, which is beneficial to improving the convenience of the data processing process of the audio to be stored.

Based on the above-mentioned related content of the first possible implementation manner of S301, in some cases, the audio data to be processed may be directly extracted from the audio to be stored, so that the audio information recorded in the audio to be stored can be accurately carried by the extracted audio data to be processed.

In some cases, in order to improve storage security of audio to be stored, audio information carried by the audio to be stored may be encrypted. Based on this, the present application embodiment also provides a second possible implementation manner of S301, which specifically includes S3011-S3012:

S3011: original audio data is acquired.

Wherein, the original audio data refers to audio data directly extracted from audio to be stored.

It should be noted that the extraction process of the original audio data is similar to the extraction process in the first possible embodiment of S301 provided above, and the relevant content is referred to the first possible embodiment of S301 above.

S3012: and encrypting the original audio data to obtain the audio data to be processed.

In the embodiment of the present application, the original audio data may be encrypted by using a preset encryption algorithm, and the encrypted original audio data is determined as the audio data to be processed, which helps improve the storage security of the audio to be stored.

In addition, in some cases, the original audio data may be encrypted by using a preset encryption algorithm, so that the data amount of the encrypted original audio data is far smaller than that of the original audio data, so that after the encrypted original audio data is determined to be the audio data to be processed, the data amount of the audio data to be processed is also far smaller than that of the original audio data, which is beneficial to reducing the storage space requirement of the audio to be stored.

It should be noted that the embodiment of the present application does not limit the preset encryption algorithm, which may be set according to the application scenario. In one possible implementation, the preset encryption algorithm may be sub-band coding (SBC), advanced audio coding (Advanced Audio Coding, AAC), or the like.

As an example, when the original audio data includes 128 audio sampling points and the preset encryption algorithm is SBC encoding, S3012 may specifically be: the 128 audio sampling points are encoded by SBC encoding to obtain 128 bytes of encoded data, and the 128 bytes of encoded data are determined as the audio data to be processed corresponding to the original audio data.

Based on the above-mentioned content related to S3011 to S3012, in some cases, at least one piece of original audio data may be extracted from the audio to be stored, so that the original audio data includes a plurality of audio sampling points; and then, each piece of original audio data is encrypted according to a preset encryption algorithm, and each piece of encrypted original audio data is respectively determined to be each piece of audio data to be processed, so that each piece of audio data to be processed can carry audio information recorded in the audio to be stored in a safe mode, and the storage safety of the audio to be stored is improved.

Based on the above-mentioned related content of S301, the to-be-stored audio data corresponding to the to-be-stored audio may be generated according to the to-be-stored audio recorded by the recording device, so that the identity information adding process and the data storing process of the to-be-stored audio may be implemented based on the to-be-stored audio data.

It should be noted that, the embodiment of the present application is not limited to the number of audio data to be processed, for example, the number of audio data to be processed is T; wherein T is a positive integer. In addition, if the audio data to be processed is generated according to the audio to be stored, the number of the audio data to be processed can be determined according to the total number of the audio sampling points in the audio to be stored.

S302: and acquiring the identity information of the audio data to be processed.

The identity information of the audio data to be processed is used for describing the identity of the audio information carried by the audio data to be processed (such as recording equipment information, recording equipment user information, transfer authorization related information and the like); moreover, the identity information of the audio data to be processed can be used for proving the validity of the audio information carried by the audio data to be processed.

In addition, the embodiment of the present application does not limit the process of acquiring the identity information of the audio data to be processed, and for convenience of understanding, a possible implementation of S302 is described below as an example.

In one possible implementation manner, when the number of pieces of audio data to be processed is T, the recording time corresponding to the t-th audio data to be processed is earlier than the recording time corresponding to the (t+1)-th audio data to be processed, t is a positive integer, and t ≤ T-1, then, as shown in fig. 5, the identity information generating process of the T pieces of audio data to be processed may specifically be: the identity information of the 1st audio data to be processed is generated according to the audio basic identity information corresponding to the T pieces of audio data to be processed; the identity information of the 2nd audio data to be processed is generated according to the 1st audio data to be processed and the identity information of the 1st audio data to be processed; the identity information of the 3rd audio data to be processed is generated according to the 2nd audio data to be processed and the identity information of the 2nd audio data to be processed; …… (and so on); and the identity information of the T-th audio data to be processed is generated according to the (T-1)-th audio data to be processed and the identity information of the (T-1)-th audio data to be processed.

The recording time corresponding to the t-th audio data to be processed is used for describing the recording time of the audio information carried by the t-th audio data to be processed (i.e., the audio sampling points corresponding to the t-th audio data to be processed). For example, as shown in fig. 4, if the t-th audio data to be processed is generated according to the (1280×(t-1)+1)-th to the (1280×t)-th audio sampling points in the audio to be stored, the recording time corresponding to the t-th audio data to be processed may be used to describe the recording time of the (1280×(t-1)+1)-th to the (1280×t)-th audio sampling points in the audio to be stored.

The audio basic identity information corresponding to the T pieces of audio data to be processed may be generated in advance or set in advance according to an application scenario.

In some cases, the audio basic identity information corresponding to the T audio data to be processed may be used to represent a correspondence (i.e., a binding relationship) between the recording device and a user thereof, so that the subsequent transcription device can determine the audio basic identity information according to the correspondence between the recording device and the user thereof. Based on this, the embodiment of the present application further provides an implementation process for obtaining audio basic identity information corresponding to T pieces of audio data to be processed, which may specifically be: and generating audio basic identity information corresponding to the T pieces of audio data to be processed according to the user identity identifiers corresponding to the T pieces of audio data to be processed and the product serial numbers corresponding to the T pieces of audio data to be processed.

The user identity identifiers corresponding to the T pieces of audio data to be processed are used for uniquely identifying identities of users who use the recording equipment to record audio information carried by the T pieces of audio data to be processed. In addition, the embodiment of the application is not limited to the user identity, for example, the user identity may be a device login account corresponding to the recording device, user identity card information (such as an identity card number, an identity card copy, etc.), user body identification information (such as a face, a voiceprint, a fingerprint, etc.), and personalized identity information set by the user (such as a question-answer dialogue, etc.).

The product serial numbers corresponding to the T pieces of audio data to be processed refer to the equipment identifiers of the recording equipment for recording the audio information carried by the T pieces of audio data to be processed. For example, the product serial number corresponding to the T audio data to be processed may be a1234Y20201207.

In addition, the embodiment of the application is not limited to the generation process of the audio basic identity information corresponding to the T pieces of audio data to be processed, and for example, the generation process may specifically be: and fusing the user identity identifiers corresponding to the T pieces of audio data to be processed and the product serial numbers corresponding to the T pieces of audio data to be processed by using a preset fusion algorithm to obtain fusion character strings with preset lengths, and determining the fusion character strings as audio basic identity information corresponding to the T pieces of audio data to be processed. The preset length may be set according to an application scenario, for example, the preset length is 10.

It should be noted that, the embodiment of the present application is not limited to a preset fusion algorithm, and may be implemented by any method capable of fusing the user identity identifiers corresponding to the T pieces of audio data to be processed and the product serial numbers corresponding to the T pieces of audio data to be processed into a fusion string with a preset length.
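Since the preset fusion algorithm is left open, the following Python sketch shows only one possible choice: hashing the two identifiers together and truncating the hexadecimal digest to the preset length. The use of SHA-256, the "|" separator, the function name `fuse_identity`, and the sample login account are all assumptions made for illustration and are not taken from the embodiment.

```python
import hashlib


def fuse_identity(user_id: str, serial_number: str, preset_length: int = 10) -> str:
    """Fuse a user identity identifier and a product serial number.

    Assumed scheme: SHA-256 over "user_id|serial_number", truncated to
    `preset_length` hexadecimal characters (at most 64 are available).
    """
    if not 1 <= preset_length <= 64:
        raise ValueError("preset_length must be between 1 and 64")
    digest = hashlib.sha256(f"{user_id}|{serial_number}".encode("utf-8")).hexdigest()
    return digest[:preset_length]


# "user-001" is a hypothetical device login account; the serial number is
# the example value given in the description.
base_identity = fuse_identity("user-001", "a1234Y20201207")
```

A deterministic fusion of this kind lets any party that knows both identifiers recompute the same fusion character string, which is what allows the binding between recording device and user to be checked later.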

Based on the related content of the audio basic identity information, in the embodiment of the present application, the user identity identifiers corresponding to the T pieces of audio data to be processed and the product serial numbers thereof may be fused to obtain the audio basic identity information corresponding to the T pieces of audio data to be processed, so that the audio basic identity information may represent a corresponding relationship between the recording device corresponding to the T pieces of audio data to be processed and the recording device user, so that the subsequent transfer device may determine whether the T pieces of audio data to be processed belong to legal audio data based on the corresponding relationship. It should be noted that, in the embodiment of the present application, the execution subject of the generating process of the audio basic identity information corresponding to the T pieces of audio data to be processed is not limited, and may be generated by a transcription device, for example.

In addition, the embodiment of the application does not limit the generation process of the identity information of the 1st audio data to be processed. For example, the audio basic identity information corresponding to the T pieces of audio data to be processed may be directly determined as the identity information of the 1st audio data to be processed.

Furthermore, as can be seen from fig. 5 and the above description, the identity information of the (t+1)-th audio data to be processed (i.e., any audio data to be processed other than the first among the T pieces of audio data to be processed) can be determined according to the t-th audio data to be processed and the identity information of the t-th audio data to be processed. Wherein t is a positive integer, t ≤ T-1, and T is a positive integer.

In order to facilitate understanding of the generation process of the identity information of the t+1th audio data to be processed, a possible implementation will be described below as an example.

In a possible implementation manner, the generation process of the identity information of the (t+1) th audio data to be processed includes the following steps 11-12:

Step 11: generating a first update rule according to the t-th audio data to be processed.

The first updating rule is used for describing an adjusting rule according to which the identity information of the t-th audio data to be processed is needed to be adjusted.

The embodiment of the application does not limit the first update rule. For example, when the t-th audio data to be processed includes N_t first audio data, and the identity information of the t-th audio data to be processed includes the identity information of the N_t first audio data, the first update rule may include a first ordering target, where the first ordering target is used to describe the target sequence number (i.e., the sequence number occupied after adjustment by the first update rule) of the identity information of each first audio data in the t-th audio data to be processed. Wherein N_t is a positive integer.

That is, the first ordering target may be [Q_1, Q_2, ……, Q_N], where Q_h represents the sequence number occupied by the identity information of the h-th first audio data in the t-th audio data to be processed after adjustment by the first update rule; h is a positive integer, and h ≤ N. For example, when the arrangement order of 10 pieces of identity information is "abc123456f" and the first ordering target is [5,6,8,9,0,1,2,3,4,7], the 10 pieces of identity information after ordering adjustment according to the first ordering target are "346fabc125".

In addition, the embodiment of the present application does not limit the generation process of the first update rule. In a possible implementation manner, when the t-th audio data to be processed includes N_t first audio data and the first update rule includes the first ordering target, a hash operation may first be performed on the N_t first audio data by using a preset hash algorithm to obtain N_t hash values corresponding to the N_t first audio data, where each hash value is a positive integer between 1 and N_t; the set of the N_t hash values is then determined as the first ordering target, and the first update rule is determined based on the first ordering target (e.g., the first ordering target is directly determined as the first update rule), so that the first update rule describes the adjustment rule to be followed when the arrangement order of the identity information of the N_t first audio data is adjusted.
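One way such hash values could be produced is sketched below. SHA-256 and the modulo reduction are assumptions (the embodiment only requires a preset hash algorithm whose outputs are integers between 1 and N_t), and `first_ordering_target` is a hypothetical name.

```python
import hashlib
from typing import List


def first_ordering_target(first_audio_data: List[bytes]) -> List[int]:
    """Map each of the N_t first audio data to an integer in 1..N_t."""
    n = len(first_audio_data)
    target = []
    for chunk in first_audio_data:
        digest = hashlib.sha256(chunk).digest()
        # Reduce the 256-bit digest to the range 1..n (assumed reduction).
        target.append(int.from_bytes(digest, "big") % n + 1)
    return target


# Ten hypothetical 128-byte first audio data.
chunks = [bytes([k]) * 128 for k in range(10)]
target = first_ordering_target(chunks)
```

Note that independent hash values of this kind may repeat, so the resulting list is not guaranteed to be a permutation of 1..N_t; an implementation that requires a strict permutation would need an additional step that this sketch does not attempt.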

Based on the above-mentioned related content of step 11, after the t-th audio data to be processed is obtained, a first update rule may be generated according to the t-th audio data to be processed, so that the first update rule may accurately describe an adjustment rule according to which the identity information of the t-th audio data to be processed needs to be adjusted, so that the identity information of the t-th audio data to be processed may be adjusted by using the first update rule.

Step 12: updating the identity information of the t-th audio data to be processed according to a first updating rule to obtain the identity information of the t+1th audio data to be processed.

In the embodiment of the application, after the first update rule is obtained, the identity information of the t-th audio data to be processed can be adjusted according to the first update rule to obtain the identity information of the (t+1)-th audio data to be processed. The adjustment process may specifically be: when the t-th audio data to be processed includes N_t first audio data, the identity information of the t-th audio data to be processed includes the identity information of the N_t first audio data, and the first update rule includes the first ordering target, the identity information of the N_t first audio data is reordered according to the first ordering target to obtain the identity information of the (t+1)-th audio data to be processed. For example, when the identity information of the t-th audio data to be processed is "abc123456f" and the first ordering target is [5,6,8,9,0,1,2,3,4,7], the identity information of the (t+1)-th audio data to be processed is "346fabc125".
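The reordering in Step 12 can be sketched in Python as below. The sketch adopts the reading of the ordering target that reproduces the worked example: the k-th entry of the target names the 0-based position, in the old arrangement, of the item placed at position k of the new arrangement. This reading and the function name `apply_ordering_target` are assumptions; if the target holds 1-based values, each entry would first be reduced by one.

```python
from typing import List, Sequence


def apply_ordering_target(identity_info: Sequence[str], target: List[int]) -> str:
    """Reorder identity information according to an ordering target.

    New position k receives the item found at old position target[k]
    (0-based indices, as in the worked example).
    """
    if len(identity_info) != len(target):
        raise ValueError("ordering target and identity information differ in length")
    return "".join(identity_info[q] for q in target)


updated = apply_ordering_target("abc123456f", [5, 6, 8, 9, 0, 1, 2, 3, 4, 7])
```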

Based on the above-mentioned related content of step 11 to step 12, the identity information of the (t+1) -th audio data to be processed may be generated according to the (t) -th audio data to be processed and the identity information of the (t) -th audio data to be processed, so that the identity information of the (t+1) -th audio data to be processed may simultaneously carry the audio information carried by the (t) -th audio data to be processed and the corresponding identity information thereof.

Based on the above-mentioned related content of S302, when the number of pieces of audio data to be processed is T, the identity information of the 1st audio data to be processed may be determined according to the audio basic identity information corresponding to the T pieces of audio data to be processed, so that the identity information of the 1st audio data to be processed carries the correspondence between the recording device and the user of the recording device; the identity information of the (t+1)-th audio data to be processed is then generated according to the t-th audio data to be processed and the identity information of the t-th audio data to be processed, so that the identity information of the (t+1)-th audio data to be processed carries the audio information of the t-th audio data to be processed and the corresponding identity information thereof. Wherein t is a positive integer, t ≤ T-1, and T is a positive integer.
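The chained generation summarized above can be sketched end to end as follows. Everything concrete here is an assumption for illustration: SHA-256 as the preset hash algorithm, 0-based target values, one identity character per first audio data, and the function names `ordering_target` and `identity_chain`.

```python
import hashlib
from typing import List


def ordering_target(first_audio_data: List[bytes]) -> List[int]:
    """Hypothetical hash step: one 0-based index per first audio data."""
    n = len(first_audio_data)
    return [int.from_bytes(hashlib.sha256(chunk).digest(), "big") % n
            for chunk in first_audio_data]


def identity_chain(pieces: List[List[bytes]], base_identity: str) -> List[str]:
    """Return the identity information of each of the T pieces, in recording order."""
    if not pieces:
        return []
    identities = [base_identity]  # piece 1 uses the audio basic identity information
    for piece in pieces[:-1]:     # piece t determines the identity of piece t+1
        previous = identities[-1]
        if len(piece) != len(previous):
            raise ValueError("need one identity character per first audio data")
        target = ordering_target(piece)
        identities.append("".join(previous[q] for q in target))
    return identities


# Three hypothetical pieces, each holding ten tiny first audio data.
pieces = [[bytes([t, k]) for k in range(10)] for t in range(3)]
chain = identity_chain(pieces, "abc123456f")
```

Because each identity depends on the content of the preceding piece, a receiver that recomputes the chain would notice a piece that was inserted or altered, since every later identity would stop matching. As the hash values in this sketch may repeat, later identities may also repeat or drop characters of the base identity.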

S303: and generating target audio data according to the audio data to be processed and the identity information of the audio data to be processed, so that the target audio data carries the audio data to be processed and the identity information of the audio data to be processed.

The target audio data refer to storage data corresponding to the audio data to be processed; and the target audio data carries the audio data to be processed and the identity information of the audio data to be processed.

In addition, the embodiment of the application does not limit the generation process of the target audio data, for example, the audio data to be processed and the identity information of the audio data to be processed may be directly spliced to obtain the target audio data.

In some cases, if the identity information carried by the target audio data is added as a single block, a malicious actor can easily determine the structure of the target audio data through simple inspection and analysis. The malicious actor can then package illegal audio data in that structure, disguise it as legal audio data, and use the disguised data to attack the transcription device. The transcription device would then spend a great deal of time transcribing the disguised data and could not transcribe the genuinely legal audio data in time.

Based on this, in order to further improve the structural security of the target audio data, the identity information may be added in a hashed (i.e., broken-up insertion) manner during the generation of the target audio data. Based on this, the embodiment of the present application further provides a possible implementation manner of generating the target audio data (that is, S303), in this implementation manner, when the audio data to be processed includes N pieces of first audio data, and the identity information of the audio data to be processed includes N pieces of identity information of the first audio data, S303 may specifically include S3031 to S3032:

S3031: generating the ith second audio data according to the ith first audio data and the identity information of the ith first audio data, so that the ith second audio data carries the ith first audio data and the identity information of the ith first audio data; wherein i is a positive integer, i is less than or equal to N, and N is a positive integer.

In the embodiment of the present application, after the i-th first audio data and the identity information of the i-th first audio data are obtained, the two may be integrated to obtain the i-th second audio data, so that the i-th second audio data includes the i-th first audio data and the identity information of the i-th first audio data.

In addition, the embodiment of the present application does not limit the manner of information integration, which may be implemented by any existing or future information integration method. For ease of understanding, one possible embodiment is described below as an example.

In one possible implementation, S3031 may specifically be: and adding the identity information of the ith first audio data to a preset position of the ith first audio data to obtain the ith second audio data.

The preset position can be preset according to an application scene. For example, the preset position may be a front position of the ith first audio data, a rear position of the ith first audio data, or a position between any two adjacent characters in the ith first audio data.

The front position of the i-th first audio data refers to the character position located before the first character of the i-th first audio data. For example, as shown in FIG. 6, when the i-th first audio data is "C_1 C_2 …… C_W" and the identity information of the i-th first audio data includes 1 character, the front position of the i-th first audio data is the position of "C_B" in FIG. 6.

The rear position of the i-th first audio data refers to the character position located after the last character of the i-th first audio data. For example, as shown in FIG. 6, when the i-th first audio data is "C_1 C_2 …… C_W" and the identity information of the i-th first audio data includes 1 character, the rear position of the i-th first audio data is the position of "C_A" in FIG. 6.

Based on the above-mentioned related content of S3031, after the identity information of the ith first audio data and the ith first audio data is obtained, the identity information of the ith first audio data may be added to the preset position of the ith first audio data to obtain the ith second audio data, so that the ith second audio data carries the identity information of the ith first audio data and the ith first audio data. For example, as shown in fig. 7, when the identity information of the ith first audio data is "3", the ith first audio data is "128b_a0", and the preset position is the front position, the ith second audio data may be "3128b_a0". Wherein i is a positive integer, i is less than or equal to N, and N is a positive integer.

S3032: and obtaining target audio data according to the 1 st second audio data to the N th second audio data.

In this embodiment, if the audio data to be processed includes N first audio data, then after the 1st second audio data corresponding to the 1st first audio data, the 2nd second audio data corresponding to the 2nd first audio data, ……, and the N-th second audio data corresponding to the N-th first audio data are obtained, the 1st to the N-th second audio data are directly spliced to obtain the target audio data. For example, as shown in fig. 8, when the 1st second audio data is "3128b_a0", the 2nd second audio data is "4128b_a2", ……, and the N-th second audio data is "5128b_a9", the target audio data is "3128b_a04128b_a2……5128b_a9".
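S3031 and S3032 can be sketched together in Python. The function names and the `position` parameter (0 for the front position, the length of the first audio data for the rear position, any value in between for a position between two adjacent characters) are illustrative assumptions; the strings reuse the examples of figs. 7 and 8.

```python
from typing import List


def build_second_audio(first_audio: str, identity: str, position: int = 0) -> str:
    """Insert identity information at a preset position of one first audio data."""
    if not 0 <= position <= len(first_audio):
        raise ValueError("preset position lies outside the first audio data")
    return first_audio[:position] + identity + first_audio[position:]


def build_target_audio(first_audio_list: List[str],
                       identity_list: List[str],
                       position: int = 0) -> str:
    """Splice the 1st to the N-th second audio data into the target audio data."""
    if len(first_audio_list) != len(identity_list):
        raise ValueError("need one identity entry per first audio data")
    return "".join(build_second_audio(audio, identity, position)
                   for audio, identity in zip(first_audio_list, identity_list))


second = build_second_audio("128b_a0", "3")  # front position
target = build_target_audio(["128b_a0", "128b_a2", "128b_a9"], ["3", "4", "5"])
```

With three first audio data, `target` is the concatenation of "3128b_a0", "4128b_a2" and "5128b_a9", so each piece of identity information ends up separated from the next by a block of audio data rather than stored contiguously.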

Based on the above-mentioned content related to S3031 to S3032, after the audio data to be processed and its identity information are obtained, the identity information may be hashed into (i.e., scattered through) the audio data to be processed to obtain the target audio data. Because the identity information carried by the target audio data is added in this scattered manner, the structure of the target audio data is more complex and cannot be recovered by a malicious actor through simple analysis, so the malicious actor cannot disguise illegal audio as legal audio data. This effectively prevents illegal-audio attacks on the transcription device and ensures that the transcription device can transcribe legal audio data in time.

In addition, after the target audio data is generated, the target audio data may be stored, or the target audio data may be used (for example, playing the target audio data or transferring the target audio data), which is not specifically limited in the embodiment of the present application.

Based on the above-mentioned related content of S301 to S303, in the audio data generating method provided in the embodiment of the present application, the audio data to be processed and its identity information are obtained first, and the target audio data is then generated from them, so that the target audio data carries both the audio data to be processed and its identity information. Because the identity information indicates that the audio information carried by the target audio data is legal, a subsequent transcription device can determine from that identity information that the target audio data is legal audio data. Validity screening of audio data can thus be performed in the transcription device: the device transcribes only legal audio data and does not transcribe illegal audio data, which saves the time that would otherwise be spent on illegal audio data, allows legal audio data to be transcribed in time, and improves the real-time performance of the transcription device, especially for legal audio data.

Based on the audio data generating method provided by the above method embodiment, the present application further provides an audio data transferring method, which is explained and illustrated below with reference to the accompanying drawings.

Method embodiment II

Referring to fig. 9, a flowchart of an audio data transfer method according to an embodiment of the present application is shown.

The audio data transfer method provided by the embodiment of the application comprises the following steps of S901-S904:

S901: obtaining the audio data to be transcribed.

Wherein the audio data to be transcribed is target audio data generated by using any implementation of the audio data generation method in Method embodiment one above.

The audio data to be transcribed carries the actual identity information corresponding to the audio data to be transcribed. The actual identity information corresponding to the audio data to be transcribed is used for describing the identity of the audio information carried by the audio data to be transcribed (e.g., recording equipment information, recording equipment user information, transcription authorization related information, etc.).

The audio data to be transcribed may comprise at least one second audio data. Wherein the second audio data comprises at least one audio sample data and the identity information of the at least one audio sample data. For the "audio sample data", refer to the related content in S301 above; for the second audio data, refer to the related content of the second audio data in S303 above.

In addition, the embodiment of the application does not limit the number of audio data to be transcribed, for example, the number of audio data to be transcribed is M. Wherein M is a positive integer.

In addition, the embodiment of the application is not limited to the method for acquiring the audio data to be transcribed, for example, the transcription device may receive the audio data to be transcribed sent by other devices, and may also read the audio data to be transcribed from the designated storage space.

In some cases, the target user may select, for audio transcription, stored audio data that includes a plurality of pieces of audio data to be transcribed, so the audio data to be transcribed may be determined from the stored audio data. Based on this, the embodiment of the present application further provides a possible implementation manner of obtaining the audio data to be transcribed (i.e. S901), which may specifically include S9011-S9012:

S9011: the stored audio data is retrieved from the target storage location.

The target storage location refers to an actual storage location of audio data designated by a user to be transcribed. In addition, the embodiments of the present application do not limit the target storage location, for example, the target storage location may be located in a storage space in the terminal device that triggers the audio transcription request.

Stored audio data may refer to the stored data obtained by processing the audio to be stored with any implementation of the audio data generation method in Method embodiment one above, and the stored audio data may include the target audio data corresponding to the audio to be stored.

Based on the above-mentioned related content of S9011, in the embodiment of the present application, when a user wants to transfer one stored audio data, an audio transfer request may be triggered first, so that the audio transfer request is used to request to transfer the stored audio data; and acquiring the stored audio data from the target storage position according to the audio transcription request so as to be capable of carrying out transcription processing on the stored audio data subsequently.

S9012: dividing the stored audio data according to a second dividing rule to obtain at least one audio data to be transcribed.

The second division rule may be preset, or may be generated according to the structure of the above target audio data. For example, if one target audio data includes D characters, the second division rule may be set to have D characters as one division unit.

Based on the above-mentioned related content of S9012, after the stored audio data is obtained, the stored audio data may be divided according to the second division rule, so as to obtain at least one audio data to be transcribed. For example, when the stored audio data includes D×M characters and the second division rule is to use D characters as one division unit, the stored audio data may be divided according to the second division rule to obtain M audio data to be transcribed, so that each audio data to be transcribed includes D characters. Wherein D is a positive integer, and M is a positive integer.
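The division in S9012 can be sketched as follows. This is only a minimal illustration, assuming the stored audio data is available as a character string and that the second division rule is a fixed unit length D; the function name `split_stored_audio` and the choice to reject data whose length is not a whole multiple of D are assumptions and do not come from the application.

```python
def split_stored_audio(stored: str, unit_len: int) -> list[str]:
    """Split stored audio data into units of `unit_len` characters (the assumed rule)."""
    if unit_len <= 0:
        raise ValueError("unit_len must be a positive integer")
    # Assumption: well-formed stored audio data contains exactly D x M characters.
    if len(stored) % unit_len != 0:
        raise ValueError("stored audio data is not a whole number of units")
    return [stored[i:i + unit_len] for i in range(0, len(stored), unit_len)]


# Illustrative input only: D = 8 characters per unit, M = 3 units.
units = split_stored_audio("3128b_a07f00c_11a9d2e_55", 8)
print(units)
```

Each element of `units` plays the role of one audio data to be transcribed.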

Based on the above-mentioned related content of S9011 to S9012, when a user wants to transcribe stored audio data, the stored audio data may first be read from the target storage location, and then at least one audio data to be transcribed may be extracted from the stored audio data, so that transcription of the stored audio data can be implemented based on the at least one audio data to be transcribed.

S902: and extracting the actual identity information corresponding to the audio data to be transcribed from the audio data to be transcribed.

The actual identity information corresponding to the audio data to be transcribed is used for describing the identity of the audio information carried by the audio data to be transcribed.

The embodiment of the application does not limit the extraction process of the actual identity information, and only needs to ensure that the extraction process of the actual identity information corresponds to the insertion process of the identity information in the target audio data. To facilitate understanding of the extraction process of the actual identity information (i.e., S902), the following description is made in connection with an example.

As an example, when the identity information carried by the target audio data is added in a scattered (hashed) manner, that is, distributed across the pieces of second audio data, and the audio data to be transcribed includes N pieces of second audio data, S902 may specifically include S9021 to S9022:

S9021: extracting actual identity information corresponding to the kth second audio data from the kth second audio data; wherein k is a positive integer, k is less than or equal to N, and N is a positive integer.

In this embodiment, for the kth second audio data in the audio data to be transcribed, the actual identity information corresponding to the kth second audio data may be directly extracted from the kth second audio data, so that the actual identity information corresponding to the audio data to be transcribed may be generated based on the actual identity information corresponding to the kth second audio data.

It should be noted that, in the embodiment of the present application, the position of the actual identity information corresponding to the kth second audio data in the kth second audio data is not limited, for example, when the actual identity information corresponding to the kth second audio data is one character, the actual identity information corresponding to the kth second audio data may be located at the first character position, the middle character position, or the last character position in the kth second audio data.

S9022: generating the actual identity information corresponding to the audio data to be transcribed according to the actual identity information corresponding to the 1st second audio data through the actual identity information corresponding to the Nth second audio data.

In this embodiment of the present application, after the actual identity information corresponding to each second audio data in the audio data to be transcribed is obtained, the actual identity information corresponding to the 1st second audio data through the actual identity information corresponding to the Nth second audio data may be directly spliced in order, to obtain the actual identity information corresponding to the audio data to be transcribed.

Based on the above-mentioned content related to S9021 to S9022, in this embodiment, as shown in Fig. 10, after the audio data to be transcribed including N second audio data is obtained, actual identity information corresponding to the 1st second audio data may be extracted from the 1st second audio data, actual identity information corresponding to the 2nd second audio data may be extracted from the 2nd second audio data, … … (and so on), and actual identity information corresponding to the Nth second audio data may be extracted from the Nth second audio data; the actual identity information corresponding to the 1st second audio data through the actual identity information corresponding to the Nth second audio data is then spliced in order to obtain the actual identity information corresponding to the audio data to be transcribed.
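S9021 to S9022 can be illustrated with a short sketch. It assumes that each second audio data is a string carrying exactly one identity character at a fixed, known position (the first character by default); the function name `extract_actual_identity` and the sample strings are hypothetical.

```python
def extract_actual_identity(second_audio_items: list[str], position: int = 0) -> str:
    """Take one identity character from each second audio data (S9021)
    and splice the characters in order (S9022)."""
    return "".join(item[position] for item in second_audio_items)


# Illustrative second audio data; the identity character is assumed to come first.
items = ["3128", "b_a0", "7f00"]
actual_identity = extract_actual_identity(items)
print(actual_identity)
```

Passing `position=-1` would model the variant in which the identity character occupies the last character position of each second audio data.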

Based on the above-mentioned related content of S902, in the embodiment of the present application, after the audio data to be transcribed is obtained, the actual identity information corresponding to the audio data to be transcribed may be directly extracted from the audio data to be transcribed, so that the validity of the audio information carried by the audio data to be transcribed may be determined based on the actual identity information.

S903: and determining whether the audio data to be transcribed is legal audio data or not according to the actual identity information corresponding to the audio data to be transcribed.

The embodiment of the application does not limit the determining process of whether the audio data to be transcribed is legal audio data, for example, whether the audio data to be transcribed is legal audio data can be determined by means of identity information comparison. Based on this, the embodiment of the application also provides a determining process of legal audio data, which specifically may include steps 21-23:

Step 21: acquiring theoretical identity information corresponding to the audio data to be transcribed.

The theoretical identity information corresponding to the audio data to be transcribed refers to standard identity information of the audio information carried by the audio data to be transcribed.

In addition, the generation process of the theoretical identity information corresponding to the audio data to be transcribed is similar to the generation process of the identity information of the audio data to be processed, so any implementation mode for generating the identity information of the audio data to be processed can be adopted for implementation. To facilitate an understanding of the theoretical identity information generation process, an example is described below.

As an example, when the number of audio data to be transcribed is M and the recording time corresponding to the mth audio data to be transcribed is earlier than the recording time corresponding to the (m+1)th audio data to be transcribed, where m is a positive integer, m is less than or equal to M-1, and M is a positive integer, the theoretical identity information corresponding to the (m+1)th audio data to be transcribed may be generated according to the theoretical identity information corresponding to the mth audio data to be transcribed and the mth audio data to be transcribed. The theoretical identity information corresponding to the 1st audio data to be transcribed is determined according to the audio basic identity information corresponding to the M audio data to be transcribed.

The embodiment of the application does not limit the determination mode of the audio basic identity information corresponding to the M audio data to be transcribed. For ease of understanding, one possible embodiment is described below as an example.

In one possible implementation manner, the determining process of the audio basic identity information corresponding to the M audio data to be transcribed includes steps 31-32:

Step 31: obtaining at least one candidate identity information corresponding to the M audio data to be transcribed.

The manner of implementing step 31 is not limited. For example, if the audio transcription request corresponding to the M audio data to be transcribed is triggered by the target user, at least one candidate audio basic identity information corresponding to the user identity of the target user may be looked up in the preset mapping relationship and determined as the candidate identity information corresponding to the M audio data to be transcribed.

The preset mapping relation comprises a corresponding relation between a user identity of a target user and at least one candidate audio basic identity information. In addition, each candidate audio basic identity information can be generated according to the user identity of the target user and the product serial number of each recording device owned by the target user; moreover, a correspondence exists between the user identity of the target user and the product serial number of the recording device owned by the target user.

For example, if user A owns recording device A, recording device B, and recording device C, the user identity of user A may correspond to the product serial number of recording device A, the product serial number of recording device B, and the product serial number of recording device C. In this case, first candidate audio basic identity information can be generated according to the user identity of user A and the product serial number of recording device A, second candidate audio basic identity information can be generated according to the user identity of user A and the product serial number of recording device B, and third candidate audio basic identity information can be generated according to the user identity of user A and the product serial number of recording device C. A correspondence is then established between the user identity of user A and each of the first, second, and third candidate audio basic identity information. Finally, the preset mapping relationship is constructed from these three correspondences.

Based on the above-mentioned related content of step 31, in some cases, because the target user may have a plurality of recording devices, the target user corresponds to a plurality of candidate audio basic identity information, so that after the target user triggers the audio transcription requests corresponding to the M audio data to be transcribed, the plurality of candidate audio basic identity information corresponding to the target user may be determined as a plurality of candidate identity information corresponding to the M audio data to be transcribed, so that the audio basic identity information corresponding to the M audio data to be transcribed can be determined from the plurality of candidate identity information.

Step 32: determining audio basic identity information corresponding to the M audio data to be transcribed according to the actual identity information corresponding to the 1st audio data to be transcribed and at least one candidate identity information corresponding to the M audio data to be transcribed.

As an example, step 32 may specifically be: matching the actual identity information corresponding to the 1st audio data to be transcribed against each candidate identity information corresponding to the M audio data to be transcribed, and determining the successfully matched candidate identity information as the audio basic identity information corresponding to the M audio data to be transcribed.

Based on the related content of steps 31 to 32, in some cases, because the target user may have a plurality of recording devices, at least one candidate identity information corresponding to the user identity may be determined according to the user identity of the target user; the audio basic identity information corresponding to the M audio data to be transcribed is then screened out from the at least one candidate identity information according to the actual identity information corresponding to the 1st audio data to be transcribed.
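Steps 31 to 32 can be sketched as a lookup followed by a match. The application only states that each candidate is generated from the user identity and a product serial number; the truncated SHA-256 digest used in `make_candidate` is purely an assumed stand-in, as are the function names, the sample identifiers, and the use of exact equality as the matching criterion.

```python
import hashlib
from typing import Optional


def make_candidate(user_id: str, serial: str) -> str:
    # Hypothetical derivation of a candidate audio basic identity information.
    return hashlib.sha256(f"{user_id}:{serial}".encode()).hexdigest()[:8]


def build_mapping(user_id: str, serials: list[str]) -> dict[str, list[str]]:
    """Preset mapping: user identity -> candidates, one per owned recording device."""
    return {user_id: [make_candidate(user_id, s) for s in serials]}


def select_base_identity(mapping: dict[str, list[str]], user_id: str,
                         first_actual_identity: str) -> Optional[str]:
    """Step 31: look up the user's candidates. Step 32: return the one that matches
    the actual identity information of the 1st audio data to be transcribed."""
    for candidate in mapping.get(user_id, []):
        if candidate == first_actual_identity:  # assumed matching criterion
            return candidate
    return None  # no match: no audio basic identity information can be determined


mapping = build_mapping("userA", ["SN-A", "SN-B", "SN-C"])
base = select_base_identity(mapping, "userA", make_candidate("userA", "SN-B"))
print(base is not None)
```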

In addition, the embodiment of the application does not limit the manner of determining the theoretical identity information corresponding to the 1st audio data to be transcribed; for example, the audio basic identity information corresponding to the M audio data to be transcribed may be directly determined as the theoretical identity information corresponding to the 1st audio data to be transcribed.

In addition, in the embodiment of the present application, the generation process of the theoretical identity information corresponding to the (m+1)th audio data to be transcribed is similar to the generation process of the identity information of the (t+1)th audio data to be processed above. For ease of understanding, the following description is provided in connection with examples.

As an example, the generation process of the theoretical identity information corresponding to the m+1th audio data to be transcribed may include steps 41-42:

Step 41: generating a second updating rule according to the mth audio data to be transcribed.

The second updating rule is used for describing an adjusting rule which is needed to be based on when theoretical identity information corresponding to the mth audio data to be transcribed is adjusted. It should be noted that the second updating rule is similar to the first updating rule, and the relevant content is referred to the first updating rule.

In addition, the generation process of the second updating rule is similar to that of the first updating rule, so the second updating rule can be generated according to the audio information carried by the m-th audio data to be transcribed. Based on this, the present embodiment provides a possible implementation manner of step 41, which is specifically: firstly, extracting audio information corresponding to the m-th audio data to be transcribed from the m-th audio data to be transcribed, and then generating a second updating rule according to the audio information corresponding to the m-th audio data to be transcribed.

The audio information corresponding to the mth audio data to be transcribed is used for describing audio sampling data carried by the mth audio data to be transcribed. It can be seen that the audio information corresponding to the mth audio data to be transcribed may include the remaining information after the identity information in the mth audio data to be transcribed is removed. For example, when the m-th audio data to be transcribed is "3128b_a0" and the identity information corresponding to the m-th audio data to be transcribed is "3", the audio information corresponding to the m-th audio data to be transcribed may be "128b_a0".
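The example above can be reproduced with a small helper. This is a sketch under the assumption that the identity information occupies a known, contiguous position in the audio data to be transcribed; the name `split_identity_and_audio` and its parameters are hypothetical.

```python
def split_identity_and_audio(unit: str, identity_pos: int = 0,
                             identity_len: int = 1) -> tuple[str, str]:
    """Return (identity information, audio information) for one audio data
    to be transcribed, assuming the identity sits at a known position."""
    end = identity_pos + identity_len
    identity = unit[identity_pos:end]
    audio_info = unit[:identity_pos] + unit[end:]
    return identity, audio_info


identity, audio_info = split_identity_and_audio("3128b_a0")
print(identity, audio_info)  # prints: 3 128b_a0
```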

In addition, since the audio information corresponding to the mth audio data to be transcribed is similar to the tth audio data to be processed above, and the second updating rule is similar to the first updating rule above, the implementation of the step of generating the second updating rule according to the audio information corresponding to the mth audio data to be transcribed is similar to the implementation of step 11 above; for the related content, refer to step 11 above.

Based on the above-mentioned related content of step 41, for the mth audio data to be transcribed, the audio information corresponding to the mth audio data to be transcribed may be extracted from the mth audio data to be transcribed, and then a second updating rule is generated according to that audio information, so that the second updating rule describes the adjustment rule to be followed when adjusting the theoretical identity information corresponding to the mth audio data to be transcribed. The theoretical identity information corresponding to the (m+1)th audio data to be transcribed can then be determined according to the second updating rule.

Step 42: updating the theoretical identity information corresponding to the mth audio data to be transcribed according to the second updating rule to obtain the theoretical identity information corresponding to the (m+1)th audio data to be transcribed.

In fact, since the second updating rule is similar to the first updating rule, and the theoretical identity information corresponding to the mth audio data to be transcribed is similar to the identity information of the tth audio data to be processed, the implementation of step 42 is similar to the implementation of step 12; for the related content, refer to step 12. For ease of understanding, the following description is provided in connection with examples.

As an example, when the mth audio data to be transcribed includes N_m second audio data, the theoretical identity information corresponding to the mth audio data to be transcribed comprises the theoretical identity information corresponding to the N_m second audio data, and the second updating rule includes a second sorting target, step 42 may specifically be: adjusting the order of the theoretical identity information corresponding to the N_m second audio data according to the second sorting target to obtain the theoretical identity information corresponding to the (m+1)th audio data to be transcribed.

It should be noted that the second sorting objective is similar to the first sorting objective described above, and the relevant content is referred to the first sorting objective described above.

Based on the above-mentioned related content of steps 41 to 42, the theoretical identity information corresponding to the (m+1)th audio data to be transcribed can be generated according to the audio information corresponding to the mth audio data to be transcribed and its identity information, so that the theoretical identity information corresponding to the (m+1)th audio data to be transcribed reflects both the audio information carried by the mth audio data to be transcribed and its corresponding identity information.

Based on the above-mentioned related content of step 21, in the embodiment of the present application, for M audio data to be transcribed, the theoretical identity information corresponding to the 1st audio data to be transcribed may be determined according to the audio basic identity information corresponding to the M audio data to be transcribed; theoretical identity information corresponding to the 2nd audio data to be transcribed is generated according to the 1st audio data to be transcribed and the theoretical identity information corresponding to the 1st audio data to be transcribed; theoretical identity information corresponding to the 3rd audio data to be transcribed is generated according to the 2nd audio data to be transcribed and the theoretical identity information corresponding to the 2nd audio data to be transcribed; … … (and so on); and theoretical identity information corresponding to the Mth audio data to be transcribed is generated according to the (M-1)th audio data to be transcribed and the theoretical identity information corresponding to the (M-1)th audio data to be transcribed, so that the validity of the M audio data to be transcribed can be analyzed according to the theoretical identity information corresponding to the M audio data to be transcribed.
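The chained derivation in step 21 can be sketched as follows. The concrete second updating rule is not defined in this passage, so the sketch substitutes an assumed rule: a cyclic reordering of the identity characters by an offset computed from the audio information. The names `derive_update_rule`, `next_theoretical_identity`, and `theoretical_identities` are hypothetical.

```python
def derive_update_rule(audio_info: str, length: int) -> int:
    # Assumed stand-in for step 41: a rotation offset derived from the audio information.
    return sum(audio_info.encode()) % length


def next_theoretical_identity(current: str, audio_info: str) -> str:
    # Assumed stand-in for step 42: reorder the identity characters by rotation.
    shift = derive_update_rule(audio_info, len(current))
    return current[shift:] + current[:shift]


def theoretical_identities(base_identity: str, audio_infos: list[str]) -> list[str]:
    """Return the theoretical identity information for M audio data to be transcribed.

    The 1st value is the audio basic identity information; the (m+1)th value is
    derived from the mth value and the mth audio information."""
    if not base_identity:
        raise ValueError("base_identity must not be empty")
    identities = [base_identity]
    for info in audio_infos[:-1]:  # the Mth audio information has no successor
        identities.append(next_theoretical_identity(identities[-1], info))
    return identities


print(theoretical_identities("abcd", ["x", "y", "z"]))
```

Because each value depends on the previous audio information, altering any earlier audio data changes every later theoretical identity under this kind of chaining.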

Step 22: matching the actual identity information corresponding to the audio data to be transcribed with the theoretical identity information corresponding to the audio data to be transcribed, so as to obtain an identity matching result corresponding to the audio data to be transcribed.

As an example, when the number of audio data to be transcribed is M, step 22 may specifically be: matching the actual identity information corresponding to the rth audio data to be transcribed with the theoretical identity information corresponding to the rth audio data to be transcribed to obtain an identity matching result corresponding to the rth audio data to be transcribed; wherein r is a positive integer, r is less than or equal to M, and M is a positive integer.

In the embodiment of the present application, if the number of audio data to be transcribed is M, the actual identity information corresponding to the 1st audio data to be transcribed may be matched with its theoretical identity information to obtain an identity matching result corresponding to the 1st audio data to be transcribed; the actual identity information corresponding to the 2nd audio data to be transcribed is matched with its theoretical identity information to obtain an identity matching result corresponding to the 2nd audio data to be transcribed; … … (and so on); and the actual identity information corresponding to the Mth audio data to be transcribed is matched with its theoretical identity information to obtain an identity matching result corresponding to the Mth audio data to be transcribed, so that the validity of the M audio data to be transcribed can be judged by using the identity matching results corresponding to the M audio data to be transcribed.

Step 23: determining whether the audio data to be transcribed is legal audio data according to the identity matching result corresponding to the audio data to be transcribed.

In some cases (e.g., where the M audio data to be transcribed is determined from stored audio data), validity of the M audio data to be transcribed should be determined as a whole. Based on this, the present embodiment provides a possible implementation of step 23, which may specifically be: if the identity matching results corresponding to the M pieces of audio data to be transcribed all show that the matching is successful, determining that the M pieces of audio data to be transcribed are legal audio data; if at least one identity matching result corresponding to the M pieces of audio data to be transcribed shows that matching fails, determining that the M pieces of audio data to be transcribed are illegal audio data.

As can be seen, for stored audio data including M audio data to be transcribed, when the identity matching result corresponding to the 1st audio data to be transcribed indicates success, the identity matching result corresponding to the 2nd audio data to be transcribed indicates success, … … (and so on), and the identity matching result corresponding to the Mth audio data to be transcribed indicates success, the M audio data to be transcribed are all determined to be legal, so that the stored audio data including the M audio data to be transcribed can be determined to be legal audio data; otherwise, it may be determined that the stored audio data including the M audio data to be transcribed is illegal audio data.
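Steps 22 to 23 reduce to an all-or-nothing comparison, which can be sketched as below. Exact string equality is assumed as the matching criterion, and the function name `is_legal` is hypothetical.

```python
def is_legal(actual: list[str], theoretical: list[str]) -> bool:
    """Return True only if every one of the M actual identities matches its
    theoretical counterpart (step 22), judging the M items as a whole (step 23)."""
    if not actual or len(actual) != len(theoretical):
        return False  # assumption: empty or misaligned input is treated as illegal
    return all(a == t for a, t in zip(actual, theoretical))


print(is_legal(["37a", "7a3"], ["37a", "7a3"]))  # prints: True
print(is_legal(["37a", "7a3"], ["37a", "a37"]))  # prints: False
```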

Based on the above-mentioned related content of step 21 to step 23, in the embodiment of the present application, it may be determined whether the audio data to be transcribed is legal audio data according to the matching result of the theoretical identity information corresponding to the audio data to be transcribed and the actual identity information thereof.

It should be noted that, in some cases (for example, when the audio basic identity information represents a correspondence between the recording device and the user), illegal audio data is collected by an illegal recording device that has no transcription authorization, so no correspondence for the illegal recording device exists in the transcription device. The transcription device therefore cannot find, among the stored audio basic identity information, any audio basic identity information that matches the actual identity information of the 1st audio data to be transcribed extracted from the illegal audio data. It can be seen that when the actual identity information corresponding to the 1st audio data to be transcribed fails to match every candidate identity information corresponding to the M audio data to be transcribed, it may be determined that the stored audio data including the 1st audio data to be transcribed is illegal audio data.

S904: when the audio data to be transcribed is determined to be legal audio data, performing transcription processing on the audio data to be transcribed to obtain the characters corresponding to the audio data to be transcribed.

The transcription process refers to transcribing the audio information carried in the audio data to be transcribed into characters.

In some cases, if the audio data to be transcribed includes the audio sampling point after the encryption processing, the audio information in the audio data to be transcribed may be decrypted first, and then the decrypted audio information may be transcribed. Based on this, the present application embodiment also provides a possible implementation manner of S904, which may specifically include S9041-S9043:

S9041: extracting the audio data to be decrypted corresponding to the audio data to be transcribed from the audio data to be transcribed.

The audio data to be decrypted corresponding to the audio data to be transcribed refers to the audio information carried in the audio data to be transcribed. In addition, the extraction process of the audio data to be decrypted corresponding to the audio data to be transcribed is similar to the extraction process of the audio information corresponding to the mth audio data to be transcribed in step 41 above; for the related content, refer to the description above.

S9042: decrypting the audio data to be decrypted corresponding to the audio data to be transcribed to obtain the decrypted audio data corresponding to the audio data to be transcribed.

Wherein, the decrypted audio data corresponding to the audio data to be transcribed comprises at least one audio sampling point.

In some cases, if the audio data to be decrypted corresponding to the audio data to be transcribed is encrypted by using a preset encryption algorithm, S9042 may specifically be: and decrypting the audio data to be decrypted corresponding to the audio data to be transcribed by using a decryption algorithm corresponding to the preset encryption algorithm to obtain the decrypted audio data corresponding to the audio data to be transcribed.

S9043: transcribing the decrypted audio data corresponding to the audio data to be transcribed to obtain the characters corresponding to the audio data to be transcribed.

The embodiment of the present application is not limited to the transcription method, and may be implemented by any method that is capable of transcribing audio into text, existing or appearing in the future.

Based on the above-mentioned related content of S9041 to S9043, if the audio data to be transcribed includes the encrypted audio information, the encrypted audio information may be extracted from the audio data to be transcribed, then the encrypted audio information is decrypted to obtain decrypted audio information, and finally the decrypted audio information is transcribed to obtain the text corresponding to the audio data to be transcribed.
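The three sub-steps S9041 to S9043 can be sketched as a small pipeline. The application does not name the preset encryption algorithm or the transcription engine, so this sketch assumes a repeating-key XOR as a placeholder cipher and accepts the transcriber as a caller-supplied function; `xor_cipher`, `transcribe_legal_audio`, and the layout with the identity bytes at the front are all assumptions.

```python
from typing import Callable


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Placeholder symmetric cipher: applying it twice with the same key restores the input."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def transcribe_legal_audio(unit: bytes, identity_len: int, key: bytes,
                           transcribe: Callable[[bytes], str]) -> str:
    encrypted = unit[identity_len:]         # S9041: drop the identity, keep the audio information
    decrypted = xor_cipher(encrypted, key)  # S9042: decrypt with the matching algorithm
    return transcribe(decrypted)            # S9043: transcribe the decrypted sampling points


# Usage with a stub transcriber that only reports how many bytes it received.
samples = bytes([1, 2, 3, 4])
unit = b"3" + xor_cipher(samples, b"k")
print(transcribe_legal_audio(unit, 1, b"k", lambda s: f"{len(s)} bytes received"))
```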

Based on the above-mentioned related content of S901 to S904, in this embodiment of the present application, if the target user wants to transcribe stored audio data, the target user may trigger an audio transcription request for requesting transcription of the stored audio data, so that the transcription device may, based on the audio transcription request, transcribe the audio information carried by the stored audio data by using S901 to S904, to obtain the text corresponding to the stored audio data. Because the stored audio data carries the identity information of the audio information, the transcription device can determine, based on that identity information, whether the stored audio data is legal audio data, thereby screening audio data for validity. The transcription device therefore only needs to transcribe legal audio data and does not need to transcribe illegal audio data, which saves the time that would otherwise be spent transcribing illegal audio data, allows legal audio data to be transcribed in a timely manner, and improves the real-time performance of the transcription device, especially for legal audio data.

In order to facilitate understanding of the above audio data generation method and audio data transfer method, a description will be given below in connection with a scene embodiment.

Scene embodiment

In some cases, the recording device may record and store audio, but the recording device cannot perform other complex operations (e.g., audio transcription operation), and at this time, the recording device may perform other complex operations by means of a user terminal device (e.g., a terminal device such as a mobile phone, a computer, etc.). For ease of understanding, the following description is given in connection with the application scenario shown in fig. 11. Fig. 11 is a schematic view of an application scenario provided in the embodiment of the present application.

In the application scenario shown in Fig. 11, the recording pen 1101 is capable of recording and storing audio; the recording pen 1101 can be mounted to the user terminal device 1102 through a preset connection mode (for example, a USB interface connection mode), so that the user terminal device 1102 can directly read audio data stored in the recording pen 1101; the recording pen 1101 is also capable of communicating with the user terminal device 1102 through a first communication means (e.g., a wireless communication means) so that the recording pen 1101 can acquire information transmitted by the transcription server 1103 by means of the user terminal device 1102. Wherein the transcription server 1103 is capable of communicating with the user terminal device 1102 through a second communication means.

It should be noted that the embodiment of the present application does not limit the user terminal device 1102; for example, the user terminal device 1102 may be a smart phone, a computer, a personal digital assistant (Personal Digital Assistant, PDA), a tablet computer, or the like.

In addition, in order to enable the transcription server 1103 to distinguish between legal audio data and illegal audio data, the transcription server 1103 may send a transcription authorization code to the recording pen 1101. The transcription authorization code is determined according to the correspondence between the recording pen 1101 and its user, and the generation process of the transcription authorization code specifically includes steps 51-53:

Step 51: when the user terminal device 1102 and the recording pen 1101 are successfully connected through the first communication mode and/or the preset connection mode, the user may trigger an audio transcription authorization request on the user terminal device 1102, so that the user terminal device 1102 sends the audio transcription authorization request to the transcription server 1103. Wherein the audio transcription authorization request carries the user identity of the user and the product serial number of the recording pen 1101.

Step 52: the transcription server 1103 generates audio basic identity information corresponding to the user identity of the user according to the user identity of the user and the product serial number of the recording pen 1101 carried by the audio transcription authorization request, and records the correspondence between the user identity of the user and the audio basic identity information corresponding to the user identity of the user.

Step 53: the transcription server 1103 feeds back the audio basic identity information corresponding to the user identity of the user to the user terminal device 1102, so that the user terminal device 1102 forwards the audio basic identity information to the recording pen 1101 for storage, and the recording pen 1101 can subsequently generate audio data carrying the identity information based on the audio basic identity information.

Based on the above-mentioned related content in steps 51 to 53, because the audio basic identity information is generated according to the user identity of the user and the product serial number of the recording pen 1101, the corresponding relationship between the user identity of the user and the product serial number of the recording pen 1101 is hidden in the audio basic identity information, so that after the recording pen 1101 receives the audio basic identity information, the recording pen 1101 can determine the corresponding relationship between the user identity of the user and the product serial number of the recording pen 1101, which is approved by the transcription server 1103, and thus the transcription server 1103 can identify the audio data carrying the identity information generated based on the audio basic identity information as legal audio data.
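As a rough illustration of steps 51 to 53, the following sketch derives audio basic identity information from a user identity and a product serial number and records the correspondence on the server side. The keyed-hash construction (HMAC-SHA256), the `SERVER_SECRET` value, and all function and variable names are assumptions made for illustration only; the embodiment does not prescribe a particular derivation.

```python
import hashlib
import hmac

# Hypothetical server-side secret; not specified by the embodiment.
SERVER_SECRET = b"example-secret"

# Preset mapping: user identity -> list of audio basic identity information.
identity_by_user: dict[str, list[bytes]] = {}


def generate_basic_identity(user_id: str, serial_number: str) -> bytes:
    """Derive audio basic identity information from the user ID and serial number."""
    message = f"{user_id}|{serial_number}".encode("utf-8")
    return hmac.new(SERVER_SECRET, message, hashlib.sha256).digest()


def handle_authorization_request(user_id: str, serial_number: str) -> bytes:
    """Steps 52 and 53: generate, record, and return the identity information."""
    basic_identity = generate_basic_identity(user_id, serial_number)
    known = identity_by_user.setdefault(user_id, [])
    if basic_identity not in known:
        known.append(basic_identity)
    return basic_identity
```

Because one user may own several recording pens, the mapping in this sketch keeps a list per user, which corresponds to the "at least one candidate audio basic identity information" looked up later during transcription.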

In addition, the recording pen 1101 may implement generation and storage of audio storage data based on the audio basic identity information, as shown in fig. 12, and the process may specifically include S1201-S1209:

S1201: after the recording pen 1101 receives the recording request, the recording pen 1101 is in a recording state of the audio sampling point.

Wherein, the recording request is used for requesting the recording pen 1101 to record sound. The recording state means that the recording pen 1101 is in a sound collection state.

S1202: the recording pen 1101 records the audio sampling points according to the preset recording frequency, and determines the recorded audio sampling points as unprocessed sampling data. The preset recording frequency can be preset according to the application scene.

S1203: judging whether the number of unprocessed sampling data reaches a first value or not; if yes, executing S1204; if not, the process returns to S1203.

The first value may be preset, for example, the first value may be 128.

S1204: encrypting the unprocessed sample data whose number equals the first value to obtain the same number of encrypted sample data, deleting these unprocessed sample data, and determining the set of these encrypted sample data as one first audio data.

Note that the encryption process in S1204 is similar to the encryption process in S3012, and the relevant contents are referred to in S3012.

S1205: judging whether the quantity of the first audio data reaches a second value; if yes, then execute S1206; if not, the process returns to S1203.

The second value may be preset, for example, the second value may be N (e.g., 10).

S1206: determining the set of first audio data whose number equals the second value as the current audio data to be processed, and deleting these first audio data.

S1207: and acquiring the identity information of the current audio data to be processed.

It should be noted that S1207 may be implemented using the specific embodiment of S302 above. For example, if the current audio data to be processed is the 1st audio data to be processed, the audio basic identity information stored in the recording pen 1101 may be determined as the identity information of the current audio data to be processed; and if the current audio data to be processed is the (t+1)th audio data to be processed, the identity information of the current audio data to be processed can be generated according to the t-th audio data to be processed and the identity information of the t-th audio data to be processed. Wherein t is a positive integer, t is less than or equal to T-1, and T is the total number of the audio data to be processed generated in the audio recording process.

S1208: generating current target audio data according to the current audio data to be processed and the identity information of the current audio data to be processed, so that the current target audio data carries the current audio data to be processed and the identity information of the current audio data to be processed, and storing the current target audio data.

It should be noted that S1208 may be implemented using the specific embodiment of S303 above. For example, the identity information of the current audio data to be processed is added to the preset position of each first audio data in the current audio data to be processed in a hash mode, so that the current target audio data is obtained, and the current target audio data is stored.

S1209: judging whether a stopping condition is met, if so, ending the generation process of the audio data; if not, the process returns to S1203.

The stop condition may be preset, for example, the stop condition may be that unprocessed sample data is not present.

Based on the above-mentioned related content in S1201-S1209, when the user wants to record the audio with the recording pen 1101, the user can trigger the recording request, so that the recording pen 1101 records the audio sampling points after receiving the recording request, and processes and stores the recorded audio sampling points in the recording process of the audio sampling points, so that the stored audio data in the recording pen 1101 all carry the identity information, so that the subsequent transcription server 1103 can determine that the stored audio data are legal audio data based on the identity information.
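The flow of S1201-S1209 can be sketched as follows. This is only a minimal illustration under stated assumptions: the XOR routine is a placeholder for the encryption referenced at S1204, the hash-based `next_identity` update is one assumed way of deriving the (t+1)th identity information from the t-th audio data and its identity information, the "preset position" is taken to be the start of each first audio data, and all names are hypothetical.

```python
import hashlib
from typing import Iterable, Iterator

FIRST_VALUE = 128   # samples per first audio data (example value from the text)
SECOND_VALUE = 10   # first audio data per audio data to be processed (N = 10)


def encrypt(samples: bytes, key: int = 0x5A) -> bytes:
    """Placeholder cipher (XOR); stands in for the encryption used at S1204."""
    return bytes(s ^ key for s in samples)


def next_identity(identity: bytes, audio: bytes) -> bytes:
    """Assumed update: derive the (t+1)th identity from the t-th audio and identity."""
    return hashlib.sha256(identity + audio).digest()


def generate_target_audio(samples: Iterable[int], basic_identity: bytes) -> Iterator[bytes]:
    """Yield one target audio data per SECOND_VALUE blocks of FIRST_VALUE samples."""
    unprocessed = bytearray()      # S1202: recorded, not yet encrypted samples
    first_audio: list[bytes] = []  # S1204: encrypted blocks awaiting grouping
    identity = basic_identity      # identity of the 1st audio data to be processed
    for sample in samples:
        unprocessed.append(sample)
        if len(unprocessed) < FIRST_VALUE:               # S1203
            continue
        first_audio.append(encrypt(bytes(unprocessed)))  # S1204
        unprocessed.clear()
        if len(first_audio) < SECOND_VALUE:              # S1205
            continue
        pending = b"".join(first_audio)                  # S1206
        # S1208: place the identity at the assumed preset position (the start)
        # of every first audio data.
        yield b"".join(identity + block for block in first_audio)
        identity = next_identity(identity, pending)      # S1207 for the next item
        first_audio.clear()
```

In this sketch, trailing samples that do not fill a complete block are simply dropped when the input ends; a real device would need its own policy for such a remainder.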

In addition, when the user wants to transcribe the stored audio data in the recording pen 1101, the user can mount the recording pen 1101 on the user terminal device 1102 and trigger an audio transcription request on the user terminal device 1102, so that the user terminal device 1102 sends the audio transcription request and the stored audio data selected by the user to the transcription server 1103, and the transcription server 1103 can perform audio transcription by using any implementation mode of the audio data transfer method provided by the embodiment of the present application. The audio transcription request carries the user identity of the user.

As an example, the audio transcription process performed by the transcription server 1103 may specifically include steps 61-68:

Step 61: the transcription server 1103 divides the received stored audio data according to a second division rule, resulting in at least one audio data to be transcribed.

It should be noted that step 61 may be performed using any of the embodiments of S9012 above.

Step 62: and extracting the actual identity information corresponding to each piece of audio data to be transcribed from each piece of audio data to be transcribed.

It should be noted that, the extraction process of the actual identity information corresponding to each audio data to be transcribed is similar to the extraction process of S902 above, and the relevant content is referred to S902 above.

Step 63: the transcription server 1103 searches the stored preset mapping relation for at least one candidate audio basic identity information corresponding to the user identity carried by the audio transcription request, and determines the found candidate audio basic identity information as at least one candidate identity information.

It should be noted that the determination process of the "at least one candidate identity information" is similar to the determination process of the "at least one candidate identity information corresponding to the M audio data to be transcribed" above, and the relevant content is referred to above.

Step 64: judging whether at least one candidate identity information has the candidate identity information successfully matched with the actual identity information corresponding to the 1 st audio data to be transcribed, if so, executing the step 65; if not, determining that the stored audio data is illegal audio data, and adopting processing operation corresponding to the illegal audio data.

The processing operation corresponding to the illegal audio data may be preset, for example, the processing operation corresponding to the illegal audio data may include an operation related to audio transfer payment, and may also include an operation of ending a transfer flow of the stored audio data.

Step 65: and acquiring theoretical identity information corresponding to each piece of audio data to be transcribed.

It should be noted that step 65 may be implemented by using the embodiment of step 21 above. For example, the candidate identity information successfully matched with the actual identity information corresponding to the 1st audio data to be transcribed is determined as the theoretical identity information corresponding to the 1st audio data to be transcribed; and the theoretical identity information corresponding to the (m+1)th audio data to be transcribed is generated according to the m-th audio data to be transcribed and the theoretical identity information corresponding to the m-th audio data to be transcribed. Wherein m is a positive integer, m is less than or equal to M-1, M is a positive integer, and M is the total number of audio data to be transcribed extracted from the stored audio data.

Step 66: matching the theoretical identity information corresponding to each piece of audio data to be transcribed with the actual identity information corresponding to that piece of audio data to be transcribed, to obtain an identity matching result corresponding to each piece of audio data to be transcribed.

It should be noted that step 66 may be implemented using the embodiment of step 22 above.

Step 67: if the identity matching results corresponding to all the audio data to be transcribed are successful, determining that all the audio data to be transcribed are legal audio data (namely, the stored audio data are legal audio data), and carrying out transcription processing on all the audio data to be transcribed to obtain characters corresponding to all the audio data to be transcribed (namely, the characters corresponding to the stored audio data).

The "transcription processing" in step 67 may be performed using the embodiment of S904 above.

Step 68: if at least one matching failure exists in the identity matching results corresponding to all the audio data to be transcribed, determining that the stored audio data are illegal audio data, and adopting processing operation corresponding to the illegal audio data.

Based on the above-mentioned related content in steps 61 to 68, after receiving the audio transcription request and the stored audio data, the transcription server 1103 may determine at least one candidate audio basic identity information corresponding to the user according to the user identity carried by the audio transcription request; and determining whether the stored audio data is legal audio data according to at least one candidate audio basic identity information corresponding to the user, the audio information carried by the stored audio data and the identity information thereof, so that when the stored audio data is determined to be legal audio data, the stored audio data is subjected to transcription processing, and characters corresponding to the stored audio data are obtained.
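The validity check of steps 61 to 68 can be sketched as follows. The data layout (each audio data to be transcribed consists of `BLOCKS_PER_ITEM` blocks, each being a 32-byte identity followed by a 128-byte payload), the hash-based `next_identity` rule, and all names are assumptions chosen for illustration; the server would in practice have to mirror whatever division rule and update rule the recording device actually uses.

```python
import hashlib

ID_LEN = 32           # assumed identity length in bytes
BLOCK_LEN = 128       # assumed payload length per block
BLOCKS_PER_ITEM = 10  # assumed number of blocks per audio data to be transcribed
ITEM_LEN = BLOCKS_PER_ITEM * (ID_LEN + BLOCK_LEN)


def split_item(item: bytes) -> tuple[list[bytes], bytes]:
    """Step 62: return the identity copies and the payload carried by one item."""
    step = ID_LEN + BLOCK_LEN
    blocks = [item[i:i + step] for i in range(0, len(item), step)]
    identities = [b[:ID_LEN] for b in blocks]
    payload = b"".join(b[ID_LEN:] for b in blocks)
    return identities, payload


def next_identity(identity: bytes, payload: bytes) -> bytes:
    """Assumed update rule; must mirror the rule used by the recording device."""
    return hashlib.sha256(identity + payload).digest()


def is_legal(stored_audio: bytes, candidates: list[bytes]) -> bool:
    """Steps 61 to 68: True only if every item carries the expected identity."""
    if not stored_audio or len(stored_audio) % ITEM_LEN != 0:
        return False
    items = [stored_audio[i:i + ITEM_LEN]
             for i in range(0, len(stored_audio), ITEM_LEN)]  # step 61
    theoretical = None
    for index, item in enumerate(items):
        actual, payload = split_item(item)
        if index == 0:
            # Step 64: the 1st item must match one of the candidates.
            if actual[0] not in candidates:
                return False
            theoretical = actual[0]
        # Step 66: every copy of the actual identity must equal the theoretical one.
        if any(a != theoretical for a in actual):
            return False
        theoretical = next_identity(theoretical, payload)  # step 65
    return True
```

Because each theoretical identity in this sketch depends on the preceding item's payload, altering an earlier item causes a mismatch in the item that follows it; the payload of the last item is not covered by any later identity under this particular assumed rule.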

Based on the above description of fig. 11, if the recording pen 1101 cannot directly communicate with the transcription server 1103, the recording pen 1101 can implement a communication process with the transcription server by means of the user terminal device 1102.

However, in some cases, if the recording pen can directly communicate with the transcription server, the recording pen does not need to use a user terminal device, and the recording pen can directly send an audio transcription request to the transcription server, and can also directly send a user identity of a user and a product serial number of the recording pen to the transcription server, so as to directly receive audio basic identity information corresponding to the user identity of the user from the transcription server.

Based on the audio data generating method provided by the method embodiment, the embodiment of the application also provides an audio data generating device, which is explained and illustrated below with reference to the accompanying drawings.

Device embodiment 1

Device embodiment 1 describes an audio data generating device; for related content, refer to the above method embodiments.

Referring to fig. 13, a schematic structural diagram of an audio data generating apparatus according to an embodiment of the present application is shown.

The audio data generating apparatus 1300 provided in the embodiment of the present application includes:

a first obtaining unit 1301, configured to obtain audio data to be processed and identity information of the audio data to be processed;

the data generating unit 1302 is configured to generate target audio data according to the audio data to be processed and identity information of the audio data to be processed, so that the target audio data carries the audio data to be processed and the identity information of the audio data to be processed.

In one possible implementation manner, in order to improve the transfer real-time performance of the transfer device, the data generating unit 1302 includes:

a first generation subunit, configured to generate, when the audio data to be processed includes N first audio data and the identity information of the audio data to be processed includes identity information of N first audio data, an ith second audio data according to the ith first audio data and the identity information of the ith first audio data, so that the ith second audio data carries the identity information of the ith first audio data and the ith first audio data; wherein i is a positive integer, i is less than or equal to N, and N is a positive integer;

and the second generation subunit is used for obtaining target audio data according to the 1 st second audio data to the N second audio data.

In one possible implementation manner, in order to improve the transfer real-time performance of the transfer device, the first generating subunit is specifically configured to:

In one possible implementation manner, in order to improve the transfer real-time performance of the transfer apparatus, the first obtaining unit 1301 includes:

a first acquisition subunit, configured to, if the number of the audio data to be processed is T, generate the identity information of the (t+1)th audio data to be processed according to the t-th audio data to be processed and the identity information of the t-th audio data to be processed; wherein the recording time corresponding to the t-th audio data to be processed is earlier than the recording time corresponding to the (t+1)th audio data to be processed; t is a positive integer, t is less than or equal to T-1, and T is a positive integer; the identity information of the 1st audio data to be processed is determined according to the audio basic identity information corresponding to the T audio data to be processed.

In one possible implementation manner, in order to improve transfer instantaneity of the transfer device, the process of acquiring audio basic identity information corresponding to the T pieces of audio data to be processed is:

and generating audio basic identity information corresponding to the T pieces of audio data to be processed according to the user identity identifiers corresponding to the T pieces of audio data to be processed and the product serial numbers corresponding to the T pieces of audio data to be processed.

In one possible implementation manner, in order to improve the transfer real-time performance of the transfer device, the first obtaining subunit includes:

a third generating subunit, configured to generate a first update rule according to the t-th audio data to be processed;

And the fourth generation subunit is used for updating the identity information of the t-th audio data to be processed according to the first updating rule to obtain the identity information of the t+1th audio data to be processed.

In one possible implementation manner, in order to improve the transfer real-time performance of the transfer device, the fourth generating subunit is specifically configured to:

when the t-th audio data to be processed includes N_t pieces of first audio data, the identity information of the t-th audio data to be processed includes the identity information of the N_t pieces of first audio data, and the first updating rule includes a first ordering target, performing sequencing adjustment on the identity information of the N_t pieces of first audio data according to the first ordering target to obtain the identity information of the (t+1)th audio data to be processed; wherein N_t is a positive integer.

a second acquisition subunit, configured to acquire original audio data; encrypting the original audio data to obtain the audio data to be processed.

Based on the audio data transfer method provided by the above method embodiment, the embodiment of the application also provides an audio data transfer device, which is explained and illustrated below with reference to the accompanying drawings.

Device embodiment 2

Device embodiment 2 describes an audio data transfer device; for related content, refer to the above method embodiments.

Referring to fig. 14, the structure of an audio data transfer device according to an embodiment of the present application is shown.

The audio data transfer apparatus 1400 provided in the embodiment of the present application includes:

a second acquisition unit 1401 for acquiring audio data to be transcribed; the audio data to be transcribed is target audio data generated by any implementation mode of the audio data generation method provided by the embodiment of the application;

an information extracting unit 1402, configured to extract actual identity information corresponding to the audio data to be transcribed from the audio data to be transcribed;

a validity determining unit 1403, configured to determine whether the audio data to be transcribed is legal audio data according to actual identity information corresponding to the audio data to be transcribed;

and the audio transcription unit 1404 is configured to, when determining that the audio data to be transcribed is legal audio data, perform transcription processing on the audio data to be transcribed, so as to obtain a text corresponding to the audio data to be transcribed.

In one possible implementation manner, in order to improve the transfer real-time performance of the transfer device, the information extraction unit 1402 includes:

An information extraction subunit, configured to extract actual identity information corresponding to the kth second audio data from the kth second audio data if the audio data to be transcribed includes N second audio data; wherein k is a positive integer, k is less than or equal to N, and N is a positive integer;

and the fifth generation subunit is used for generating the actual identity information corresponding to the audio data to be transcribed according to the actual identity information corresponding to the 1 st second audio data to the actual identity information corresponding to the N second audio data.

In one possible embodiment, in order to improve the transfer real-time performance of the transfer device, the audio data transfer apparatus 1400 further includes:

the third acquisition unit is used for acquiring theoretical identity information corresponding to the audio data to be transcribed;

the validity determination unit 1403 includes:

the third acquisition subunit is used for matching the actual identity information corresponding to the audio data to be transcribed with the theoretical identity information corresponding to the audio data to be transcribed to obtain an identity matching result corresponding to the audio data to be transcribed;

the first determining subunit is configured to determine, according to an identity matching result corresponding to the audio data to be transcribed, whether the audio data to be transcribed is legal audio data.

In one possible implementation manner, in order to improve the transfer real-time performance of the transfer apparatus, the third obtaining unit is specifically configured to:

if the number of the audio data to be transcribed is M, generating theoretical identity information corresponding to the (m+1)th audio data to be transcribed according to the m-th audio data to be transcribed and the theoretical identity information corresponding to the m-th audio data to be transcribed; wherein m is a positive integer, m is less than or equal to M-1, and M is a positive integer; the recording time corresponding to the m-th audio data to be transcribed is earlier than the recording time corresponding to the (m+1)th audio data to be transcribed; the theoretical identity information corresponding to the 1st audio data to be transcribed is determined according to the audio basic identity information corresponding to the M audio data to be transcribed.

a sixth generation subunit, configured to generate a second update rule according to the mth audio data to be transcribed;

and a seventh generating subunit, configured to update the theoretical identity information corresponding to the m-th audio data to be transcribed according to the second updating rule, so as to obtain theoretical identity information corresponding to the m+1th audio data to be transcribed.

In one possible implementation manner, in order to improve the transfer real-time performance of the transfer device, the seventh generating subunit is specifically configured to:

when the m-th audio data to be transcribed includes N_m pieces of second audio data, the theoretical identity information corresponding to the m-th audio data to be transcribed includes the theoretical identity information corresponding to the N_m pieces of second audio data, and the second updating rule includes a second ordering target, performing sequencing adjustment on the theoretical identity information corresponding to the N_m pieces of second audio data according to the second ordering target to obtain the theoretical identity information corresponding to the (m+1)th audio data to be transcribed.
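The ordering-based update described above can be illustrated with a short sketch. How the ordering target is generated from the audio data is not fixed here, so the digest-derived permutation below is purely an assumption, as are the function names; the only property relied on is that the recording side and the transcription side derive the same ordering from the same audio data.

```python
import hashlib


def ordering_target(audio: bytes, n: int) -> list[int]:
    """Assumed rule: derive a permutation of range(n) from a digest of the audio."""
    digest = hashlib.sha256(audio).digest()
    # Sort positions by a digest-derived key; ties are broken by position.
    return sorted(range(n), key=lambda i: (digest[i % len(digest)], i))


def update_identity(identities: list[bytes], audio: bytes) -> list[bytes]:
    """Reorder the N_m identity entries according to the ordering target."""
    order = ordering_target(audio, len(identities))
    return [identities[i] for i in order]
```

Since the update only reorders the existing entries, the result contains exactly the same identity entries as the input, and repeating the computation on the same audio data yields the same ordering, which is what allows the theoretical identity information to be compared with the actual identity information.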

In one possible implementation manner, in order to improve the transfer real-time performance of the transfer device, the third obtaining subunit is specifically configured to:

if the number of the audio data to be transcribed is M, matching the actual identity information corresponding to the r audio data to be transcribed with the theoretical identity information corresponding to the r audio data to be transcribed, and obtaining an identity matching result corresponding to the r audio data to be transcribed; wherein r is a positive integer, r is less than or equal to M, and M is a positive integer;

the first determining subunit is specifically configured to:

In one possible implementation manner, in order to improve the transfer real-time performance of the transfer device, the audio transfer unit 1404 is specifically configured to:

extracting audio data to be decrypted corresponding to the audio data to be transcribed from the audio data to be transcribed; decrypting the audio data to be decrypted corresponding to the audio data to be transcribed to obtain the decrypted audio data corresponding to the audio data to be transcribed; and performing transcription processing on the decrypted audio data corresponding to the audio data to be transcribed to obtain the text corresponding to the audio data to be transcribed.

Further, an embodiment of the present application further provides an audio data generating device, including: a processor, memory, system bus;

the processor and the memory are connected through the system bus;

the memory is for storing one or more programs, the one or more programs comprising instructions, which when executed by the processor, cause the processor to perform any of the implementations of the audio data generation method described above.

Further, the embodiment of the application also provides an audio data transfer device, which comprises: a processor, memory, system bus;

the processor and the memory are connected through the system bus;

the memory is for storing one or more programs, the one or more programs comprising instructions, which when executed by the processor, cause the processor to perform any of the implementations of the audio data transcription method described above.

Further, the embodiment of the application also provides a computer readable storage medium, in which instructions are stored, when the instructions run on a terminal device, the terminal device is caused to execute any implementation method of the audio data generation method or execute any implementation method of the audio data transfer method.

Further, the embodiment of the application also provides a computer program product, which when run on a terminal device, causes the terminal device to execute any implementation method of the audio data generation method or execute any implementation method of the audio data transfer method.

From the above description of embodiments, it will be apparent to those skilled in the art that all or part of the steps of the above described example methods may be implemented in software plus necessary general purpose hardware platforms. Based on such understanding, the technical solutions of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions to cause a computer device (which may be a personal computer, a server, or a network communication device such as a media gateway, etc.) to perform the methods described in the embodiments or some parts of the embodiments of the present application.

It should be noted that the embodiments in this description are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may refer to one another. Since the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and for relevant points reference may be made to the description of the method section.

It is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A method of generating audio data, the method comprising:

generating target audio data according to the audio data to be processed and the identity information of the audio data to be processed, so that the target audio data carries the audio data to be processed and the identity information of the audio data to be processed;

if the number of the audio data to be processed is T, the obtaining the identity information of the audio data to be processed includes:

2. The method according to claim 1, wherein when the audio data to be processed includes N pieces of first audio data and the identity information of the audio data to be processed includes identity information of N pieces of first audio data, the generating the target audio data from the audio data to be processed and the identity information of the audio data to be processed includes:

3. The method of claim 2, wherein generating the ith second audio data from the ith first audio data and the identity information of the ith first audio data comprises:

4. The method according to claim 1, wherein the method further comprises:

generating audio basic identity information corresponding to the T pieces of audio data to be processed according to user identity identifiers corresponding to the T pieces of audio data to be processed and product serial numbers corresponding to the T pieces of audio data to be processed; the product serial number refers to the equipment identifier of the recording equipment used for recording the audio information carried by the T pieces of audio data to be processed;

The obtaining the identity information of the audio data to be processed further includes:

and determining the identity information of the 1 st audio data to be processed according to the audio basic identity information corresponding to the T audio data to be processed.

5. The method of claim 1, wherein the acquiring the audio data to be processed comprises:

acquiring original audio data;

encrypting the original audio data to obtain the audio data to be processed.

6. A method of audio data transcription, the method comprising:

acquiring audio data to be transcribed; wherein the audio data to be transcribed is target audio data generated by the audio data generation method according to any one of claims 1 to 5;

7. The method of claim 6, wherein if the audio data to be transcribed includes N pieces of second audio data, the extracting actual identity information corresponding to the audio data to be transcribed from the audio data to be transcribed includes:

extracting actual identity information corresponding to kth second audio data from the kth second audio data; wherein k is a positive integer, k is less than or equal to N, and N is a positive integer;

8. The method of claim 6, wherein the method further comprises:

9. The method of claim 8, wherein if the number of audio data to be transcribed is M, the obtaining theoretical identity information corresponding to the audio data to be transcribed includes:

10. The method of claim 9, wherein the generating theoretical identity information corresponding to the (m+1) th audio data to be transcribed according to the (m) th audio data to be transcribed and the theoretical identity information corresponding to the (m) th audio data to be transcribed, comprises:

11. The method of claim 10, wherein when the m-th audio data to be transcribed includes N_m pieces of second audio data, the theoretical identity information corresponding to the m-th audio data to be transcribed includes theoretical identity information corresponding to the N_m pieces of second audio data, and the second updating rule includes a second sorting target, the updating the theoretical identity information corresponding to the m-th audio data to be transcribed according to the second updating rule to obtain theoretical identity information corresponding to the (m+1)th audio data to be transcribed includes:

12. The method of claim 8, wherein if the number of the audio data to be transcribed is M, the matching the actual identity information corresponding to the audio data to be transcribed with the theoretical identity information corresponding to the audio data to be transcribed to obtain an identity matching result corresponding to the audio data to be transcribed, includes:

13. The method of claim 6, wherein the performing a transcription process on the audio data to be transcribed to obtain text corresponding to the audio data to be transcribed comprises:

14. An audio data transcription device, characterized in that the device comprises:

the second acquisition unit is used for acquiring the audio data to be transcribed; wherein the audio data to be transcribed is target audio data generated by the audio data generation method according to any one of claims 1 to 5;

15. A computer readable storage medium having instructions stored therein, which when run on a terminal device, cause the terminal device to perform the audio data generating method of any one of claims 1 to 5 or the audio data transferring method of any one of claims 6 to 13.