This page explains how to transcribe audio files on Relevance AI and apply text analysis such as clustering and tagging to the transcription to extract insights.
The Audio Core workflow provides you with:
- audio transcription
- speaker diarization
You can upload your audio file(s) to the platform using the Upload media workflow. Alternatively, if your audio files are already hosted on the web (i.e. accessible via an
http... URL), list their URLs in a CSV file (as shown here) and upload the CSV to Relevance AI.
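If you are preparing such a CSV programmatically, a minimal sketch looks like the following. The column name `audio_url` and the example URLs are placeholders, not a requirement of the platform:

```python
import csv

# Hypothetical, publicly accessible audio URLs (placeholders).
audio_urls = [
    "https://example.com/audio/interview_01.mp3",
    "https://example.com/audio/interview_02.mp3",
]

# Write one URL per row under a single header column.
with open("audio_urls.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["audio_url"])
    for url in audio_urls:
        writer.writerow([url])
```

The resulting `audio_urls.csv` can then be uploaded to Relevance AI in place of the audio files themselves.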
Note: When uploading your media files to Relevance AI, if your audio files are large, allow time for the upload process to finish. In the Upload media configuration wizard, successfully uploaded files are highlighted in green.
As a result, you will have a dataset in which each entry represents an audio file, along with a URL pointing to where the file is stored.
Select your audio dataset, go to Workflows, and locate Audio Core. After completing the workflow wizard and successfully running the workflow, a new dataset called
<Original-Dataset-Name>_utterance will be added to your account. For instance, if the original dataset is named
audio_dataset, a new dataset named
audio_dataset_utterance will be added to your account.
This new dataset includes the following main fields/columns:
- Text: transcription of the audio file(s)
- Speaker: A, B, C, etc. labels assigned to the voices heard in the audio
- Start: time in the audio indicating the beginning of a spoken piece (Utterance)
- End: time in the audio indicating the end of a spoken piece (Utterance)
- File Name: original file name
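To make the field layout concrete, here is an illustrative sketch of what two utterance entries might look like; the values are invented for illustration, not taken from a real transcription:

```python
# Made-up utterance entries mirroring the main fields of the
# <Original-Dataset-Name>_utterance dataset.
utterances = [
    {"Text": "Hi, thanks for joining.", "Speaker": "A",
     "Start": 0.0, "End": 2.1, "File Name": "interview_01.mp3"},
    {"Text": "Happy to be here.", "Speaker": "B",
     "Start": 2.3, "End": 3.8, "File Name": "interview_01.mp3"},
]

# Reconstruct a simple readable transcript from the entries.
transcript = "\n".join(f'{u["Speaker"]}: {u["Text"]}' for u in utterances)
print(transcript)
```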
Note 1: Transcription can take a while depending on the length of your audio file (for example, 1.5 hours of audio takes around 20 minutes).
Note 2: You might need to refresh the page for
<Original-Dataset-Name>_utterance to appear under Datasets.
Even though the audio analysis results are written back to the original dataset, use the new dataset for further processing of your data. It can be found under Datasets.
Some common next steps are:
- Process the Text field (i.e. the transcription) for insights. This works like any other text processing: select the
<Original-Dataset-Name>_utterance dataset and apply AI Clustering or AI Tagging to the Text field.
- Analyse the data and visualize the insights on the Explorer dashboard
- Export your data: either directly through an export workflow or on the Explorer dashboard. For the latter, set up a categorical view (you can choose any category, such as the Speaker field), then use Export, which generates a downloadable CSV file containing the fields you select.
Note 1: Utterances in the downloaded file might not be in order. We recommend including the Start field in your export so you can sort the data accordingly.
Note 2: If you are working with multiple audio files, you can use the File Name field to separate the transcriptions.
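Putting the two notes together, a short sketch of post-processing the exported rows: group by File Name and sort by Start. The in-memory rows below stand in for what `csv.DictReader` would yield from the exported file, and the column names are assumed to match the dataset fields described above:

```python
# Rows as they might come out of the exported CSV (order not guaranteed).
# With a real export, build this list via csv.DictReader on the file.
rows = [
    {"File Name": "b.mp3", "Start": "5.0", "Speaker": "A", "Text": "..."},
    {"File Name": "a.mp3", "Start": "2.0", "Speaker": "B", "Text": "..."},
    {"File Name": "a.mp3", "Start": "0.5", "Speaker": "A", "Text": "..."},
]

# Sort by file first, then by the numeric start time within each file,
# so each transcription reads in chronological order.
rows.sort(key=lambda r: (r["File Name"], float(r["Start"])))
```

Note that Start is parsed with `float()` because CSV values are read as strings; sorting the raw strings would order "10.0" before "2.0".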