Audio Core

When your data sources are audio files (e.g. depth interviews, customer calls, focus groups), listening to each file to take notes and identify key items takes a huge amount of time. This manual process is not only time-consuming, it also doesn't scale across projects. Read more about the audio use case here.

Relevance AI's Audio Core workflow provides:

  • Audio transcription
  • Speaker diarization
  • Utterances

When the workflow finishes successfully, a new dataset is added to your account.

How to use Audio Core

  1. Upload your audio files to the Relevance AI platform using Upload media. Alternatively, if you have already stored your audio files on the web (i.e. you can access them via an http... URL), include their URLs in a CSV file (a sample here) and upload the CSV to Relevance AI.

Note: When uploading your media files to Relevance AI, if your audio files are large, allow time for the upload process to finalize; uploaded files are highlighted in green on the upload wizard.

As a result, you will have a dataset in which each entry represents one audio file and includes the URL where that file is stored.
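If you take the CSV route, the file can be generated with a few lines of Python. The following is a minimal sketch, not an official template: the column name `audio_url` and the example URLs are assumptions made for illustration, so check the sample CSV for the layout the platform expects.

```python
import csv
import io

# Hypothetical URLs; replace them with the locations of your own audio files.
audio_urls = [
    "https://example.com/interviews/interview_01.mp3",
    "https://example.com/interviews/interview_02.mp3",
]


def build_url_csv(urls, column="audio_url"):
    """Return CSV text with a header row and one row per audio URL."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([column])  # assumed column name, not a platform requirement
    for url in urls:
        writer.writerow([url])
    return buffer.getvalue()


csv_text = build_url_csv(audio_urls)
print(csv_text)
```

Save the printed text as a `.csv` file and upload it. Each row then becomes one dataset entry whose URL field you select later in the setup wizard.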

  2. Once your dataset is ready, locate "Audio Core" under Workflows.
Relevance AI - Access to Audio Core workflow

Follow the steps in the setup wizard:

  • Select the field that contains the URLs to your audio file(s)
  • Select "Utterance" as the analysis mode
Relevance AI - Audio Core setup wizard

Note: You can track progress in the workflow history, or wait until you receive an email notification that the workflow has finished.

Advanced Options

Section markers are a very useful tool for processing focus group data. The audio is broken into sections according to the identifier phrases that moderators use.

Simply enter the identifier phrase and the corresponding Topic for the section as shown in the example below.

Relevance AI - Audio Core - Topic section identifier (Optional)

Note: To use this feature across multiple focus groups, all moderators should use exactly the same identifier phrase for each section. Choose each identifier phrase so that it is unlikely to be heard more than once in a recording.
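To make the idea of section markers concrete, here is a rough sketch of how identifier phrases could split a transcript into topics. It is only an illustration of the concept and not how Relevance AI implements the feature; the phrases, topic names, field names and the `assign_topics` helper are all hypothetical.

```python
# Hypothetical identifier phrases mapped to the topic they introduce.
section_markers = {
    "let's move on to pricing": "Pricing",
    "now let's talk about packaging": "Packaging",
}


def assign_topics(utterances, markers, default_topic="Introduction"):
    """Label each utterance with the topic of the most recent marker phrase."""
    current = default_topic
    labelled = []
    # Process utterances in time order so a marker applies to what follows it.
    for utt in sorted(utterances, key=lambda u: u["start"]):
        text = utt["text"].lower()
        for phrase, topic in markers.items():
            if phrase.lower() in text:
                current = topic
                break
        labelled.append({**utt, "topic": current})
    return labelled


# Made-up utterances standing in for a transcribed focus group.
utterances = [
    {"speaker": "A", "start": 0.0, "text": "Welcome everyone."},
    {"speaker": "A", "start": 12.5, "text": "OK, let's move on to pricing."},
    {"speaker": "B", "start": 15.0, "text": "I think it is too expensive."},
    {"speaker": "A", "start": 40.2, "text": "Now let's talk about packaging."},
    {"speaker": "C", "start": 44.0, "text": "The box is hard to open."},
]

for row in assign_topics(utterances, section_markers):
    print(row["topic"], "|", row["text"])
```

The sketch also shows why an identifier phrase should occur only once per recording: if a participant later repeated "let's move on to pricing", every utterance after that point would be assigned back to the Pricing section.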

Audio Core workflow outputs

After setting up the workflow wizard and successfully running the workflow, you will have a new dataset called <Original-Dataset-Name>_utterance in your account. For instance, if the original dataset is named audio_dataset, a new dataset named audio_dataset_utterance will be added to your account.

This new dataset includes the following main fields/columns:

  • Text: transcription of the audio file(s)
  • Speaker: A, B, C, etc. labels assigned to the voices heard in the audio
  • Start: time in the audio indicating the beginning of a spoken piece (Utterance)
  • End: time in the audio indicating the end of a spoken piece (Utterance)
  • File Name: original file name

Note 1: Transcription can take a while depending on the size of your audio file (for example, 1.5 hours of audio takes around 20 minutes).
Note 2: You might need to refresh the page for <Original-Dataset-Name>_utterance to appear under Datasets.



Even though the audio analysis results are also written back to the original dataset, use the new dataset, <Original-Dataset-Name>_utterance, for any further processing of your data.

Common questions

What do I do next?

After the workflow runs successfully, the audio is transcribed and chunked in three different ways. The results are saved as datasets under your account. You can treat the data as text and apply workflows such as AI Clustering and AI Tagging.

What are your tips for the most efficient way to get high-level themes?

The Gist and Summary fields in the chapter dataset provide a high-level analysis of the themes. To understand the data further and analyze it in more depth, we recommend applying AI Clustering and AI Tagging to your data. Don't forget to switch to the <Original-Dataset-Name>_utterance dataset.

What is my best tool to better study and visualize the results?

Relevance AI's Explorer is a great tool that provides a variety of configurable data views, as well as search and filtering on your data.

How do I extract the transcription?

This can be done directly through an export workflow or on the Explorer dashboard. For the latter, set up a categorical view (you can choose any category, such as the Speaker field), then use export, which generates a CSV file containing the fields you select for download.

Note 1: Utterances in the downloaded file might not be in order. We recommend including the Start field in your export so that you can sort the data accordingly.

Note 2: If you are working with multiple audio files, you can use the File Name field to separate the transcriptions.
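The two notes above can be applied to a downloaded file with a short script. This is a minimal sketch under stated assumptions: it assumes the exported CSV uses the column headers `File Name`, `Speaker`, `Start` and `Text` exactly as the fields are named on this page, and that `Start` holds a numeric value. The sample rows are made up, and `transcripts_by_file` is a hypothetical helper, so adjust the names to match your actual export.

```python
import csv
import io
from collections import defaultdict

# Stand-in for an exported CSV; the rows are deliberately out of order.
exported_csv = """File Name,Speaker,Start,End,Text
call_01.mp3,B,5200,9100,Thanks for having me.
call_02.mp3,A,0,3000,Good morning.
call_01.mp3,A,0,4800,Welcome to the call.
"""


def transcripts_by_file(csv_text):
    """Group utterances by file and order them by their Start value."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Sort by file first, then by start time within each file.
    rows.sort(key=lambda r: (r["File Name"], float(r["Start"])))
    grouped = defaultdict(list)
    for r in rows:
        grouped[r["File Name"]].append(f'{r["Speaker"]}: {r["Text"]}')
    return dict(grouped)


for file_name, lines in transcripts_by_file(exported_csv).items():
    print(file_name)
    for line in lines:
        print("  " + line)
```

To run this on a real export, read the downloaded file into a string (for example with `open(path, newline="").read()`) and pass it to the function.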