AphasiaBank English — Data for the PSST Challenge

The Post-Stroke Speech Transcription (PSST) shared task was held as part of the RaPID-4 workshop at the 13th Language Resources and Evaluation Conference (LREC 2022) in Marseille, France. More background is available at the official PSST website.

The PSST Data

Conditions for using this dataset are described in the call for participation.

The data are distributed as archive files in .tar.gz format. See below for a description of the contents, as well as a version history.
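
For convenience, a data pack can be unpacked programmatically. Here is a minimal sketch using Python's standard tarfile module; the archive filename shown is hypothetical:

    import tarfile

    # Unpack a PSST data pack (hypothetical filename) into the current directory.
    with tarfile.open("psst-data.tar.gz", "r:gz") as archive:
        archive.extractall(".")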

Additional Resources

psstdata: a set of Python scripts for automatically downloading and loading this dataset (a usage sketch follows below)

psstbaseline: the code to download, use, and reproduce the baseline model from the PSST Challenge
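
As a quick start with psstdata, the sketch below loads the dataset splits. The load() entry point follows the package's documented usage; the record attribute names are an assumption here, mirroring the columns of utterances.tsv described below:

    import psstdata

    # Downloads the dataset on first use, then loads the train/valid/test splits.
    data = psstdata.load()

    # Peek at one record from the training split. Attribute names are assumed
    # to mirror the utterances.tsv columns (utterance_id, transcript, ...).
    first = next(iter(data.train))
    print(first.utterance_id, first.transcript, first.correctness)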

Publications

Dimitrios Kokkinakis, Charalambos K. Themistocleous, Kristina Lundholm Fors, Athanasios Tsanas, and Kathleen C. Fraser. 2022. Proceedings of the 4th RaPID Workshop: Resources and ProcessIng of linguistic, para-linguistic and extra-linguistic Data from people with various forms of cognitive/psychiatric/developmental impairments. European Language Resources Association, Marseille, France.

  • The Post-Stroke Speech Transcription (PSST) Challenge
    Robert C. Gale, Mikala Fleegle, Gerasimos Fergadiotis, and Steven Bedrick
    (pages 41–55)
  • Post-Stroke Speech Transcription Challenge (Task B): Correctness Detection in Anomia Diagnosis with Imperfect Transcripts
    Trang Tran
    (pages 56–61)
  • Speech Data Augmentation for Improving Phoneme Transcriptions of Aphasic Speech using wav2vec 2.0 for the PSST Challenge
    Birger Moell, Jim O’Regan, Shivam Mehta, Ambika Kirkland, Harm Lameris, Joakim Gustafson, and Jonas Beskow
    (pages 62–70)
  • Data Augmentation for the Post-Stroke Speech Transcription (PSST) Challenge: Sometimes Less is More
    Jiahong Yuan, Xingyu Cai, and Kenneth Church
    (pages 71–79)

PSST Data Pack Contents

The data packs contain audio files and labels for the PSST Challenge. The contents of the data packs are organized as follows:

The ./audio directory

The "audio" directory contains sub-directories for the BNT and VNT naming tasks (see task description for more details)

  • Within each task directory, there is a subdirectory for each session (e.g. "elman11a")
  • Within each session directory, there is a .wav file for each test item (e.g. "elman11a-BNT01-house.wav")
  • The naming scheme is consistent across sessions, but not all test items are present for every speaker
  • The audio files are mono, 16-bit PCM recordings at a sampling rate of 16 kHz (a bitrate of 256 kb/s); see the sketch below
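
The documented audio format can be double-checked with Python's standard wave module; a minimal sketch, where the file path is illustrative (the actual task and session directory names come from your local copy of the data pack):

    import wave

    # Open one recording and confirm the documented format:
    # mono, 16-bit PCM, 16 kHz. The path below is illustrative.
    with wave.open("audio/bnt/elman11a/elman11a-BNT01-house.wav", "rb") as wav:
        assert wav.getnchannels() == 1       # mono
        assert wav.getsampwidth() == 2       # 16-bit samples (2 bytes)
        assert wav.getframerate() == 16000   # 16 kHz sampling rate
        n_frames = wav.getnframes()
        print(f"{n_frames} frames = {n_frames / 16000:.2f} seconds")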

The ./utterances.tsv file

The labels are in the file "utterances.tsv", a UTF-8 encoded, tab-separated file with the following fields (a reading sketch follows the list):

  • utterance_id is a unique identifier for each production, of the form {session}-{test}{item}-{prompt} (e.g. "ACWT02a-BNT01-house")
  • session is the name of the AphasiaBank session from which the production was taken
  • test indicates which test each utterance comes from, either BNT (Boston Naming Test) or VNT (Verb Naming Test)
  • prompt is an orthographic rendering of the target word
  • transcript is the phonemic transcription of the production, in ARPAbet.
    • Silence is marked using <sil>
    • Spoken noise is marked using <spn>
  • correctness is marked as TRUE if the production is "correct" according to the clinical scoring rules of the BNT/VNT, FALSE otherwise
    • For task 2 (correctness), this is the outcome label
  • aq_index is the participant's Aphasia Quotient (AQ) from the Western Aphasia Battery - Revised (Kertesz, 2007), a standardized total score reflecting overall aphasia severity. Values range from 0.0 to 100.0, with lower values indicating more severe aphasia.
  • duration_frames is the number of audio frames in the recording, i.e., the duration in seconds multiplied by the 16,000 Hz sampling rate
  • filename contains the relative path within the data pack to the file containing the audio recording for this production
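
Putting the fields together, here is a minimal sketch for reading the labels with Python's standard csv module. It assumes the correctness column is stored literally as "TRUE"/"FALSE", as described above:

    import csv

    # Read the labels and derive each recording's duration in seconds
    # from duration_frames (frames divided by the 16 kHz sampling rate).
    with open("utterances.tsv", encoding="utf-8", newline="") as f:
        for row in csv.DictReader(f, delimiter="\t"):
            duration_s = int(row["duration_frames"]) / 16000
            is_correct = row["correctness"] == "TRUE"
            print(row["utterance_id"], row["transcript"], is_correct, f"{duration_s:.2f}s")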

For any questions about the contents of this data pack, please contact Robert Gale (galer@ohsu.edu) and Steven Bedrick (bedricks@ohsu.edu).