This zip file contains the preprocessed data used in the paper "Automating intended target identification for paraphasias in discourse using a large language model" (https://www.medrxiv.org/content/10.1101/2023.06.18.23291555v1). The folder aphasia-chat/ contains filtered and edited versions of the CHAT files from people with aphasia, including our resolved targets for the paraphasias. The folder aphasia-proprocessed/ contains structured versions of the transcripts from aphasia-chat/ in .json format. The folder controls-preprocessed/ contains structured versions of the control transcripts, with synthetic paraphasias added, also in .json format. More information on transcript preparation can be found in the paper.
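
As a starting point for working with the .json folders, the following is a minimal Python sketch that loads every .json file in a folder and reports its top-level structure. The JSON schema is not described here, so the code makes no assumptions about field names. The function names load_transcripts and describe are illustrative only and are not part of this dataset; the folder path in the usage example assumes the zip file was extracted into the current directory.

```python
import json
from pathlib import Path


def load_transcripts(folder):
    """Load every .json file directly inside `folder`.

    Returns a dict mapping each file's stem (name without extension)
    to its parsed JSON content. Hypothetical helper, not part of the dataset.
    """
    transcripts = {}
    for path in sorted(Path(folder).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            transcripts[path.stem] = json.load(f)
    return transcripts


def describe(data):
    """Summarize the top level of a parsed JSON value without assuming a schema."""
    if isinstance(data, dict):
        return "object with keys: " + ", ".join(sorted(data))
    if isinstance(data, list):
        return f"array with {len(data)} items"
    return type(data).__name__


if __name__ == "__main__":
    # Assumed location: the zip extracted into the current directory.
    for name, data in load_transcripts("controls-preprocessed").items():
        print(name, "->", describe(data))
```

Inspecting the top-level structure first is a cautious way to learn the layout of the files before writing code that depends on specific fields; the paper remains the reference for what each field means.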