AphasiaBank | Core Lexicon

This page provides information about Core Lexicon.

Selected articles on core lexicon:

Dalton et al. (2020) -- Moving Toward Non-transcription based Discourse Analysis in Stable and Progressive Aphasia
Dalton et al. (2020) -- A Compendium of Core Lexicon Checklists
Kim et al. (2019) -- Measuring Word Retrieval in Narrative Discourse: Core Lexicon in Aphasia
Kim & Wright (2020) -- A Tutorial on Core Lexicon: Development, Use, and Application
Kim & Wright (2020) -- Concurrent Validity and Reliability of the Core Lexicon Measure as a Measure of Word Retrieval Ability in Aphasia Narratives

Scoring

1. Manually. Scoring can be done by hand without transcribing: listen to a recording of the language sample and mark off, on the core lexicon checklist, each word that was used.

2. Automatically with CLAN. Scoring can be done automatically from a CHAT file that has a %mor tier (from running the MOR program). The CORELEX command will produce a spreadsheet showing which words from the core lexicon checklist were used. The "Types" column in the spreadsheet will show how many words from the list were used at least once. These core lexicon lists are for the five AphasiaBank Discourse Protocol tasks, as published in Dalton et al. (2020).
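To make the "Types" count concrete, here is a minimal Python sketch of the idea, not the CORELEX implementation itself. It assumes you already have a list of lemmas for one task (for example, "was" already reduced to "be"). The function name `score_core_lexicon` and the four-word checklist are hypothetical illustrations, not a published core lexicon list.

```python
from collections import Counter

def score_core_lexicon(lemmas, checklist):
    """Return (types, per_word) for a list of lemmas scored against a checklist.

    types    -- how many checklist words were used at least once
    per_word -- how often each checklist word was used
    """
    targets = {word.lower() for word in checklist}
    # Count only the lemmas that appear on the checklist.
    counts = Counter(l.lower() for l in lemmas if l.lower() in targets)
    per_word = {word: counts.get(word.lower(), 0) for word in checklist}
    types = sum(1 for n in per_word.values() if n > 0)
    return types, per_word

# Hypothetical mini checklist, NOT one of the published lists.
checklist = ["be", "cat", "tree", "ladder"]
# Lemmas as they might look after morphological analysis.
lemmas = ["cat", "be", "in", "tree", "cat", "be"]

types, per_word = score_core_lexicon(lemmas, checklist)
print(types)     # 3
print(per_word)  # {'be': 2, 'cat': 2, 'tree': 1, 'ladder': 0}
```

In this example "ladder" was never produced, so it contributes a frequency of 0 and does not count toward Types.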

Here are a few important things to know:

Be sure you have a recent version of the CLAN program. The CORELEX command was finalized and added to the program on June 30, 2021.
The command will automatically extract the appropriate gem (e.g., Cinderella), so be sure the transcript has the appropriate gem heading at the beginning of each task -- e.g., @G: Window (where the colon is followed by a tab, no spaces). The current list of gem headings for the CORELEX command is: Window, Umbrella, Cat, Cinderella, Sandwich, Cookie.
Type this command into the CLAN commands window -- corelex +lcat +t*par filename.cha.
- You can use *.cha if you want to evaluate all CHAT files in a folder.
- Substitute the appropriate task name -- cinderella, window, umbrella, sandwich, cookie -- for "cat" in the above example.
The CORELEX command counts words in utterances marked with [+ exc] post-codes. If you have that post-code in your transcripts and want to exclude those utterances from the CORELEX count, use this command -- corelex +lcat +t*par -s"<+ exc>" filename.cha.
The columns will show which specific words from the list were used and how frequently. Be sure to save the spreadsheet as an .xlsx Workbook in Excel.
If you want to compare your results to the norms reported in the supplemental materials from Dalton, Hubbard, and Richardson (2020), you need to do two steps before running the CORELEX command because the norms included revised words and excluded target replacements for semantic paraphasias. The CORELEX program counts lemmas on the %mor tier so it can capture different forms of a word (e.g., "was" for "be"), and the %mor tier excludes revisions and includes target replacements. To fix that (include revised words, exclude target replacements):
- Run this command on your CHAT file(s) -- chstring +q1 filename.cha -- to remove revision codes and underscores (e.g., all_of_a_sudden) in the transcript and replace target replacements for semantic paraphasias (e.g., grandmother [: godmother]) with [= target] instead of [: target]. Note: If you have a CLAN program from earlier than April 23, 2025, this command will replace target replacements with [:: target] instead of [= target], but everything will work as it should.
- Re-run the MOR command -- mor filename.chstr.cex -- on the new file(s).
- Run the CORELEX command -- corelex +lcat +t*par filename.chstr.cex -- on the new file(s). (Substitute the appropriate task name for "cat" in the example and use *.cex or *.chstr.cex instead of filename.chstr.cex for multiple files in a folder.)
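The command variants above can be summarized in a small Python helper that only builds the command strings to paste into the CLAN commands window; it does not run CLAN. This is a sketch under assumptions: the function name `corelex_commands` and its parameters are hypothetical, and combining the [+ exc] exclusion with the norm-matching steps is an assumption, since this page shows the two options separately.

```python
# Task names accepted after +l, as listed on this page.
TASKS = {"window", "umbrella", "cat", "cinderella", "sandwich", "cookie"}

def corelex_commands(task, filename, match_norms=False, exclude_exc=False):
    """Return the CLAN command lines, in order, for one CORELEX run.

    match_norms -- add the chstring and mor steps used to match the
                   Dalton, Hubbard, and Richardson (2020) norms
    exclude_exc -- skip utterances marked with the [+ exc] post-code
    """
    task = task.lower()
    if task not in TASKS:
        raise ValueError(f"unknown task: {task}")
    if not filename.endswith(".cha"):
        raise ValueError("expected a .cha file name or pattern such as *.cha")

    commands = []
    target = filename
    if match_norms:
        # chstring writes filename.chstr.cex; MOR and CORELEX then use that file.
        target = filename[: -len(".cha")] + ".chstr.cex"
        commands.append(f"chstring +q1 {filename}")
        commands.append(f"mor {target}")

    corelex = f"corelex +l{task} +t*par"
    if exclude_exc:
        corelex += ' -s"<+ exc>"'
    commands.append(f"{corelex} {target}")
    return commands

for line in corelex_commands("cinderella", "*.cha", match_norms=True):
    print(line)
# chstring +q1 *.cha
# mor *.chstr.cex
# corelex +lcinderella +t*par *.chstr.cex
```

Generating the lines this way keeps the task name and the file name consistent across all three steps, which is where copy-and-paste errors tend to appear.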

3. Automatically with a web-app. Scoring can be done automatically with this web-app -- https://rb-cavanaugh.shinyapps.io/coreLexicon/. Using simple orthographic transcription of the language sample (Broken Window, Refused Umbrella, Cat Rescue, Cinderella, Sandwich), the app produces a summary page with total scores and percentiles based on average norms relative to healthy controls and other individuals with aphasia. It also allows users to download a spreadsheet of their data and a PDF report.

The app was developed by Rob Cavanaugh, Sarah Grace Dalton, and Jessica Richardson with grant support from NIH/NIDCD (Cavanaugh, F31 DC019853-01). Citation for this software and link to the source code: Cavanaugh, R., Dalton, S. G., & Richardson, J. (2021). coreLexicon: An open-source web-app for scoring core lexicon analysis. R package version 0.0.1.0000. https://github.com/aphasia-apps/coreLexicon. Comments, feedback, and bug reports can be submitted on the GitHub page.