AphasiaBank Core Lexicon

This page provides information about Core Lexicon.

Selected articles on core lexicon:


Manually. Scoring can be done manually without transcribing, using core lexicon checklists and a recording of the language sample, marking off which words were used.

Automatically. Scoring can also be done automatically from a CHAT file that has a %mor tier (from running the MOR program). The CORELEX command will produce a spreadsheet showing which words from the core lexicon checklist were used. The "Types" column in the spreadsheet will show how many words from the list were used at least once. These core lexicon lists are for the five AphasiaBank Discourse Protocol tasks, as published in Dalton et al. (2020).
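To make the "Types" figure concrete, here is a minimal Python sketch of the counting idea. It is not the CORELEX implementation. The function name score_core_lexicon, the four-word toy checklist, and the toy lemma list are all hypothetical; the real checklists are those published in Dalton et al. (2020), and the lemmas are assumed to have already been pulled from the %mor tier.

```python
from collections import Counter


def score_core_lexicon(lemmas, checklist):
    """Count how often each checklist word occurs among the lemmas.

    Returns (per_word, types): per_word maps each checklist word to its
    frequency, and types is the number of checklist words used at least once.
    Checklist words are assumed to be lowercase.
    """
    counts = Counter(lemma.lower() for lemma in lemmas)
    per_word = {word: counts[word] for word in checklist}
    types = sum(1 for n in per_word.values() if n > 0)
    return per_word, types


# Hypothetical toy data, NOT the published core lexicon list.
toy_checklist = ["be", "cat", "tree", "ladder"]
toy_lemmas = ["the", "cat", "be", "in", "the", "tree",
              "and", "the", "cat", "be", "stuck"]

per_word, types = score_core_lexicon(toy_lemmas, toy_checklist)
print(per_word)  # frequency of each checklist word
print(types)     # number of checklist words used at least once
```

Counting lemmas rather than surface forms is what lets "was" and "is" both count toward "be".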

Here are a few important things to know:

  1. Be sure you have a recent version of the CLAN program. The CORELEX command was finalized and added to the program on June 30, 2021, so use a version released on or after that date.
  2. The command will automatically extract the appropriate gem (e.g., Cinderella), so be sure the transcript has the appropriate gem heading at the beginning of each task -- e.g., @G: Window (where the colon is followed by a tab, no spaces). The current list of gem headings for the CORELEX command is: Window, Umbrella, Cat, Cinderella, Sandwich.
  3. Type this command into the CLAN commands window -- corelex +lcat +t*par filename.cha.
    • You can use *.cha if you want to evaluate all CHAT files in a folder.
    • Substitute the appropriate task name -- cinderella, window, umbrella, sandwich -- for "cat" in the above example.
  4. By default, the CORELEX command counts words in utterances marked with the [+ exc] post-code. If you have that post-code in your transcripts and want to exclude those utterances from the CORELEX count, use this command -- corelex +lcat +t*par -s"<+ exc>" filename.cha.
  5. The columns will show which specific words from the list were used and how frequently. Be sure to save the spreadsheet as an .xlsx Workbook in Excel.
  6. If you want to compare your results to the norms reported in the supplemental materials from Dalton, Hubbard, and Richardson (2020), you need to complete two extra steps before running the CORELEX command. The norms included revised words and excluded target replacements for semantic paraphasias. The CORELEX program counts lemmas on the %mor tier so that it can capture different forms of a word (e.g., "was" for "be"), but the %mor tier does the opposite of the norms: it excludes revisions and includes target replacements. To include revised words and exclude target replacements:
    • Run this command on your CHAT file(s) -- chstring +q1 filename.cha -- to remove revision codes and underscores (e.g., all_of_a_sudden) from the transcript and to change the single colon in target replacements for semantic paraphasias (e.g., grandmother [: godmother]) to a double colon.
    • Re-run the MOR command -- mor filename.chstr.cex -- on the new file(s).
    • Run the CORELEX command -- corelex +lcat +t*par filename.chstr.cex -- on the new file(s). (Substitute the appropriate task name for "cat" in the example and use *.cex or *.chstr.cex instead of filename.chstr.cex for multiple files in a folder.)
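Because CORELEX relies on the gem heading to find each task (item 2 above), it can help to check headings before running the command. The following Python sketch is one possible pre-check, not part of CLAN. The function name check_gem_headings and the sample transcript are hypothetical, and the assumption that gem names must match the listed spelling exactly is mine, not something the page states.

```python
# Gem headings listed above for the CORELEX command.
CORELEX_GEMS = {"Window", "Umbrella", "Cat", "Cinderella", "Sandwich"}


def check_gem_headings(chat_text):
    """Scan CHAT text for @G: lines.

    Returns (bad_lines, found): bad_lines lists (line_number, line) for
    headings where the colon is not followed by exactly one tab, and found
    is the set of correctly formatted CORELEX gem names in the text.
    """
    bad_lines = []
    found = set()
    for number, line in enumerate(chat_text.splitlines(), start=1):
        if not line.startswith("@G:"):
            continue
        # The colon must be followed by a tab, with no extra whitespace.
        if not line.startswith("@G:\t") or line[4:5].isspace():
            bad_lines.append((number, line))
            continue
        label = line[4:].strip()
        if label in CORELEX_GEMS:  # assumption: exact, case-sensitive match
            found.add(label)
    return bad_lines, found


# Hypothetical sample: the "Cat" heading uses a space instead of a tab.
sample = (
    "@Begin\n"
    "@G:\tWindow\n"
    "*PAR:\tthe boy kicked the ball .\n"
    "@G: Cat\n"
    "*PAR:\tthe cat is in the tree .\n"
    "@G:\tCinderella\n"
    "@End\n"
)

bad_lines, found = check_gem_headings(sample)
print(bad_lines)
print(sorted(found))
```

Headings for tasks outside the CORELEX list are simply ignored, since a transcript may contain other gems that are not errors.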