Automated assessment of listener transcripts with the Token Sort Ratio

Upload a .csv file containing at least one column labelled ‘target’ and one column labelled ‘response’, and select the separator used in the file.

The data you provide are not stored, nor will any information contained therein be used for any purposes other than to produce the output that you request.

This open-source tool automatically calculates the Token Sort Ratio (TSR) for orthographic listener transcripts. The TSR is a fuzzy string matching metric that, at the most basic level, quantifies the orthographic match between a target string and a response string on a scale from 0 (no match) to 100 (perfect match). TSR scores have been shown to correlate strongly with human-generated scores of percentage words correct (Bosker, 2021). The TSR is an efficient, reliable, and accurate measure for use in speech perception research (e.g., studies that examine the perception of speech in adverse listening conditions, or of degraded speech) and for generating listener intelligibility measures in clinical disciplines such as speech-language pathology or audiology.
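To illustrate what the score captures, here is a minimal sketch in Python that uses only the standard library. It is not the tool's own code, which runs token_sort_ratio() from the fuzzywuzzy module. The function names normalise and tsr_sketch are made up for this illustration. The steps (lower-casing, replacing punctuation with spaces, sorting the words alphabetically, then comparing the two strings with difflib.SequenceMatcher) approximate the default behaviour of token_sort_ratio(). Scores from fuzzywuzzy itself may differ slightly, for instance depending on whether the optional python-Levenshtein package is installed.

```python
import re
from difflib import SequenceMatcher


def normalise(text: str) -> str:
    """Lower-case, turn punctuation into spaces, and sort the words."""
    cleaned = re.sub(r"\W", " ", text).lower()
    return " ".join(sorted(cleaned.split()))


def tsr_sketch(target: str, response: str) -> int:
    """Approximate Token Sort Ratio on a 0-100 scale (illustrative only)."""
    a, b = normalise(target), normalise(response)
    if not a or not b:
        return 0  # assumption: an empty transcript counts as no match
    # SequenceMatcher.ratio() is 2 * matches / total length, between 0 and 1.
    return round(100 * SequenceMatcher(None, a, b).ratio())


# A single transposed letter lowers the score only slightly.
print(tsr_sketch("the dog bit the man", "the dog bit teh man"))  # 95
# Word order is ignored because the words are sorted before comparison.
print(tsr_sketch("the dog bit the man", "man the dog bit the"))  # 100
```

Because the words are sorted before the comparison, a response with the right words in the wrong order still scores 100, while misspellings and missing words lower the score gradually.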

This online tool runs the function token_sort_ratio() from the fuzzywuzzy module in Python. It takes a .csv file as input, which must contain at least one column labelled ‘target’ and one column labelled ‘response’ in the header (= first row). A comma, semicolon, or tab can be used as the separator. For instance:

participant,trial,item,condition,target,response
1,1,1,clear,the dog bit the man,the dog bit teh man
1,2,1,degraded,the dog bit the man,the dog ?
1,3,2,clear,the child was happy,the child was happy
1,4,2,degraded,the child was happy,child
2,1,1,clear,the dog bit the man,the dog bit the man
2,2,1,degraded,the dog bit the man,dog bites man
2,3,2,clear,the child was happy,the chidl was happy
2,4,2,degraded,the child was happy,happy

For each row, the tool calls token_sort_ratio(target, response) and writes the result to a new column named ‘TSR_score’. For instance:

participant,trial,item,condition,target,response,TSR_score
1,1,1,clear,the dog bit the man,the dog bit teh man,95
1,2,1,degraded,the dog bit the man,the dog ?,54
1,3,2,clear,the child was happy,the child was happy,100
1,4,2,degraded,the child was happy,child,42
2,1,1,clear,the dog bit the man,the dog bit the man,100
2,2,1,degraded,the dog bit the man,dog bites man,69
2,3,2,clear,the child was happy,the chidl was happy,95
2,4,2,degraded,the child was happy,happy,42
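The row-by-row processing described above can be sketched as follows. This is an illustration under stated assumptions, not the code behind this webpage or the downloadable script: the helper name add_tsr_column and its parameters are hypothetical, and the scoring function is passed in as an argument so that the sketch itself depends only on the Python standard library.

```python
import csv
import io
from typing import Callable


def add_tsr_column(csv_text: str, separator: str,
                   scorer: Callable[[str, str], int]) -> str:
    """Return csv_text with an extra 'TSR_score' column (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=separator)
    fieldnames = list(reader.fieldnames or [])

    # The header must contain both required columns.
    missing = {"target", "response"} - set(fieldnames)
    if missing:
        raise ValueError(f"missing required column(s): {sorted(missing)}")

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames + ["TSR_score"],
                            delimiter=separator, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Score each target/response pair; all other columns pass through.
        row["TSR_score"] = scorer(row["target"], row["response"])
        writer.writerow(row)
    return out.getvalue()


# Example use (assumes the fuzzywuzzy package is installed and that
# "input.csv" is a hypothetical semicolon-separated file):
#
#     from fuzzywuzzy import fuzz
#     with open("input.csv", newline="") as f:
#         print(add_tsr_column(f.read(), ";", fuzz.token_sort_ratio))
```

Passing the scorer as an argument keeps the file handling separate from the metric, so the same loop could be reused with different settings or a different string matching function.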

This online tool uses only the default settings of token_sort_ratio(). For detailed documentation, please see: https://github.com/seatgeek/fuzzywuzzy and https://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/. This webpage was created by Tim van Schie, and the underlying code is available on GitHub: https://github.com/timvanschie1/TSR_score. A Python script that runs the same code on a local .csv input file, and which can be adapted to change specific settings, is available for download from https://osf.io/73dnj/.

Author: Hans Rutger Bosker
Donders Institute, Radboud University
https://hrbosker.github.io
© Hans Rutger Bosker

To cite this tool, please use the following citation:

Bosker, H. R. (2021). Using fuzzy string matching for automated assessment of listener transcripts in speech intelligibility studies. Behavior Research Methods, 53, 1945–1953. doi: 10.3758/s13428-021-01542-4.