Automated assessment of listener transcripts with the Token Sort Ratio
Upload a .csv file that contains at least one column labelled ‘target’ and one column labelled ‘response’, and select the separator used in the file.
The data you provide are not stored, nor will any information contained therein be used for any purposes other than to produce the output that you request.
This open-source tool automatically calculates the Token Sort Ratio (TSR) for orthographic listener transcripts. The TSR score is a fuzzy string matching metric that, at the most basic level, quantifies the orthographic match between a target string and a response string on a scale from 0 (no match) to 100 (perfect match). The TSR score has been shown to correlate strongly with human-generated scores of percentage words correct (Bosker, 2021). It is an efficient, reliable, and accurate tool for use in speech perception research (e.g., studies that examine the perception of speech in adverse listening conditions, or degraded speech) or for generating listener intelligibility measures in clinical disciplines such as speech-language pathology or audiology.
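To make the idea behind the metric concrete, here is a minimal sketch that uses only the Python standard library. It is not the fuzzywuzzy implementation: the function name `approx_token_sort_ratio` is hypothetical, and the normalisation (lowercasing, replacing non-word characters with spaces) and the use of `difflib.SequenceMatcher` are assumptions chosen to approximate the behaviour described above. Scores may differ from the tool's for some inputs.

```python
import re
from difflib import SequenceMatcher


def approx_token_sort_ratio(target: str, response: str) -> int:
    """Approximate a token sort ratio on a 0-100 scale (illustrative stand-in)."""

    def normalise(text: str) -> str:
        # Lowercase, turn non-word characters into spaces, then sort the tokens
        # alphabetically so that word order no longer affects the comparison.
        tokens = re.sub(r"\W", " ", text.lower()).split()
        return " ".join(sorted(tokens))

    a, b = normalise(target), normalise(response)
    # SequenceMatcher.ratio() is 2 * matches / (len(a) + len(b)), a value in [0, 1].
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


# A single transposition typo costs only a few points.
print(approx_token_sort_ratio("the dog bit the man", "the dog bit teh man"))  # 95
# Same words in a different order: the sorted strings are identical.
print(approx_token_sort_ratio("the dog bit the man", "man the bit dog the"))  # 100
# One word out of four recovered.
print(approx_token_sort_ratio("the child was happy", "child"))  # 42
```

Because the tokens are sorted before comparison, the score reflects which words were transcribed rather than the order in which they appear.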
This online tool runs the function token_sort_ratio() from the fuzzywuzzy module in Python. It takes a .csv file as input, with at least one column labelled ‘target’ and one column labelled ‘response’ in the header (the first row). A comma, semicolon, or tab can be used as the separator. For instance:
participant | trial | item | condition | target | response |
---|---|---|---|---|---|
1 | 1 | 1 | clear | the dog bit the man | the dog bit teh man |
1 | 2 | 1 | degraded | the dog bit the man | the dog ? |
1 | 3 | 2 | clear | the child was happy | the child was happy |
1 | 4 | 2 | degraded | the child was happy | child |
2 | 1 | 1 | clear | the dog bit the man | the dog bit the man |
2 | 2 | 1 | degraded | the dog bit the man | dog bites man |
2 | 3 | 2 | clear | the child was happy | the chidl was happy |
2 | 4 | 2 | degraded | the child was happy | happy |
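Before uploading, it can help to confirm locally that a file parses with the intended separator and has the required header. The sketch below shows one possible check with Python's standard `csv` module; the function name `read_transcripts` and the inline sample data are hypothetical and are not part of the tool's own code.

```python
import csv
import io

REQUIRED_COLUMNS = ("target", "response")


def read_transcripts(csv_text: str, separator: str = ",") -> list[dict]:
    """Parse delimited text and check that the required columns are present."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=separator)
    header = reader.fieldnames or []
    missing = [name for name in REQUIRED_COLUMNS if name not in header]
    if missing:
        # A wrong separator typically shows up here: the whole header line
        # is read as a single column name.
        raise ValueError(f"missing column(s): {', '.join(missing)}; found {header}")
    return list(reader)


# Hypothetical semicolon-separated input mirroring the example table.
sample = (
    "participant;trial;item;condition;target;response\n"
    "1;1;1;clear;the dog bit the man;the dog bit teh man\n"
    "1;2;1;degraded;the dog bit the man;the dog ?\n"
)
rows = read_transcripts(sample, separator=";")
print(rows[0]["target"], "|", rows[0]["response"])
```

Calling `read_transcripts(sample, separator=",")` on the same text raises a `ValueError`, which is the kind of mismatch that selecting the wrong separator would cause.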
For each row, the tool runs token_sort_ratio(target, response) and writes the result to a new column named ‘TSR_score’. For instance:
participant | trial | item | condition | target | response | TSR_score |
---|---|---|---|---|---|---|
1 | 1 | 1 | clear | the dog bit the man | the dog bit teh man | 95 |
1 | 2 | 1 | degraded | the dog bit the man | the dog ? | 54 |
1 | 3 | 2 | clear | the child was happy | the child was happy | 100 |
1 | 4 | 2 | degraded | the child was happy | child | 42 |
2 | 1 | 1 | clear | the dog bit the man | the dog bit the man | 100 |
2 | 2 | 1 | degraded | the dog bit the man | dog bites man | 69 |
2 | 3 | 2 | clear | the child was happy | the chidl was happy | 95 |
2 | 4 | 2 | degraded | the child was happy | happy | 42 |
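The scoring step can be sketched locally as follows. This is an illustration under stated assumptions, not the tool's actual source: the helper `add_tsr_scores` is hypothetical, and the fallback scorer in the `except` branch is a standard-library approximation that is used only when the fuzzywuzzy package is not installed (its scores may differ from fuzzywuzzy's for some inputs).

```python
import csv
import io

try:
    # The function named on this page; requires the fuzzywuzzy package.
    from fuzzywuzzy import fuzz
    token_sort_ratio = fuzz.token_sort_ratio
except ImportError:
    # Assumed stand-in so the sketch still runs without fuzzywuzzy.
    import re
    from difflib import SequenceMatcher

    def token_sort_ratio(s1, s2):
        def norm(s):
            return " ".join(sorted(re.sub(r"\W", " ", s.lower()).split()))
        return int(round(100 * SequenceMatcher(None, norm(s1), norm(s2)).ratio()))


def add_tsr_scores(csv_text: str, separator: str = ",") -> str:
    """Return the input table with an extra 'TSR_score' column appended."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=separator)
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=list(reader.fieldnames) + ["TSR_score"],
        delimiter=separator,
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        # Score each target/response pair with the default settings.
        row["TSR_score"] = token_sort_ratio(row["target"], row["response"])
        writer.writerow(row)
    return out.getvalue()


# Two rows taken from the example table, reduced to the required columns.
sample = (
    "target,response\n"
    "the dog bit the man,dog bites man\n"
    "the child was happy,the child was happy\n"
)
scored = add_tsr_scores(sample)
print(scored)
```

All other columns in the input are passed through unchanged, so participant, trial, and condition labels remain available for later analysis.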
This online tool uses only the default settings of token_sort_ratio(). For detailed documentation, please see: https://github.com/seatgeek/fuzzywuzzy and https://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/. This webpage was created by Tim van Schie and the underlying code is available on GitHub: https://github.com/timvanschie1/TSR_score. A Python script that runs the same code on a local .csv input file, and that can be adapted to change specific settings, is available for download from https://osf.io/73dnj/.
Author: Hans Rutger Bosker
Donders Institute, Radboud University
https://hrbosker.github.io
© Hans Rutger Bosker
To cite this tool, please use the following citation:
Bosker, H. R. (2021). Using fuzzy string matching for automated assessment of listener transcripts in speech intelligibility studies. Behavior Research Methods, 53, 1945–1953. doi: 10.3758/s13428-021-01542-4.