Automated assessment of listener transcripts with the Token Sort Ratio

Supply an input .csv file with at least one column labelled ‘target’ and one column labelled ‘response’, and select the separator that the file uses.

The data you provide are not stored, nor will any information contained therein be used for any purposes other than to produce the output that you request.

This open-source tool automatically calculates the Token Sort Ratio (TSR) for orthographic listener transcripts. The TSR score is a fuzzy string matching metric that – at the most basic level – quantifies the orthographic match between a target string and a response string (value between 0 = no match and 100 = perfect match). The TSR score has been shown to strongly correlate with human-generated scores of percentage words correct (Bosker, 2021). It is an efficient, reliable, and accurate tool for use in speech perception research (e.g., studies that examine the perception of speech in adverse listening conditions, or degraded speech) or for generating listener intelligibility measures in clinical disciplines such as speech-language pathology or audiology.
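To give a rough idea of what such a score measures, the sketch below normalises both strings, sorts their tokens alphabetically, and compares the sorted strings. It is an illustrative approximation built on Python's standard difflib module, not the fuzzywuzzy implementation that this tool runs. The function name approx_token_sort_ratio and the handling of empty strings are assumptions made for this example, and its scores may differ from those of the tool.

```python
import re
from difflib import SequenceMatcher


def approx_token_sort_ratio(target: str, response: str) -> int:
    """Approximate token sort ratio between two strings, from 0 to 100."""

    def sort_tokens(text: str) -> str:
        # Lowercase, turn every run of non-word characters into a space,
        # then sort the remaining tokens alphabetically.
        tokens = re.sub(r"\W+", " ", text.lower()).split()
        return " ".join(sorted(tokens))

    a, b = sort_tokens(target), sort_tokens(response)
    if not a or not b:
        # Choice made for this sketch only: an empty transcript scores 0.
        return 0
    # difflib's ratio is 2 * matches / (len(a) + len(b)), between 0.0 and 1.0.
    return round(100 * SequenceMatcher(None, a, b).ratio())


# Word order does not matter after sorting; a typo lowers the score slightly.
for response in ["the dog bit the man", "man the bit dog the", "the dog bit teh man"]:
    print(response, approx_token_sort_ratio("the dog bit the man", response))
```

Because the tokens are sorted before comparison, a response that contains the right words in a different order still scores 100 in this sketch.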

This online tool runs the function token_sort_ratio() from the fuzzywuzzy module in Python. It takes a .csv file as input, with at least one column labelled ‘target’ and one column labelled ‘response’ in the header (= first row). A comma, semicolon, or tab can be used as the separator. For instance:

participant	trial	item	condition	target	response
1	1	1	clear	the dog bit the man	the dog bit teh man
1	2	1	degraded	the dog bit the man	the dog ?
1	3	2	clear	the child was happy	the child was happy
1	4	2	degraded	the child was happy	child
2	1	1	clear	the dog bit the man	the dog bit the man
2	2	1	degraded	the dog bit the man	dog bites man
2	3	2	clear	the child was happy	the chidl was happy
2	4	2	degraded	the child was happy	happy

download this example input.csv file

The tool runs the function token_sort_ratio(target, response) on each row and writes the result to a new column named ‘TSR_score’. For instance:

participant	trial	item	condition	target	response	TSR_score
1	1	1	clear	the dog bit the man	the dog bit teh man	95
1	2	1	degraded	the dog bit the man	the dog ?	54
1	3	2	clear	the child was happy	the child was happy	100
1	4	2	degraded	the child was happy	child	42
2	1	1	clear	the dog bit the man	the dog bit the man	100
2	2	1	degraded	the dog bit the man	dog bites man	69
2	3	2	clear	the child was happy	the chidl was happy	95
2	4	2	degraded	the child was happy	happy	42
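The same per-row processing can be sketched locally with Python's standard csv module. This is a minimal sketch under stated assumptions, not the code behind this webpage: the helper names score_rows and score_file are hypothetical, and the scoring function is passed in as a parameter so that any function taking (target, response) can be used.

```python
import csv

REQUIRED_COLUMNS = ("target", "response")


def score_rows(lines, scorer, delimiter=","):
    """Read delimited text lines and add a 'TSR_score' value to every row.

    `lines` is any iterable of strings (an open file works); `scorer` is a
    function taking (target, response) and returning a score.
    """
    reader = csv.DictReader(lines, delimiter=delimiter)
    header = list(reader.fieldnames or [])
    missing = [name for name in REQUIRED_COLUMNS if name not in header]
    if missing:
        raise ValueError("missing column(s): " + ", ".join(missing))
    rows = []
    for row in reader:
        # Cells absent from a short row are read as None; treat them as empty.
        row["TSR_score"] = scorer(row["target"] or "", row["response"] or "")
        rows.append(row)
    return header + ["TSR_score"], rows


def score_file(in_path, out_path, scorer, delimiter=","):
    """Score every row of in_path and write the result to out_path."""
    with open(in_path, newline="", encoding="utf-8") as src:
        header, rows = score_rows(src, scorer, delimiter)
    with open(out_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(
            dst, fieldnames=header, delimiter=delimiter, extrasaction="ignore"
        )
        writer.writeheader()
        writer.writerows(rows)


# Hypothetical usage, assuming fuzzywuzzy is installed and the file is tab-separated:
#   from fuzzywuzzy import fuzz
#   score_file("input.csv", "output.csv", fuzz.token_sort_ratio, delimiter="\t")
```

Passing the scorer as an argument keeps the file handling independent of the metric, so changed settings only require a different scoring function.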

This online tool uses only the default settings of token_sort_ratio(). For detailed documentation, please see: https://github.com/seatgeek/fuzzywuzzy and https://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/. This webpage was created by Tim van Schie and the underlying code is available on GitHub: https://github.com/timvanschie1/TSR_score. A Python script that runs the same code on a local .csv input file, and that can be adapted to change specific settings, is available for download from https://osf.io/73dnj/.

To cite this tool, please use the following citation:

Bosker, H. R. (2021). Using fuzzy string matching for automated assessment of listener transcripts in speech intelligibility studies. Behavior Research Methods, 53, 1945–1953. doi: 10.3758/s13428-021-01542-4.
