Base-calling errors can occur in barcode sequences just as in any other part of a sequence read. The observed barcode sequence is compared to the expected sample barcode sequence to determine whether the associated read should be assigned to that sample. If the samples have dual index barcodes, the Singular Demultiplex software concatenates the two barcode sequences into one longer barcode sequence for the comparison.
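The comparison described above can be illustrated with a minimal Python sketch. It assumes that a mismatch means a position-by-position base difference between two barcodes of equal length. The function names and barcode sequences are hypothetical and are not part of the Singular Genomics Demultiplex software.

```python
def concatenate_indices(index1: str, index2: str = "") -> str:
    """Join the index sequences of a sample into one barcode string."""
    return index1 + index2


def count_mismatches(observed: str, expected: str) -> int:
    """Count the positions at which two equal-length barcodes differ."""
    if len(observed) != len(expected):
        raise ValueError("barcodes must have the same length")
    return sum(o != e for o, e in zip(observed, expected))


# Hypothetical 2x12 bp dual index: the expected barcode for one sample.
expected = concatenate_indices("ACGTACGTACGT", "TTGACCAGTCAA")

# The same barcode as read from the instrument, with one miscalled base
# in the first index.
observed = concatenate_indices("ACGTACGAACGT", "TTGACCAGTCAA")

mismatches = count_mismatches(observed, expected)
print(len(expected), mismatches)
```

In this example the concatenated barcode is 24 bases long and the observed sequence differs from the expected sequence at a single position.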
The main options for controlling barcode mismatch tolerance are the following:
--allowed-mismatches
The --allowed-mismatches option controls how many mismatches are allowed between the expected barcode sequence for a sample and the observed barcode sequence while still assigning the read to that sample. If the number of mismatches exceeds the allowed value for every sample, the read is not assigned to any sample and is sent to the Undetermined file. Increasing --allowed-mismatches increases the number of reads assigned to samples, because index sequences that only partially match are also assigned. However, it also increases the probability that a read is misassigned to a sample. For highly stringent applications, only perfectly matching barcode sequences should be assigned to samples, and setting --allowed-mismatches to 0 is recommended.
--min-delta
A key related option is --min-delta, which sets the minimum number of mismatches by which the best match for a read must be better than the next best match for the best match to be accepted. If this margin is not met, the read is sent to the Undetermined file. This is another way to control the stringency of assigning reads to samples. The higher the --min-delta value, the less likely a read is to be misassigned, but the more reads are discarded.
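The interaction between the two options can be sketched in Python as follows. This is an illustration of the rules as described in this section, not the actual implementation in the Singular Genomics Demultiplex software. The function names, sample names, and barcodes are hypothetical, and mismatches are assumed to be position-by-position base differences.

```python
from typing import Dict, Optional


def count_mismatches(observed: str, expected: str) -> int:
    """Count the positions at which two equal-length barcodes differ."""
    if len(observed) != len(expected):
        raise ValueError("barcodes must have the same length")
    return sum(o != e for o, e in zip(observed, expected))


def assign_read(
    observed: str,
    samples: Dict[str, str],
    allowed_mismatches: int,
    min_delta: int,
) -> Optional[str]:
    """Return the sample name for a read, or None for Undetermined."""
    # Score every sample and sort so the best (fewest mismatches) is first.
    scored = sorted(
        (count_mismatches(observed, barcode), name)
        for name, barcode in samples.items()
    )
    if not scored:
        return None
    best_mismatches, best_name = scored[0]

    # Rule 1: the best match must be within the allowed mismatches.
    if best_mismatches > allowed_mismatches:
        return None

    # Rule 2: the best match must beat the next best match by at least
    # min_delta mismatches.
    if len(scored) > 1:
        second_mismatches = scored[1][0]
        if second_mismatches - best_mismatches < min_delta:
            return None

    return best_name


# Hypothetical 6 bp barcodes that differ from each other at two positions.
samples = {"sample_A": "ACGTAC", "sample_B": "ACGTTT"}

exact = assign_read("ACGTAC", samples, allowed_mismatches=1, min_delta=1)
ambiguous = assign_read("ACGTAT", samples, allowed_mismatches=1, min_delta=1)
too_far = assign_read("GGGGGG", samples, allowed_mismatches=1, min_delta=1)
print(exact, ambiguous, too_far)
```

Here the exact match is assigned to sample_A. The read ACGTAT is one mismatch away from both samples, so it fails the --min-delta check and is left unassigned. The read GGGGGG exceeds the allowed mismatches for every sample and is also left unassigned.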
The default values for the onboard G4 Sequencing Platform Demultiplex workflow and the standalone Singular Genomics Demultiplex software are the following:
Option                 Onboard Demultiplex Workflow   Standalone Demultiplex Software
--allowed-mismatches   3                              1
--min-delta            1                              2
Note
The default values for the two workflows differ because the onboard demultiplexing workflow is optimized for demultiplexing indices introduced through the standard Singular Genomics adapter sequences.
If you do not specify an option on the command line or in the sample sheet, its default value is applied. The default values for these options are intended for longer, dual index barcodes such as the 2x12 bp Singular Genomics indices. For shorter indices, we recommend reducing the values for both options. For example, we recommend demultiplexing 6 bp single index barcodes with --allowed-mismatches 1 (or lower) and --min-delta 1.
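One way to see why shorter indices call for lower values is to check how many positions separate the expected barcodes in a run. The Python sketch below computes the smallest pairwise mismatch count for a set of barcodes. It is an illustrative aid under the assumption that mismatches are position-by-position differences; the function names and barcodes are hypothetical and it is not a feature of the Singular Genomics Demultiplex software.

```python
from itertools import combinations
from typing import Sequence


def count_mismatches(a: str, b: str) -> int:
    """Count the positions at which two equal-length barcodes differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_mismatches(barcodes: Sequence[str]) -> int:
    """Return the smallest mismatch count between any two barcodes."""
    if len(barcodes) < 2:
        raise ValueError("need at least two barcodes to compare")
    return min(count_mismatches(a, b) for a, b in combinations(barcodes, 2))


# Hypothetical 6 bp single index barcodes.
short_barcodes = ["ACGTAC", "ACGTTT", "TTGACA"]
closest = min_pairwise_mismatches(short_barcodes)
print(closest)
```

In this example the two closest barcodes differ at only two positions, so a read with a single miscalled base can already be equally close to both of them. Short barcodes leave little room between samples, which is why a large mismatch tolerance is harder to justify for them.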
The Singular Genomics Demultiplex software generates a number of metric files. The most_frequent_unmatched.tsv file tabulates the observed barcode sequences that were not assigned to a sample, sorted by frequency in descending order. When troubleshooting demultiplexing errors, it is often helpful to use --allowed-mismatches 0 and --min-delta 1 so that partial matches are not assigned to samples and are instead tabulated in the most_frequent_unmatched.tsv metric file.
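As a troubleshooting aid, the unmatched barcodes can be compared against the expected sample barcodes to see which sample each one most closely resembles. The Python sketch below assumes that most_frequent_unmatched.tsv has two tab-separated columns, a barcode sequence followed by a count, possibly preceded by a header line. This column layout is an assumption and should be checked against an actual output file. The function names, sample names, and data are hypothetical.

```python
import csv
import io
from typing import Dict, List, Tuple


def count_mismatches(a: str, b: str) -> int:
    """Count the positions at which two equal-length barcodes differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest_samples(
    tsv_text: str, samples: Dict[str, str]
) -> List[Tuple[str, int, str, int]]:
    """For each unmatched barcode, find the closest expected barcode.

    Returns (barcode, count, nearest sample, mismatches) tuples in file order.
    ASSUMPTION: column 1 is the barcode and column 2 is an integer count.
    """
    results = []
    for fields in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        # Skip blank lines, header lines, and rows without an integer count.
        if len(fields) < 2 or not fields[1].isdigit():
            continue
        barcode, count = fields[0], int(fields[1])
        # Only compare against expected barcodes of the same length.
        candidates = [
            (count_mismatches(barcode, expected), name)
            for name, expected in samples.items()
            if len(expected) == len(barcode)
        ]
        if not candidates:
            continue
        # Ties are broken alphabetically by sample name.
        mismatches, name = min(candidates)
        results.append((barcode, count, name, mismatches))
    return results


# In-memory stand-in for the metric file, using the assumed layout.
example_tsv = "barcode\tcount\nACGTAT\t120\nGGGGGG\t15\n"
samples = {"sample_A": "ACGTAC", "sample_B": "ACGTTT"}

report = nearest_samples(example_tsv, samples)
for row in report:
    print(row)
```

To run this on a real file, read its contents into a string first, for example with `open("most_frequent_unmatched.tsv").read()`. A frequent unmatched barcode that is only one or two mismatches from an expected barcode may point to an error in the sample sheet or to a recurring base-calling error, whereas one that is far from every sample is more likely to be an unexpected index.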