This article describes the output files that the G4 Sequencing Platform generates for each flow cell.
For each flow cell, the G4 Sequencing Platform writes a number of files to your designated output location. The output files are described below.
1. The "demux_filtered_fastqs" directory.
The folder contains the FASTQ files that have been demultiplexed using the Sample Data Sheet associated with the run. If demultiplexing was successful, a "demux_complete.txt" file is created. If no valid Sample Data Sheet is available, the output FASTQ files are filtered to remove PhiX reads, and bases with a Q-score below 10 are N-masked. Onboard demultiplexing uses the Singular Demultiplexer software described below, which you can also use off-instrument.
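To illustrate what N-masking at a Q-score threshold of 10 means, here is a minimal Python sketch. It is not the Singular Demultiplexer implementation; the function name mask_low_quality is hypothetical, and the sketch assumes the FASTQ quality string uses the standard Phred+33 encoding.

```python
def mask_low_quality(seq, qual, min_q=10, offset=33):
    """Replace every base whose Phred score is below min_q with 'N'.

    Assumption: quality characters are Phred+33 encoded (offset=33).
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must have the same length")
    return "".join(
        base if ord(q) - offset >= min_q else "N"
        for base, q in zip(seq, qual)
    )


# Under Phred+33, 'I' encodes Q40, '#' encodes Q2, and '+' encodes Q10.
# Only the second base falls below the threshold of 10.
print(mask_low_quality("ACGT", "I#+I"))  # prints ANGT
```

A base with a Q-score of exactly 10 is kept, because the rule masks only bases with a Q-score below 10.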
2. The "unfiltered_fastqs" directory.
The folder contains the lane-level FASTQ files for each flow cell before demultiplexing, PhiX read removal, and base quality masking. There should be one set of FASTQ files for each of the four lanes. This is the rawest sequence data that the G4 Sequencing Platform makes available to the customer. Use these files if you would like to demultiplex your data manually (discussed below).
3. Reporting files:
- A file named "run_complete.txt" will be in the folder if the sequencing run was completed successfully.
- A file named "demux_complete.txt" will be in the folder if the demultiplexing step was completed successfully.
- A file named "transfer_complete.txt" will be in the folder if the data transfer was completed successfully.
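The three reporting files above can be checked from a script before downstream analysis starts. The following Python sketch is one possible way to do this; the function name run_status and the example path are hypothetical, and it assumes the marker files sit directly in the top level of the run's output folder.

```python
from pathlib import Path

# Marker file names as listed above, keyed by the step they report on.
MARKERS = {
    "sequencing": "run_complete.txt",
    "demultiplexing": "demux_complete.txt",
    "transfer": "transfer_complete.txt",
}


def run_status(run_dir):
    """Return a dict mapping each step to True if its marker file exists.

    Assumption: the marker files are in the top level of run_dir.
    """
    run_dir = Path(run_dir)
    return {step: (run_dir / name).is_file() for step, name in MARKERS.items()}


# Hypothetical path; replace it with your designated output location.
for step, done in run_status("path/to/run_folder").items():
    print(f"{step}: {'complete' if done else 'marker file not found'}")
```

A missing marker file only indicates that the step did not report success; the "logs" directory is the place to look for the cause.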
4. Metric files:
Successful demultiplexing also generates several metric files (with "metric" in the name) that can be used to assess how well the FASTQ data was demultiplexed. The "per_sample_metrics.tsv" file summarizes the number of reads assigned to each barcode and their associated read qualities. The "most_frequent_unmatched.tsv" file summarizes the number of reads that match barcodes not found in the Sample Data Sheet. A detailed guide to the metric files is available at:
https://github.com/Singular-Genomics/singular-demux#metrics
- The "run_config.json" file contains the configuration of the run in JSON format.
- The "run_summary.json" file contains summary information about the run.
- The "phix_stats" directory contains the PhiX statistics from the run.
- The "thumbnails" directory contains select thumbnail images from the sequencing run.
- The "logs" directory contains the data analysis logs from the Secondary PC.
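The TSV metric files and the JSON files listed above can be loaded with the Python standard library. The sketch below is a generic reader, not an official tool; the helper names read_tsv and read_json are hypothetical, and it assumes that each TSV file starts with a header row. No column or key names are assumed, since those are documented in the metrics guide linked above.

```python
import csv
import json


def read_tsv(path):
    """Load a tab-separated file into a list of dicts, one per data row.

    Assumption: the first row of the file is a header row.
    """
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def read_json(path):
    """Load a JSON file such as run_config.json or run_summary.json."""
    with open(path) as handle:
        return json.load(handle)


# Example usage (hypothetical paths; adjust to your output location):
# rows = read_tsv("path/to/run_folder/per_sample_metrics.tsv")
# print(rows[0].keys())   # shows the column names actually present
# config = read_json("path/to/run_folder/run_config.json")
```

Printing the column names first is a simple way to confirm the file layout before writing any analysis that depends on specific fields.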
See also the topic Run Folder in the G4 Sequencing Platform User Guide.
This article is part of our Guide to Working With G4 Sequencing Data Technical Bulletin.