Sequencing Data Quality Control

Short-read (second-generation) data QC

fastqc

FastQC only assesses data quality; it does not filter reads.

Project page: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/

Installation

conda install -c conda-forge -c bioconda fastqc

The default parameters are usually sufficient:

mkdir -p qc_out  # FastQC does not create the output directory itself
fastqc --outdir ./qc_out --threads 8 in_R1.fastq.gz in_R2.fastq.gz

FastQC writes an HTML report for each input file.
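The sketch below shows what a per-position quality summary involves. It is written in plain awk and follows roughly the idea behind FastQC's "Per base sequence quality" module, but it is not FastQC's implementation. It assumes uncompressed four-line FASTQ records with Phred+33 quality encoding. The function name per_base_mean_quality is hypothetical.

```shell
# Hypothetical helper, not part of FastQC: mean Phred quality at each read
# position. Assumes 4-line FASTQ records and Phred+33 encoding.
# Reads the files given as arguments, or stdin if none are given.
per_base_mean_quality() {
  awk '
    BEGIN {
      # Lookup table: quality character -> Phred score (ASCII code minus 33)
      for (i = 33; i <= 126; i++) phred[sprintf("%c", i)] = i - 33
    }
    NR % 4 == 0 {
      # Line 4 of every record is the quality string
      n = length($0)
      for (p = 1; p <= n; p++) {
        sum[p] += phred[substr($0, p, 1)]
        cnt[p]++
      }
      if (n > maxlen) maxlen = n
    }
    END {
      # One output line per position: position <TAB> mean quality
      for (p = 1; p <= maxlen; p++) printf "%d\t%.2f\n", p, sum[p] / cnt[p]
    }
  ' "$@"
}
```

For example, `gzip -dc in_R1.fastq.gz | per_base_mean_quality | head` prints the first ten positions. FastQC's plot also shows the spread of qualities at each position, not only the mean.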

fastp

Raw reads delivered by the sequencer can contain low-quality bases, adapter sequences and other artifacts. fastp filters them out.

Project page: https://github.com/OpenGene/fastp

Installation

conda install -c conda-forge -c bioconda fastp

The default parameters are usually sufficient:

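# --thread: number of worker threads
# -i / -I: input read 1 / read 2
# -o / -O: filtered output read 1 / read 2
# -j / -h: JSON / HTML report
# stdout and stderr are redirected to fastp.log and fastp.err
fastp \
--thread 8 \
-i in_R1.fastq.gz \
-I in_R2.fastq.gz \
-o filter_R1.fastq.gz \
-O filter_R2.fastq.gz \
-j fastp.json \
-h fastp.html \
1>fastp.log \
2>fastp.err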

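Besides the filtered reads, fastp writes HTML and JSON reports. These include statistics such as the amount of data before and after filtering.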
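You can cross-check the before/after data volume in fastp's report by counting reads and bases in the FASTQ files directly. The sketch below assumes gzipped four-line FASTQ records. The function name fastq_stats is hypothetical and is not part of fastp.

```shell
# Hypothetical helper (not part of fastp): count reads and bases in a
# gzipped FASTQ file. Assumes standard 4-line records.
fastq_stats() {
  gzip -dc "$1" | awk '
    NR % 4 == 2 { reads++; bases += length($0) }   # line 2 = sequence
    END { printf "reads=%d\tbases=%d\n", reads, bases }
  '
}
```

Run it on an input file and its filtered counterpart, for example `fastq_stats in_R1.fastq.gz` and `fastq_stats filter_R1.fastq.gz`, and compare the two lines.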

Long-read (third-generation) data QC

LongQC

Project page: https://github.com/yfukasawa/LongQC

Installation

conda install -c conda-forge -c bioconda longqc

Usage

# -p: number of CPUs; -o: output directory
# -x: data-type preset (here PacBio HiFi); the last argument is the input file
longQC.py sampleqc \
-p 8 \
-o qc_hifi \
-x pb-hifi \
hifi.fastq.gz

LongQC writes its assessment as JSON and HTML reports, covering quality values, sequencing coverage and GC content.
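As a simple illustration of the GC metric, the sketch below computes the overall GC percentage of a gzipped FASTQ file. It uses the fraction of G/C among A/C/G/T bases and ignores N and other symbols. LongQC's report breaks GC content down in more detail, and this is not its code. The function name fastq_gc_percent is hypothetical.

```shell
# Hypothetical helper (not LongQC code): overall GC content (%) of a gzipped
# FASTQ, computed over A/C/G/T bases only; N and other symbols are ignored.
fastq_gc_percent() {
  gzip -dc "$1" | awk '
    NR % 4 == 2 {
      s = toupper($0)
      gc += gsub(/[GC]/, "", s)   # gsub returns how many G/C were removed
      at += gsub(/[AT]/, "", s)   # then count A/T in what is left
    }
    END {
      if (gc + at > 0) printf "%.2f\n", 100 * gc / (gc + at)
      else print "NA"
    }
  '
}
```

For example, `fastq_gc_percent hifi.fastq.gz` prints a single percentage.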

chopper

chopper filters long-read (third-generation) sequencing data.

If you supply a reference sequence with --contam, chopper also removes reads that match it, which filters out contaminant reads.

Project page: https://github.com/wdecoster/chopper

Installation

conda install -c conda-forge -c bioconda chopper

Usage

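# -q: minimum average read quality; -l: minimum read length
# --headcrop / --tailcrop: trim this many bases from the start / end of each read
gzip -dc ont.fastq.gz | \
chopper -q 10 \
-l 1000 \
--headcrop 20 \
--tailcrop 20 \
| gzip > ont.filter.fastq.gz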
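The quality threshold (-q 10 above) applies to a read's average quality. The sketch below shows such a filter together with the length threshold. It assumes chopper averages per-base error probabilities and converts the mean back to the Phred scale, as NanoFilt does, rather than averaging raw Phred scores. Check the chopper source if the exact definition matters. The sketch does not reproduce trimming or --contam. It assumes uncompressed four-line FASTQ records with Phred+33 encoding, and the function name filter_long_reads is hypothetical.

```shell
# Hypothetical sketch (not chopper code): keep reads with length >= $2 and
# average quality >= $1, where the average is taken over per-base error
# probabilities and converted back to Phred. Reads 4-line FASTQ on stdin.
filter_long_reads() {
  awk -v min_q="$1" -v min_len="$2" '
    BEGIN { for (i = 33; i <= 126; i++) phred[sprintf("%c", i)] = i - 33 }
    { rec[NR % 4] = $0 }          # buffer the current record (line 4 -> rec[0])
    NR % 4 == 0 {
      n = length(rec[0])
      if (n == 0 || length(rec[2]) < min_len) next   # too short: drop
      perr = 0
      for (p = 1; p <= n; p++) perr += 10 ^ (-phred[substr(rec[0], p, 1)] / 10)
      avg_q = -10 * log(perr / n) / log(10)           # mean error prob -> Phred
      if (avg_q >= min_q) print rec[1] "\n" rec[2] "\n" rec[3] "\n" rec[0]
    }
  '
}
```

Usage mirrors the pipeline above: `gzip -dc ont.fastq.gz | filter_long_reads 10 1000 | gzip > ont.filter.fastq.gz`. Averaging error probabilities makes a few very bad bases pull the read quality down sharply. A read with qualities 40, 0, 40 has a mean Phred score of about 27 but an average quality of about 4.8 under this definition.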

multiqc

MultiQC aggregates the logs of more than 100 bioinformatics tools into a single report.

2023-10-02
