Channel: indelrealigner — GATK-Forum
Viewing all 267 articles

Badly formed genome loc ERROR


Dear GATK help team,

I have a cut chromosome file (chr17) which I have processed through sorting, alignment, adding headers, and even through RealignerTargetCreator. Yet when I try to run IndelRealigner on it, I receive an error:
ERROR MESSAGE: Badly formed genome loc: Parameters to GenomeLocParser are incorrect:The contig index 0 is bad, doesn't equal the contig index 17 of the contig from a string chr17

I cut out these chromosomes and processed chr17 first since that is my region of interest, and because I thought it might avoid memory issues.

I am currently a newbie and have checked the forum for help, yet only found one similar post with no solution.
Please help; I am stuck at this stage. My command for IndelRealigner is the following:
java -jar /Users/yotsukurasohiya/build/softwares/GenomeAnalysisTK-3.2-2/GenomeAnalysisTK.jar -T IndelRealigner -R /Volumes/Pegasus/broadref/ucsc.hg19.fasta -I /Volumes/Pegasus/tmp/mardup.pregatk.bam -targetIntervals 2_target_intervals.list -known /Volumes/Pegasus/broadref/Mills_and_1000G_gold_standard.indels.hg19.vcf -known /Volumes/Pegasus/broadref/dbsnp_138.hg19.vcf -known /Volumes/Pegasus/broadref/1000G_phase1.snps.high_confidence.hg19.vcf -o 2_realigned_reads.bam

The headers that I have added are through Picard's AddOrReplaceReadGroups:
SO=coordinate CREATE_INDEX=true SM=temp PL=Illumina PU=barcode LB=bar ID=id
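
One possible reading of the error above is that chr17 sits at index 0 in the sequence dictionary of the cut BAM (or of the intervals), while it sits at a later index in the full reference dictionary. The following is a minimal Python sketch of how the two contig orders could be compared, assuming the header text has been dumped beforehand (for example with `samtools view -H` for the BAM and by reading the reference `.dict` file). The function names and the toy headers are illustrative and not part of GATK.

```python
def contig_order(header_text):
    """Return contig names in the order of the @SQ lines of a SAM-style header."""
    names = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        for field in line.split("\t")[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
    return names


def index_mismatches(bam_header, ref_dict):
    """List (contig, bam_index, ref_index) tuples where the two orders disagree."""
    ref_index = {name: i for i, name in enumerate(contig_order(ref_dict))}
    mismatches = []
    for i, name in enumerate(contig_order(bam_header)):
        j = ref_index.get(name)  # None if the contig is absent from the reference
        if j != i:
            mismatches.append((name, i, j))
    return mismatches


# Toy headers, shortened for illustration (not the poster's real files)
ref = (
    "@HD\tVN:1.4\n"
    "@SQ\tSN:chrM\tLN:16571\n"
    "@SQ\tSN:chr1\tLN:249250621\n"
    "@SQ\tSN:chr17\tLN:81195210\n"
)
bam = "@HD\tVN:1.4\tSO:coordinate\n@SQ\tSN:chr17\tLN:81195210\n"

print(index_mismatches(bam, ref))
```

If the list is non-empty, the BAM header and the reference disagree on contig indices, which would be consistent with the message, though the post does not confirm that this is the cause.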


stack trace error using IndelRealigner


Hello,
I'm trying to realign approximately 115 BAM files. I am able to do this with the -o option, but this results in an impressively large BAM file that I cannot fix in Picard (FixMateInformation and SortSam). Unfortunately these are corrections that need to happen before the downstream GATK SNP discovery. So I tried the -nWayOut option, to get an individual realigned BAM file for each input, but this returns a stack trace ERROR that includes something about an unavailable reader id. I've pasted it below.

INFO 14:06:32,838 ProgressMeter - scaffold_0:4430818 1.17606954E8 59.5 m 30.0 s 0.4% 9.8 d 9.7 d
INFO 14:07:32,840 ProgressMeter - scaffold_0:4474066 1.18707144E8 60.5 m 30.0 s 0.4% 9.8 d 9.8 d
INFO 14:08:32,841 ProgressMeter - scaffold_0:4505563 1.1980727E8 61.5 m 30.0 s 0.4% 9.9 d 9.9 d
INFO 14:09:32,843 ProgressMeter - scaffold_0:4506325 1.20407434E8 62.5 m 31.0 s 0.4% 10.1 d 10.0 d
INFO 14:09:55,236 GATKRunReport - Uploaded run statistics report to AWS S3

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

org.broadinstitute.gatk.utils.exceptions.ReviewedGATKException: No such reader id is available
at org.broadinstitute.gatk.engine.datasources.reads.SAMDataSource$SAMResourcePool.getReaderID(SAMDataSource.java:809)
at org.broadinstitute.gatk.engine.datasources.reads.SAMDataSource.getReaderID(SAMDataSource.java:430)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.getReaderIDForRead(GenomeAnalysisEngine.java:803)
at org.broadinstitute.gatk.utils.sam.NWaySAMFileWriter.addAlignment(NWaySAMFileWriter.java:158)
at org.broadinstitute.gatk.tools.walkers.indels.ConstrainedMateFixingManager.writeRead(ConstrainedMateFixingManager.java:356)
at org.broadinstitute.gatk.tools.walkers.indels.ConstrainedMateFixingManager.addRead(ConstrainedMateFixingManager.java:261)
at org.broadinstitute.gatk.tools.walkers.indels.ConstrainedMateFixingManager.addRead(ConstrainedMateFixingManager.java:237)
at org.broadinstitute.gatk.tools.walkers.indels.IndelRealigner.emit(IndelRealigner.java:492)
at org.broadinstitute.gatk.tools.walkers.indels.IndelRealigner.map(IndelRealigner.java:529)
at org.broadinstitute.gatk.tools.walkers.indels.IndelRealigner.map(IndelRealigner.java:146)
at org.broadinstitute.gatk.engine.traversals.TraverseReadsNano$TraverseReadsMap.apply(TraverseReadsNano.java:228)
at org.broadinstitute.gatk.engine.traversals.TraverseReadsNano$TraverseReadsMap.apply(TraverseReadsNano.java:216)
at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
at org.broadinstitute.gatk.engine.traversals.TraverseReadsNano.traverse(TraverseReadsNano.java:102)
at org.broadinstitute.gatk.engine.traversals.TraverseReadsNano.traverse(TraverseReadsNano.java:56)
at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:108)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:314)
at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:107)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 3.2-2-gec30cee):
ERROR
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: No such reader id is available
ERROR ------------------------------------------------------------------------------------------

I'm not sure if it's the number of bam files submitted or if it is something with the specific position when the error occurs. Any help in this matter would be greatly appreciated!

Christen

UnifiedGenotyper false call with no support in BAM


I used GATK 3.2-2 to do indel realignment and multisample calling with UnifiedGenotyper (I cannot use HaplotypeCaller because it is incompatible with the type of data I am analyzing).

Among the ~96 samples sequenced, I have one sample with two nearby variant calls, an indel and a missense, that we checked by Sanger. The missense is real but we found no trace of the indel in the Sanger traces. When I look at the indel call in the multisample VCF, it has good allele depth and GQ, but suspiciously has the same AD as the missense call. Additionally, when I look at the BAM in IGV, I see no evidence for the indel whatsoever and the variant is not called in any other samples in this project.

indel:

chr13:101755523 CA>C
GT:AD:DP:GQ:PL
0/1:55,56:111:99:1388,0,1538

missense:

chr13:101755530 A>G
GT:AD:DP:GQ:PL
0/1:55,56:111:99:2170,0,2125

I went back and recalled just this one sample (single sample calling)… which resulted in the correct variants, i.e. the indel was not called at all, but the SNP, which does validate, is called.

I understand that this is not an easy region to call because of the 7xA repeat, but it’s not obvious to me why this happens only in multisample mode and I'd like to have a better understanding of what is going on.
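
To make the comparison of the two genotype fields above concrete, here is a small Python sketch that splits the FORMAT and sample columns into named values. It assumes every field is present and numeric (no `.` placeholders); `parse_sample` is an illustrative helper, not a GATK or VCF library function.

```python
def parse_sample(format_field, sample_field):
    """Pair FORMAT keys with sample values and convert the numeric ones."""
    record = dict(zip(format_field.split(":"), sample_field.split(":")))
    for key in ("AD", "PL"):  # comma-separated integer lists
        if key in record:
            record[key] = [int(x) for x in record[key].split(",")]
    for key in ("DP", "GQ"):  # single integers
        if key in record:
            record[key] = int(record[key])
    return record


indel = parse_sample("GT:AD:DP:GQ:PL", "0/1:55,56:111:99:1388,0,1538")
snp = parse_sample("GT:AD:DP:GQ:PL", "0/1:55,56:111:99:2170,0,2125")

# The allele depths are identical, and in both records they sum to DP.
print(indel["AD"] == snp["AD"], sum(indel["AD"]) == indel["DP"])
```

Both comparisons hold for the records quoted in the post: the two calls report exactly the same per-allele depths, which is the pattern the poster found suspicious.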

IndelRealigner -nWayOut: Possible to specify output directory?


I'm putting together an in-house pipeline for processing of paired tumor-normal BAMs. I've set it up to make a single intervals file for each pair and then want to have separate BAMs for BQSR, so I thought I'd use IndelRealigner with -nWayOut. However, the realigned BAMs are output to whatever directory I've run the script from, rather than the same directory as the BAM files I provided to -I.

I tried to create a .map file with full pathnames to both the input and output files, but received the error:

##### ERROR MESSAGE: Bad input: Input-output bam filename map does not contain an entry for the input file 3528.dedup.bam

The corresponding entry in the .map file had the format:

/users/home/project/cocleaned/3528.dedup.bam /users/home/project/cocleaned/3528.realigned.bam

Is there a way to specify an output directory in conjunction with -nWayOut? If not, I can probably hackishly search for the realigned files, but I thought I'd check.
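
The error message names the input as `3528.dedup.bam` without its directory, which suggests (but does not prove) that the map is looked up by bare file name rather than by full path. Under that assumption, a map file could be generated as in the Python sketch below. Whether IndelRealigner accepts a full path in the output column is also an assumption here, and `nway_map_lines` and the `.realigned.bam` suffix are illustrative choices.

```python
import posixpath


def nway_map_lines(input_bams, out_dir, suffix=".realigned.bam"):
    """Build 'input_name output_path' lines keyed by the input's base name."""
    lines = []
    for path in input_bams:
        name = posixpath.basename(path)  # key: file name only, no directory
        stem = name[:-4] if name.endswith(".bam") else name
        lines.append(f"{name} {posixpath.join(out_dir, stem + suffix)}")
    return lines


for line in nway_map_lines(
    ["/users/home/project/cocleaned/3528.dedup.bam"],
    "/users/home/project/cocleaned",
):
    print(line)
```

If the output column turns out to accept only file names as well, a fallback would be to run the tool from the desired output directory.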

IndelRealigner losing some reads


Hi,

Recently I experienced a slightly annoying problem with IndelRealigner losing some reads. It is usually just a few reads missing from the output, but when I compare the output and input and extract the reads that are missing after the IndelRealigner job, I cannot see what is wrong with them. An example of one such read is below:

M01823:187:000000000-AB050:1:1109:16397:19623 69 8 64405501 0 * = 64405501 0 TTTGCTTTCAAAAATACCTGTGCAGGTGGAGGTGTGCGTCTGCGTCTAACGGTGTGCGGTGCGAATTTCGACGATCGTTGCATTAACTTGCGAAACCCCTCATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAAAAAAAAAATAAAACAAACAAAACGAACTACTACAGACAACGACAAAAACCAAAAAACAACATATAAACAAATAAACGAGCAACACAACACAAATAAAAGAGCAAGCACTACAC CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG885+3355<,8,,=,,,,,,3,,=4::?,,7,,7,*2+14<********/*/2***1/*0+++2++2/+++++*2*12*2*****2*;*/++2+++1:68***20++++* RG:Z:140919_M01823_0188_000000000-AB050_AACCCCTC-TGTTCTCT_L001 AS:i:0 XS:i:0

Its pair has been kept, but this read was removed.

It is a bit of a nuisance, as in our workflow we check the number of reads in the files after various steps for sanity, so a varying number of reads introduces problems. I would be grateful if you could advise why some reads get omitted by IndelRealigner so I could modify our workflow accordingly. Or could it be a bug?

Thank you,
Dalia
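
The example read has FLAG 69, MAPQ 0 and CIGAR `*`. Decoding the FLAG with the bit definitions from the SAM specification shows what kind of record it is. This Python sketch only interprets the flag; it does not establish why IndelRealigner drops the read, and the short bit names are this sketch's own labels.

```python
# FLAG bits as defined in the SAM specification; names are informal labels.
SAM_FLAGS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse_strand"),
    (0x20, "mate_reverse_strand"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_flag(flag):
    """Return the labels of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS if flag & bit]


print(decode_flag(69))  # ['paired', 'unmapped', 'first_in_pair']
```

So the missing record is an unmapped first-in-pair read that carries its mate's coordinates, which may be a useful category to count separately when comparing read totals between steps.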

parallel indelrealigner


Hello, I want to run GATK IndelRealigner in parallel. I'm looking for a QScript, because the -nt and -nct flags will not work. I use the current Queue version, 3.3-0.

Joern

IndelRealigner produce huge bam


Hi. I am using IndelRealigner for local indel realignment. The bam used as input is 6.6GB, while the realigned bam is 22GB.

Did I miss anything there?

The pipeline I used is as below:

echo "Patient ${sample}: @create intervals for local realignment"
sudo java -Djava.io.tmpdir=${out_dir}/tmpdir \
    -Xmx${maxMem} -Xms${minMem} \
    -jar ${gatk} \
    -T RealignerTargetCreator \
    -I ${out_dir}/${input_next} \
    -o ${out_dir}/${input_next}.forRealigner.intervals \
    -R ${reference} \
    -L ${intervals} \
    --interval_padding 200 \
    -rf ${reads_filter} \
    -known ${kg_mills} \
    -known ${kg_indels} \
    -nt ${maxDataThread} \
    --allow_potentially_misencoded_quality_scores \
    2>${out_dir}/logs/${sample_prefix}_createIntervals.err


echo "Patient ${sample}: @local realignment"
sudo java -Djava.io.tmpdir=${out_dir}/tmpdir \
    -Xmx${maxMem} -Xms${minMem} \
    -jar $gatk \
    -T IndelRealigner \
    -I ${out_dir}/${input_next} \
    -o ${out_dir}/${sample_prefix}.dedup.realigned.bam \
    -R ${reference} \
    -targetIntervals ${out_dir}/${input_next}.forRealigner.intervals \
    -rf ${reads_filter} \
    -known ${kg_mills} \
    -known ${kg_indels} \
    -compress 0 \
    -LOD 0.4 \
    --consensusDeterminationModel USE_READS \
    --allow_potentially_misencoded_quality_scores \
    2> ${out_dir}/logs/${sample_prefix}_realignment.err

Thanks.

RealignerTargetCreator IndelRealigner : can I use a gzip output/input


Hi GATK team,
my jobs are currently running and I'm a little bit lazy to try this later: I saw that the .interval files produced by RealignerTargetCreator can be quite large. Can I use a ".interval.gz" extension on the command line of RealignerTargetCreator? Can I use this *.gz file with IndelRealigner?


Best approach for realignertargetcreator and indelrealigner


Hi,

I am trying to decide between two approaches for performing realignment around indels. I have ~600 samples that have been aligned to a very fragmented draft genome assembly.
What is best:
1. take each sample and create a list of targets, followed by realignment on each sample.
2. combine all samples into one large bam file and create a list of targets, followed by realignment on the same large bam file.

Also, would there be any advantages in terms of speed with either approach?

Cheers,

Steve

GATK realignment around local indel algorithm


Hi GATK team,

Though I have read the seminar slides for GATK indel realignment, I still have no idea how GATK does it. Can anyone suggest a reference or give a brief introduction?

Actually, what I care about most is whether indel realignment takes base quality into consideration. Thank you very much.

bless~

IndelRealigner and option -L


Just asking for confirmation: if I run IndelRealigner with option -L my.capture.bed, does IndelRealigner:

  • keep all the reads but only realign them in region of the bed ?
  • or only keep the reads in the given region (smaller BAM)?

Thanks

Should I realign around indels and recalibrate quality score before running Mutect?


Hello
Do you recommend realigning around indels and recalibrating quality scores before running MuTect?
Thanks!

Is it obligatory to go for Base Quality Score Recalibration (BQSR) after IndelRealignment step?


If I do not have information about known variants, is it fine to skip the BQSR step after the IndelRealigner step?

erroneous bam file from indel realigner


Hi Team,

I get an error with GATK in the variant calling steps, using the BAM file from the realignment step.
The error indicated something wrong with the .bai file.
So I tried to recreate it. But then this comes up, saying there is something wrong with the BAM (see below).
This BAM was created with IndelRealigner (no errors).

Thanks!
Alexander

$ picard 1 BuildBamIndex INPUT=B57.3.bam
[Sun Mar 15 22:37:46 CET 2015] picard.sam.BuildBamIndex INPUT=B57.3.bam    VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
[Sun Mar 15 22:37:46 CET 2015] Executing as kaktus42@soroban on Linux 2.6.32-431.29.2.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_31-b13; Picard version: 1.129(b508b2885562a4e932d3a3a60b8ea283b7ec78e2_1424706677) IntelDeflater
[Sun Mar 15 22:41:19 CET 2015] picard.sam.BuildBamIndex done. Elapsed time: 3,55 minutes.
Runtime.totalMemory()=855638016
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.FileTruncatedException: Premature end of file
    at htsjdk.samtools.util.BlockCompressedInputStream.readBlock(BlockCompressedInputStream.java:382)
    at htsjdk.samtools.util.BlockCompressedInputStream.available(BlockCompressedInputStream.java:127)
    at htsjdk.samtools.util.BlockCompressedInputStream.read(BlockCompressedInputStream.java:252)
    at java.io.DataInputStream.read(DataInputStream.java:149)
    at htsjdk.samtools.util.BinaryCodec.readBytesOrFewer(BinaryCodec.java:404)
    at htsjdk.samtools.util.BinaryCodec.readBytes(BinaryCodec.java:380)
    at htsjdk.samtools.util.BinaryCodec.readBytes(BinaryCodec.java:366)
    at htsjdk.samtools.BAMRecordCodec.decode(BAMRecordCodec.java:199)
    at htsjdk.samtools.BAMFileReader$BAMFileIterator.getNextRecord(BAMFileReader.java:660)
    at htsjdk.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:634)
    at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:628)
    at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:598)
    at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:527)
    at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:501)
    at htsjdk.samtools.BAMIndexer.createIndex(BAMIndexer.java:287)
    at htsjdk.samtools.BAMIndexer.createIndex(BAMIndexer.java:271)
    at picard.sam.BuildBamIndex.doWork(BuildBamIndex.java:138)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:187)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)
    at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:105)

Error stack trace IndexOutOfBoundsException


I have been trying to run IndelRealigner with the following commands ($tumorPfx etc are file names)

java -d64 -jar $gatkJar -R $hgReference -T IndelRealigner -rf BadCigar -I $tumorPfx.bam -known $G1000_Mills -known $G1000_Phase1_Indels -targetIntervals $tumorSample.intervals -o $tumorPfx.realn.bam

and have been getting the following output and error:

ERROR stack trace

java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.rangeCheck(ArrayList.java:635)

On this website, I found a similar error with somebody trying to run HaplotypeCaller, albeit with Index and Size = 3, where it was remarked that it may be due to a bug or a java version issue. Is the same thing going on here?

Thank you, Max


Working with input files with Q64 and Q33 (IndelRealigner)


Dear GATK team,

I have two input FASTQ files from exome-seq. One is encoded with Q64 and the other with Q33 quality scores. I want to combine the two input FASTQ files and run bwa+GATK.

How do I combine them for IndelRealigner? I suppose that IndelRealigner needs all reads from both Q64 and Q33. Can I run IndelRealigner separately on each and then join them? Will this cause problems?

I have searched many posts but can't find an answer. Please help me.

Thanks,
Woody
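
One way to avoid mixing encodings is to bring both files to the same offset before alignment. The Python sketch below converts a single Phred+64 quality string to Phred+33 by shifting each character down by 31; it assumes Illumina-style Phred+64 with non-negative scores (older Solexa-scaled files, which can go below zero, are rejected rather than converted). `phred64_to_phred33` is an illustrative helper, not a GATK or bwa function.

```python
def phred64_to_phred33(quality):
    """Re-encode a Phred+64 quality string as Phred+33."""
    converted = []
    for ch in quality:
        score = ord(ch) - 64  # decode with the +64 offset
        if score < 0:
            raise ValueError(f"{ch!r} is below the Phred+64 range")
        converted.append(chr(score + 33))  # re-encode with the +33 offset
    return "".join(converted)


print(phred64_to_phred33("@Bh"))  # scores 0, 2, 40 -> "!#I"
```

In a real FASTQ file only every fourth line (the quality line) would be passed through this function; the header, sequence and `+` lines stay unchanged.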

Is it worth considering the Realigned & Recalibrated bam file for further Pindel Analysis?


Hi,

I have gone through the realignment step and found that the realigner changes the CIGAR of alignments in the BAM file.
Most structural variant detection tools depend on the CIGAR field. So my question is: is it right to use the realigned and recalibrated BAM file, and does it have any advantage for SV detection over the raw sorted BAM file?

Realignment intervals distribution


I was just wondering what you guys thought of my realignment intervals length distribution.
This is 30Mb from a single diploid sample without prior indel position information. Approximately 60,000 events, i.e. one every five hundred bases, seems like a lot.
How indicative of true indels is the data from TargetCreator and IndelRealigner? Guess I'll have to check with the ug-vcf calls...
Across the genome, distribution of 'all' events is uniform.
Does multi-sample realignment improve the accuracy or efficiency of the realignment process ?
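
The density quoted in this post can be rechecked from the intervals file itself: 30 Mb divided by 60,000 intervals gives one interval per 500 bases. The Python sketch below computes interval lengths and that density, assuming one interval per line in the `contig:start-stop` or `contig:position` form seen in GATK log messages, with inclusive coordinates. The helper names and the three sample intervals are made up for illustration.

```python
def interval_length(interval):
    """Length in bases of 'contig:start-stop' (inclusive) or 'contig:pos'."""
    _, _, span = interval.strip().rpartition(":")
    if "-" in span:
        start, stop = span.split("-")
        return int(stop) - int(start) + 1
    return 1  # a single-position interval


def summarize(intervals, region_size):
    """Count intervals and report mean length and spacing over region_size bases."""
    lengths = [interval_length(x) for x in intervals if x.strip()]
    return {
        "count": len(lengths),
        "mean_length": sum(lengths) / len(lengths),
        "bases_per_interval": region_size / len(lengths),
    }


example = ["scaffold_0:100-109", "scaffold_0:500", "scaffold_1:40-44"]
print(summarize(example, region_size=1500))
print(30_000_000 / 60_000)  # spacing for the numbers given in the post
```

A histogram of the individual lengths would give the distribution the poster refers to.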

Problem with Indel Target Realigner, extra contig added?


Hello,
I ran into a problem after pre-processing: it seems that extra contigs were added to my BAM file compared to the reference I used, which makes the indel realigner step impossible to run. I have checked the headers of my file and the reference is the same, but my BAM file has hundreds of additional contigs. Not sure what happened.
The steps to get the BAM were:
- Aligned with bwa mem
- Transform to bam and sort (Samtools)
- Dedup (picard)
- Add read group (picard)
- Index bam (samtools)
- Run Realigner target creator
When I check the header of my BAM file it still shows the right contigs, but when running, GATK complains of differences (additional contigs) compared to my reference. I am currently re-testing the whole pipeline on a single sample, but do you have any pointers as to what could cause this? Maybe a problem with the BAM formatting?
I am running GATK 3.3.0-g37228af
Java 1.7
I have attached the output log from the command.
Thanks,

Julien

PS: I attended your workshop in Cambridge!

IndelRealigner Not attempting realignment in interval


Hi,

First of all, sorry if this is the wrong place in the forum, but I didn't find a place where it really seemed to fit.

On the last data I got, I ran into messages like:
"INFO 22:14:12,352 IndelRealigner - Not attempting realignment in interval chrY:27782626-27782640 because there are too many reads."

I've read about the option "--maxReadsForRealignment" and set it to 100000. I had a look at the BAM file and saw that at some points I've got more than 150000 reads. The average read depth is about 50. I'm quite new to the field of NGS and thus am not sure what that means.

Exactly how does the program handle this, since there are only small parts which are actually covered too highly? Are only those few bases not realigned? Or is everything in the step skipped? As an example:
chrY:27782626-27782640 is an interval of only 14bp. The whole Y chromosome is covered by only two steps shown in the log file. What is not realigned? One part of the Y chromosome or just those 14bp?

Besides these technical questions: is it safe to just increase "--maxReadsForRealignment" until all intervals are realigned, or is there normally a general failure (e.g. PCR) behind this, so that it seems safer to ignore those intervals?

If other users have encountered the same issue, it would be nice to hear how they have handled it.

Thanks a lot
