Details of piRNA data

Data collection:
The piRNA data in piRBase are collected from the literature and NCBI. We preferred processed piRNA sequences (txt or fasta files) to raw sequencing data (sra or fastq files).

The piRNA sequences collected in piRBase were obtained mainly by four kinds of experimental methods: small RNA sequencing, protein IP, chromatography, and protein CLIP.

We treat the piRNA sequences from each distinct library as one dataset in piRBase.

piRNA record:
For every distinct piRNA sequence, we assigned a name and recorded its detailed information, including aliases, accession, organism, sequence, length, dataset, PubMed reference and method.

The name is unique for each piRNA record. Identical piRNA sequences from the same organism are combined into one record.
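The record rule above can be sketched in Python as follows. This is a minimal illustration, not the piRBase implementation: the class `PiRNARecord`, the function `merge_records`, and the `piR-example-N` naming scheme are hypothetical, and the accession and PubMed fields are left out for brevity. It assumes that two sequences are identical when their uppercase strings match.

```python
from dataclasses import dataclass, field


@dataclass
class PiRNARecord:
    """One record per distinct (organism, sequence) pair."""
    name: str                 # unique per record (hypothetical naming scheme)
    organism: str
    sequence: str
    aliases: set = field(default_factory=set)
    datasets: set = field(default_factory=set)
    methods: set = field(default_factory=set)

    @property
    def length(self):
        # Length is derived from the sequence rather than stored separately.
        return len(self.sequence)


def merge_records(observations, prefix="piR-example"):
    """Combine identical sequences of the same organism into one record.

    observations: iterable of (organism, sequence, alias, dataset, method).
    """
    records = {}
    for organism, sequence, alias, dataset, method in observations:
        key = (organism, sequence.upper())
        record = records.get(key)
        if record is None:
            # First time this sequence is seen for this organism: new name.
            record = PiRNARecord(
                name=f"{prefix}-{len(records) + 1}",
                organism=organism,
                sequence=sequence.upper(),
            )
            records[key] = record
        record.aliases.add(alias)
        record.datasets.add(dataset)
        record.methods.add(method)
    return list(records.values())
```

The same sequence reported in two libraries of one organism ends up as a single record that lists both datasets, while the same sequence in a different organism gets its own record.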

Summary of piRNA data

Organism | Number of datasets | Number of piRNAs | Number of piRNAs obtained by small RNA sequencing | Number of piRNAs obtained by protein IP | Number of piRNAs obtained by protein CLIP | Number of piRNAs obtained by Chromatography
D. melanogaster | 28 | 20,666,597 | 20,647,498 | 53,898 | 0 | 0
C. elegans | 1 | 16,003 | 0 | 16,003 | 0 | 0
X. tropicalis | 14 | 6,653,367 | 3,140,335 | 4,054,791 | 0 | 0

DNA methylation data

DNA methylation data for human brain, human testis, mouse brain, mouse testis, mouse spermatocytes, mouse spermatids, chicken testis, zebrafish testis and Xenopus tropicalis testis are collected in piRBase. All of these tissues are reported to express piRNAs. The human brain and human testis data are ENCODE data downloaded from UCSC; all other data were downloaded from GEO.

There are two forms of DNA methylation data in piRBase: percentage DNA methylation levels at single-nucleotide resolution, obtained by bisulfite sequencing, and non-methylated islands, obtained by Bio-CAP.
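As a point of reference, a per-site methylation percentage from bisulfite sequencing is commonly computed as the share of reads that still show a cytosine at that position, because unmethylated cytosines are converted and read as thymine. The sketch below shows that calculation only; the function name `methylation_percent` is hypothetical, and the source datasets may have applied additional filtering (for example a minimum coverage) that is not described here.

```python
def methylation_percent(methylated_reads, unmethylated_reads):
    """Percentage of reads supporting methylation at one cytosine.

    methylated_reads:   reads showing C (protected from conversion)
    unmethylated_reads: reads showing T (converted by bisulfite)
    Returns None when the site has no coverage.
    """
    total = methylated_reads + unmethylated_reads
    if total == 0:
        return None
    return 100.0 * methylated_reads / total
```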

piRNA target data

piRNA target data are collected from the literature. So far, only mouse and fruit fly piRNA target information is included.

For every distinct piRNA-mRNA pair, we recorded the piRNA name, piRNA sequence, piRNA reads, target RefSeq accession, target gene symbol, target region (1 to 20 nt of the piRNA), and piRNA function mechanism.

piRNA location

We mapped all piRNAs collected in piRBase to the corresponding genome to determine the origin of every piRNA sequence. The mapping tool was Bowtie, and no more than one mismatch was allowed.
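The "no more than one mismatch" criterion can be illustrated with a naive scan over both strands. This is not how Bowtie works internally (Bowtie uses an index and is far faster) and it says nothing about the exact Bowtie options used for piRBase. The function names are hypothetical, and the sketch assumes ungapped alignment over the alphabet A, C, G, T.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_hits(genome, pirna, max_mismatches=1):
    """Return (start, strand, mismatches) for every ungapped hit.

    start is 0-based on the forward strand of `genome`.
    """
    hits = []
    n = len(pirna)
    queries = (("+", pirna), ("-", reverse_complement(pirna)))
    for strand, query in queries:
        for start in range(len(genome) - n + 1):
            mismatches = hamming(genome[start:start + n], query)
            if mismatches <= max_mismatches:
                hits.append((start, strand, mismatches))
    return hits
```

A sequence can produce several hits, for example in repeats, so the result is a list of locations rather than a single position.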

Gene/repeat derived piRNA

piRNAs that map to the region of a RefSeq gene or of a repeat element annotated by RepeatMasker in the genome are picked out. We refer to these piRNAs as gene/repeat derived piRNAs.
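Selecting gene/repeat derived piRNAs amounts to an interval overlap test between each mapped location and the annotation. The sketch below assumes that any overlap counts, ignores strand, and uses 0-based half-open coordinates; the text above does not state whether partial overlap or full containment was required, so treat these as assumptions. The function names and tuple layouts are hypothetical.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) overlap."""
    return a_start < b_end and b_start < a_end


def classify(hit, genes, repeats):
    """Report which annotated genes and repeats a mapped piRNA overlaps.

    hit:            (chrom, start, end)
    genes, repeats: lists of (chrom, start, end, name)
    """
    chrom, start, end = hit

    def names(features):
        return [
            name
            for f_chrom, f_start, f_end, name in features
            if f_chrom == chrom and overlaps(start, end, f_start, f_end)
        ]

    return {"genes": names(genes), "repeats": names(repeats)}
```

A piRNA would be called gene derived when the `genes` list is non-empty and repeat derived when the `repeats` list is non-empty; both can hold for the same location. The linear scan is adequate for illustration, while a genome-scale annotation would normally call for an interval index.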

H3K9me3 data

H3K9me3 ChIP-seq data from miwi2 Het and miwi2 KO mouse germ cells are collected in piRBase to analyze the function of piRNAs in establishing the H3K9me3 mark. These data were downloaded from GEO (GSE58332).