piRNA data in piRBase are collected from the literature and NCBI. We preferred processed piRNA sequences (txt or fasta files) over raw sequencing data (sra or fastq files).
The piRNA sequences collected in piRBase were obtained mainly through four kinds of experimental methods: small RNA sequencing, protein IP, chromatography, and protein CLIP.
We treat the piRNA sequences from each distinct library as one dataset in piRBase.
For every distinct piRNA sequence, we assign a name and record its details: aliases, accession, organism, sequence, length, dataset, PubMed, and method.
The name is unique for each piRNA record, and identical piRNA sequences from the same organism are combined into one record.
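The merging rule above can be illustrated with a minimal Python sketch. It is not the piRBase pipeline: the function name `merge_records`, the tuple layout of the input, and the `piR-example-N` naming scheme are hypothetical, and sequences are compared by exact string match (an assumption, since the source does not say whether case or U/T differences are normalized).

```python
def merge_records(entries):
    """Combine identical sequences from the same organism into one record.

    `entries` is an iterable of (organism, sequence, dataset) tuples.
    This input layout is an assumption made for illustration.
    """
    merged = {}
    for organism, sequence, dataset in entries:
        key = (organism, sequence)  # identical sequence + same organism
        record = merged.get(key)
        if record is None:
            record = {
                "organism": organism,
                "sequence": sequence,
                "length": len(sequence),
                "datasets": [],
            }
            merged[key] = record
        # Remember every dataset in which the sequence was seen.
        if dataset not in record["datasets"]:
            record["datasets"].append(dataset)

    records = list(merged.values())
    # Hypothetical naming scheme: one unique name per merged record.
    for index, record in enumerate(records, start=1):
        record["name"] = f"piR-example-{index}"
    return records
```

The same sequence found in two libraries of one organism yields a single record listing both datasets, while the same sequence in a different organism stays a separate record.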
|Organism||Number of datasets||Number of piRNAs||Number of piRNAs obtained by small RNA sequencing||Number of piRNAs obtained by protein IP||Number of piRNAs obtained by protein CLIP||Number of piRNAs obtained by Chromatography|
DNA methylation data for human brain, human testis, mouse brain, mouse testis, mouse spermatocytes, mouse spermatids, chicken testis, zebrafish testis and Xenopus tropicalis testis are collected in piRBase. All of these tissues and cell types are reported to express piRNAs. The human brain and human testis data, which are ENCODE data, were downloaded from UCSC; all others were downloaded from GEO.
piRBase holds two forms of DNA methylation data: DNA methylation percentages at single-nucleotide resolution and non-methylated islands, obtained by bisulfite sequencing and Bio-CAP, respectively.
piRNA target data are collected from the literature. So far, only mouse and fruit fly piRNA target information is included.
For every distinct piRNA-mRNA pair, we record the piRNA name, piRNA sequence, piRNA reads, piRNA target RefSeq accession, piRNA target gene symbol, piRNA target region (1 to 20 nt of piRNA), and piRNA function mechanism.
We mapped all piRNAs collected in piRBase to the corresponding genome to determine the origin of every piRNA sequence. We used Bowtie as the mapping tool, allowing no more than one mismatch.
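As a rough illustration of this mapping step, the sketch below builds a Bowtie (version 1) command line in Python. The exact options used for piRBase are not stated here, so this is an assumption: it uses `-v 1` (end-to-end alignments with at most one mismatch), `-f` (FASTA input) and `-S` (SAM output). The function name `bowtie_command` and all file names are hypothetical.

```python
import shlex


def bowtie_command(index_prefix, reads_fasta, output_sam, max_mismatches=1):
    """Build a Bowtie 1 command that allows at most `max_mismatches` mismatches.

    Assumption: Bowtie's -v mode is used; the source only states that
    no more than one mismatch is allowed.
    """
    if not 0 <= max_mismatches <= 3:
        # Bowtie's -v option accepts values from 0 to 3.
        raise ValueError("max_mismatches must be between 0 and 3")
    return [
        "bowtie",
        "-f",                       # reads are in FASTA format
        "-v", str(max_mismatches),  # maximum mismatches per alignment
        "-S",                       # write alignments in SAM format
        index_prefix,               # prefix of a prebuilt Bowtie index
        reads_fasta,
        output_sam,
    ]


if __name__ == "__main__":
    # Hypothetical file names, shown only to display the resulting command.
    print(shlex.join(bowtie_command("genome_index", "pirna.fa", "pirna.sam")))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where Bowtie and a genome index are available.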
piRNAs that map to a RefSeq gene region or to a repeat element annotated by RepeatMasker in the genome are picked out. We refer to these as gene/repeat-derived piRNAs.
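The selection of gene/repeat-derived piRNAs can be sketched as an interval-overlap check. This is an illustration under stated assumptions, not the method used to build piRBase: coordinates are taken to be 0-based and half-open, strand is ignored, any overlap of at least one base counts, and the names `overlaps` and `derived_pirnas` are hypothetical.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """Return True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def derived_pirnas(hits, annotations):
    """Label mapped piRNAs by the annotations they overlap.

    hits:        iterable of (pirna_name, chrom, start, end)
    annotations: iterable of (chrom, start, end, label), where label is
                 for example "gene" or "repeat"
    Returns a dict mapping piRNA name to the set of overlapping labels.
    """
    # Group annotations by chromosome so each hit is compared only
    # against features on the same chromosome.
    by_chrom = {}
    for chrom, start, end, label in annotations:
        by_chrom.setdefault(chrom, []).append((start, end, label))

    result = {}
    for name, chrom, start, end in hits:
        for a_start, a_end, label in by_chrom.get(chrom, []):
            if overlaps(start, end, a_start, a_end):
                result.setdefault(name, set()).add(label)
    return result
```

The linear scan keeps the example short; for genome-scale annotation sets a sorted index or an interval tree would be the more practical choice.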
H3K9me3 ChIP-seq data from miwi2 Het and miwi2 KO mouse germ cells are collected in piRBase to analyze piRNA function in H3K9me3 mark establishment. These data were downloaded from GEO (GSE58332).