Published genomes frequently contain erroneous gene models that represent issues associated with identification of open reading frames, start sites, splice sites, and related structural features. The source of these inconsistencies is often traced back to integration across text file formats designed to describe long read alignments and predicted gene structures. In addition, the majority of gene prediction frameworks do not provide robust downstream filtering to remove problematic gene annotations, nor do they represent these annotations in a format consistent with current file standards. These frameworks also lack consideration for functional attributes, such as the presence or absence of protein domains that can be used for gene model validation. To provide oversight to the increasing number of published genome annotations, we present a software package, the Gene Filtering, Analysis, and Conversion (gFACs), to filter, analyze, and convert predicted gene models and alignments. The software operates across a wide range of alignment, analysis, and gene prediction files with a flexible framework for defining gene models with reliable structural and functional attributes. gFACs supports common downstream applications, including genome browsers, and generates extensive details on the filtering process, including distributions that can be visualized to further assess the proposed gene space. gFACs is freely available and implemented in Perl with support from BioPerl libraries at https://gitlab.com/PlantGenomicsLab/gFACs.
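In recent years, with the development and spread of high-throughput sequencing technologies, genome sizes and the complexity of assembly and annotation have grown steadily. Even so, of the nearly 7,800 eukaryotic genomes in GenBank, only a small number are assembled and annotated at the chromosome level. Among these eukaryotic genomes, more than 85% contain erroneous gene annotations. Such errors often arise from integrating text files that describe gene annotations in different formats. In addition, most gene prediction pipelines lack a step for filtering redundant genes and do not output annotation results in mainstream standard formats. These pipelines also rarely consider functional attributes, such as protein domain information that can be used to verify the accuracy of gene models. To provide effective oversight of the growing body of genome annotations, we developed gFACs, a software package that combines filtering, analysis, and conversion of gene annotation files and alignments. Using information from the reference genome, the software can filter out erroneous gene models, generate statistics, and produce output files for downstream analysis and visualization. Notably, the software does not replace gene prediction based on ab initio or similarity-based annotation; rather, it provides a tool for comparing conflicting annotations, thereby improving annotation accuracy. gFACs also offers commonly used additional features, such as genome browsing, and generates detailed information about the filtering process. gFACs is an open-source package implemented in Perl with support from BioPerl libraries and is freely available for download at https://gitlab.com/PlantGenomicsLab/gFACs.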
PDF full text download:
http://gpb.big.ac.cn/articles/download/712
gFACs: Gene Filtering, Analysis, and Conversion to Unify Genome Annotations Across Alignment and Gen