Current efforts to define and implement health data standards are driven by issues related to the quality, cost and continuity of care, patient safety concerns, and desires to speed clinical research findings to the bedside. The President’s goal for national adoption of electronic medical records in the next decade, coupled with the current emphasis on translational research, underscores the urgent need for data standards in clinical research. This paper reviews the motivations and requirements for standardized clinical research data, and the current state of standards development and adoption, including gaps and overlaps, in relevant areas. Unresolved issues and informatics challenges related to the adoption of clinical research data and terminology standards are mentioned, as are the collaborations and activities the authors perceive as most likely to address them.
Flow cytometry (FCM) is an analytical tool widely used in cancer and HIV/AIDS research and treatment, in stem cell manipulation, and in detecting microorganisms in environmental samples. Current data standards do not capture the full scope of FCM experiments, and there is a demand for software tools that can assist in the exploration and analysis of large FCM datasets. We are implementing a standardized approach to capturing, analyzing, and disseminating FCM data that will facilitate both more complex analyses and analysis of datasets that could not previously be efficiently studied. Initial work has focused on developing a community-based guideline for recording and reporting the details of FCM experiments. Open source software tools that implement this standard are being created, with an emphasis on facilitating reproducible and extensible data analyses. In addition, tools for electronic collaboration will support integrated access to and comprehension of experiments, empowering users to collaborate on FCM analyses. This coordinated, joint development of bioinformatics standards and software tools for FCM data analysis has the potential to greatly facilitate both basic and clinical research, with impact across a notably diverse range of medical and environmental research areas.
Widespread adoption of electronic health records (EHRs) and expansion of patient registries present opportunities to improve patient care and population health and advance translational research. However, optimal integration of patient registries with EHR functions and aggregation of regional registries to support national or global analyses will require the use of standards. Currently, there are no standards for patient registries and no content standards for health care data collection or clinical research, including diabetes research. Data standards can facilitate new registry development by supporting reuse of well-defined data elements and data collection systems, and they can enable data aggregation for future research and discovery. This article introduces standardization topics relevant to diabetes patient registries, addresses issues related to the quality and use of registries and their integration with primary EHR data collection systems, and proposes strategies for implementation of data standards in diabetes research and management.
Relatively little attention has been focused on standardization of data exchange in clinical research studies and patient care activities. Both are usually managed locally using separate and generally incompatible data systems at individual hospitals or clinics. In the past decade, there have been nascent efforts to create data standards for clinical research and patient care data, and to some extent these are helpful in providing a degree of uniformity. Nevertheless, these data standards generally have not been converted into accepted computer-based language structures that could permit reliable data exchange across computer networks. The National Cardiovascular Research Infrastructure (NCRI) project was initiated with a major objective of creating a model framework for standard data exchange in all clinical research, clinical registry, and patient care environments, including all electronic health records. The goal is complete syntactic and semantic interoperability. A Data Standards Workgroup was established to create or identify and then harmonize clinical definitions for a base set of standardized cardiovascular data elements that could be used in this network infrastructure. Recognizing the need for continuity with prior efforts...
To facilitate sharing of Omics data, many groups of scientists have been working to establish the relevant data standards. The main components of data sharing standards are experiment description standards, data exchange standards, terminology standards, and experiment execution standards. Here we provide a survey of existing and emerging standards that are intended to assist the free and open exchange of large-format data.
The relationship between data quality and data standards has not been clearly articulated. While some directly state that data standards increase data quality, others claim the opposite. Depending on the type of data standard and the aspects of data quality considered, both arguments may in fact be correct. We deconstruct a typology of data standards and apply a dimensional definition of data quality to clearly articulate the relationship between the two, providing a framework for data quality planning.
Approved for public release; distribution is unlimited. Documents include Paper & Presentation. Standard information exchange data models (IEDMs), such as the Joint Consultation, Command and Control IEDM (JC3IEDM) managed by the Multilateral Interoperability Programme (MIP) and the National Information Exchange Model (NIEM) managed by the US Department of Homeland Security, often are expressed as XML Schema Definition (XSD) documents. This choice of model representation comes with the benefits of a widely adopted format and a well-supported XML toolset and libraries. Although XML, as a technology, has been an enabler in achieving model alignment and interoperability among C4I and M&S systems, several key issues have not been fully addressed. For instance, XML does not provide a standard means for representing semantics. This means that XML expressions generally cannot be interpreted by applications in a meaningful manner unless specific code has been added for this purpose. In addition, systems utilizing multiple IEDMs are faced with difficult mapping and model translation tasks that cannot easily be automated. Furthermore, the use of multiple IEDMs creates significant maintainability and scalability challenges associated with the use of the relevant standards and specifications. As the user-base of a data standard grows...
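The point about XML and semantics can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the two XML fragments, their element names (`Unit`, `Designation`, `MilitaryOrganization`, `Name`) and the mapping table are invented for illustration and are not taken from JC3IEDM or NIEM. The sketch only shows that two well-formed documents can describe the same thing, and that an application can relate them only through mapping code written for that purpose.

```python
import xml.etree.ElementTree as ET

# Two hypothetical fragments describing the same real-world entity.
# Element names are invented for illustration, not drawn from any real IEDM.
DOC_A = "<Unit><Designation>1st Battalion</Designation></Unit>"
DOC_B = "<MilitaryOrganization><Name>1st Battalion</Name></MilitaryOrganization>"

# Hand-written mapping: (root tag, child tag) -> shared concept name.
# XML itself carries no such information; the application has to supply it.
CONCEPT_MAP = {
    ("Unit", "Designation"): "organization_name",
    ("MilitaryOrganization", "Name"): "organization_name",
}


def extract_concepts(xml_text):
    """Return {concept: value} for every child element the mapping knows."""
    root = ET.fromstring(xml_text)
    found = {}
    for child in root:
        concept = CONCEPT_MAP.get((root.tag, child.tag))
        if concept is not None:
            found[concept] = (child.text or "").strip()
    return found


if __name__ == "__main__":
    print(extract_concepts(DOC_A))
    print(extract_concepts(DOC_B))
```

Each additional model adds entries to a table like `CONCEPT_MAP`, which is one way to picture the mapping and maintainability burden described above.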
The practice of Systems Biology relies on interfaces: interfaces
between the entities we study, as the paradigm has moved from a physical
object-centric view toward a relationship-centric one; interfaces
between tools, since from the retrieval of the primary data to the fine
analysis of a model's behaviour, one uses many tools, more or less
well connected; and interfaces between individuals, since building any
non-trivial mechanistic model requires merging existing work and
gathering external expertise.
If we want these interfaces to be generic enough to allow anybody
to leverage existing toolkits, a fundamental requirement is the
existence of community-developed, well-supported standards, as well as
open resources in which to find the "lego" blocks. Over the last
half-decade, several efforts have been launched in that direction,
concerning encoding formats, ontologies and databases. Some of
them are now well established in the field and play a significant role
in improving its coherence, as well as in increasing the size and quality
of quantitative models.
Systems biology has arisen through the convergence of theoretical, computational, and mathematical modeling of systems and the need to understand the wealth of information being rapidly generated in biology. Systems biology by its nature requires collaborations between scientists with expertise in biology, chemistry, computer sciences, engineering, mathematics, and physics. Successful integration of these disciplines depends on bringing to bear both social and technological tools: namely, consortia that help forge collaborations and common understanding, software tools that permit analysis of vast and complex data, and agreed-upon standards that enable researchers to communicate and reuse each other's results in practical and unambiguous ways. In this presentation, I will discuss several international projects (SBML, SBGN, and BioModels.net) aimed at addressing the last issue.
An important prerequisite for effective sharing of computational models is reaching agreement on how to communicate them, both between software and between humans. The Systems Biology Markup Language (SBML) project is an effort to create a machine-readable format for representing computational models at the biochemical reaction level. By supporting SBML as an input and output format...
*See also the "related presentation":http://precedings.nature.com/documents/3145/version/1*
We present an infrastructure that leverages synergistic reporting standards and ontologies^1,2,3,4,5^ to create a common structured representation and storage mechanism for experimental metadata from biological and biomedical investigations ranging from simple single-assay studies to complex, methodologically diverse multi-assay studies.
The infrastructure’s components include a data capture and editing tool (_ISAcreator_), a validator (_ISAvalidator_), a database (_BioInvestigation Index_), a converter (_ISAconverter_), and a BioConductor analysis package (_R-ISApackage_). The components are designed for local installation, and can work independently or as a unified system.
View the "public instance":http://www.ebi.ac.uk/bioinvindex running at EBI and/or "download the components":http://isatab.sf.net for your local use.
1. Taylor CF, Field D, Sansone SA,… Rocca-Serra P et al. (2008) The MIBBI Project. _Nature Biotechnology_ Aug;26(8):889-896. "http://www.mibbi.org":http://www.mibbi.org
2. Smith B...
*See also "related poster":http://precedings.nature.com/documents/3144/version/1*
Today’s researchers can perform biological and biomedical studies where the same material is run through a wide range of assays, comprising several technologies such as genomics, transcriptomics, proteomics and metabol/nomics (hereafter referred to as ‘omics’). To enable others to correctly interpret the complex data sets that result, and the conclusions drawn, it is necessary to provide contextualizing experimental metadata at an appropriate level of granularity.
Standards initiatives normally cater to particular domains. However, several synergistic standards activities foster cross-domain harmonization of the three kinds of reporting standard (minimum information checklists, ontologies and file formats). Some 29 groups participate in the "MIBBI":http://www.mibbi.org project, which offers a one-stop shop for those exploring the range of extant ‘minimum information’ checklists, and which fosters integrative development^1^. More than 60 groups participate in the "OBO Foundry":http://www.obofoundry.org ^2^, which coordinates the orthogonal development of ontologies such as "OBI":http://obi-ontology.org for describing experimental (meta)data. And several groups participate in the development of "ISA-Tab":http://isatab.sf.net...
The goal of the INCF Digital Atlasing Program is to provide the vision and direction necessary to make the rapidly growing collection of multidimensional data of the rodent brain (images, gene expression, etc.) widely accessible and usable to the international research community. This Digital Brain Atlasing Standards Task Force was formed in May 2008 to investigate the state of rodent brain digital atlasing, and formulate standards, guidelines, and policy recommendations.
Our first objective has been the preparation of a detailed document that includes the vision and specific description of an infrastructure, systems and methods capable of serving the scientific goals of the community, as well as practical issues for achieving the goals. This report builds on the 1st INCF Workshop on Mouse and Rat Brain Digital Atlasing Systems (Boline et al., 2007, _Nature Precedings_, doi:10.1038/npre.2007.1046.1) and includes a more detailed analysis of both the current state and desired state of digital atlasing along with specific recommendations for achieving these goals.
Several data preservation, management and sharing policies have emerged in response to increased funding for high-throughput approaches in the genomics and functional genomics domains of bioscience, and several funding agencies now also require the inclusion of data-sharing plans in grant applications. But despite some commonalities, the policies are heterogeneous by nature, given the different types of communities served and the data types they generate. In parallel, an escalating number of community-driven standardization efforts (involving biocurators, database developers, experimentalists, vendors, etc.) are developing minimal requirements checklists, ontologies, and file formats to support the harmonization of the reporting process, so that different experiments and data can be easily shared, compared, and integrated. The proliferation of these standardization efforts is a positive sign of community engagement, but it also brings new sociological and technological challenges: creating interoperability and avoiding the unnecessary overlaps and duplication of effort that hamper their wider uptake. Leaving aside the ethical, commercialization, credit and other known issues arising from public data release, basic communication channels still need to be formally created and maintained...
Many bioinformatics databases published in journals are here this year and gone the next. There is generally (i) no requirement, mandatory or otherwise, by reviewers, editors or publishers for full disclosure of how databases are built and how they are maintained; (ii) no standardized requirement for data in public access databases to be kept as backup for release and access when a project ends, when funds expire, or when the website is terminated; and (iii) in the case of proprietary resources, no requirement for data to be kept in escrow for release under stated conditions, such as when a published database disappears due to company closure. Consequently, many of the biological databases published in the past twenty years are easily lost, even though the publications describing or referencing these databases and web services remain. Given the volume of publications today, even where it is practically possible for reviewers to re-create databases as described in a manuscript, there is usually insufficient disclosure and raw data for this to be done, even if sufficient time and resources are available. Consequently, verification and validation are assumed, and the claims of the paper are accepted as true and correct at face value. A solution to this growing problem is to experiment with some kind of minimum standards of reporting, such as the Minimum Information About a Bioinformatics Investigation (MIABi), and standardized requirements of data deposition and escrow for enabling persistence and reproducibility. With easy availability of cloud computing...
Research databases, clinical systems, and lab systems all have different standards, formats and drivers for data capture, operation, analysis and integration. Interdisciplinary nutritional researchers, however, depend on all of these areas and technologies. While building and integrating these systems can be difficult, agile practices, including short iterations, testing and continuous integration methods, and close engagement with all stakeholders, can help create useful systems for translational research. Interoperability also requires good data standards, including the use of structured data dictionaries and existing data standards such as HL7, UMLS, LOINC, ICD, and OBO Foundry ontologies.
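As a rough illustration of what a structured data dictionary might look like in practice, the following Python sketch binds a local variable to a code system and checks incoming values against it. The variable name, the unit, the plausible range, and especially the code `PLACEHOLDER-0001` are assumptions made for this example; they are not real LOINC content and are not taken from any system described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DictionaryEntry:
    """One variable in a structured data dictionary (illustrative fields)."""
    name: str
    unit: str
    minimum: float
    maximum: float
    code_system: str  # which standard the variable is bound to, e.g. "LOINC"
    code: str         # identifier within that standard


# Hypothetical dictionary. The range is illustrative and the code is a
# placeholder, NOT a real LOINC code.
DATA_DICTIONARY = {
    "fasting_glucose": DictionaryEntry(
        name="fasting_glucose",
        unit="mg/dL",
        minimum=20.0,
        maximum=600.0,
        code_system="LOINC",
        code="PLACEHOLDER-0001",
    ),
}


def validate(variable, value, unit):
    """Return a list of problems; an empty list means the value conforms."""
    entry = DATA_DICTIONARY.get(variable)
    if entry is None:
        return [f"unknown variable: {variable}"]
    problems = []
    if unit != entry.unit:
        problems.append(f"expected unit {entry.unit}, got {unit}")
    if not (entry.minimum <= value <= entry.maximum):
        problems.append(
            f"value {value} outside [{entry.minimum}, {entry.maximum}]"
        )
    return problems


if __name__ == "__main__":
    print(validate("fasting_glucose", 95.0, "mg/dL"))
    print(validate("fasting_glucose", 1000.0, "mmol/L"))
```

Keeping the dictionary as data rather than scattering checks through application code means the same definitions could, in principle, be shared between research, clinical and lab systems.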
The Web has many forums for sharing personal data, but not for scientific data and not in a way that allows the data to be accessed by "machines as users." A Web of data could add tremendous value by integrating disparate disciplines or enabling data-driven queries. Doing this is very complex and requires more robust standards than currently exist. The intended user for most data is not a person; it is a software application that can manipulate the data into something useful for humans. Such software could be "search engines, analytic software, visualization tools, database back ends, and more." This need creates a much different requirement for standards than those that were developed for displaying web data to people. Data software needs a much greater understanding of context, and that context has to be supplied alongside the data, either through direct integration with the data or by linking to a description of it in a persistent and accessible location. Data interoperability must be addressed at the beginning of developing systems because it is significantly harder and costlier to make these connections after both systems have separately implemented non-standardized data collections. Data interoperability must address three levels: legal (intellectual property rights)...
BBSRC recognizes the importance of contributing to the growing international efforts in data sharing. BBSRC is committed to getting the best value for the funds we invest and believes that making research data more readily available will reinforce open scientific inquiry and stimulate new investigations and analyses. BBSRC supports the view that data sharing should be led by the scientific community and driven by scientific need. It should also be cost effective and the data shared should be of the highest quality. Members of the community are expected and encouraged to practice and promote data sharing, determine standards and best practice, and create a scientific culture in which data sharing is embedded.
Research communities, funding agencies, and journals participate in the development of reporting standards for the bioscience domain to ensure that shared experiments are reported with enough information to be comprehensible and (in principle) reproduced, compared or integrated (Field, Sansone et al., Science, 2009). Similar trends exist in both the regulatory arena and commercial science.
Proliferation of standards is a positive sign of stakeholders’ engagement, but how much do we know about these standards? Which ones are mature and stable enough to use or recommend? Which tools and databases implement which standard? Etc...
The BioSharing catalogue (www.biosharing.org) aims to
1. centralize community-developed bioscience standards, linking to policies, other portals, open access resources and lists of tools and databases implementing the standards;
1.1 The International Society for Biocuration (ISB) and the BioSharing initiative have produced BioDBcore, a community-defined, uniform system for describing these bio-resources, in particular, indicating in a consistent manner which community-defined standards (minimal information checklists, terminologies and exchange formats) they implement (www.biodbcore.org).
2. develop and maintain a set of criteria for assessing the usability and popularity of the standards...
BioSharing works at the global level to build stable linkages between journals and funders that implement data sharing policies, on the one hand, and well-constituted standardization efforts in the biosciences domain, on the other, to expedite the communication and the production of an integrated standards-based framework for the capture and sharing of high-throughput genomics and functional genomics bioscience data.
Presented at the CRIS2012 Conference in Prague. 6 pages. Full conference programme available at: http://www.cris2012.org/findByFilter.do?categoryId=1158
Understanding how research and development leads to the creation of knowledge, and then tracking the impact of that knowledge, requires a comprehensive model of the research ecosystem that incorporates inputs, outputs, activities, and external factors, together with the data to support longitudinal and network analysis. To date, most research has focused on those activities and outputs that are readily accessible, including publication output and follow-on citations, and patents and patent citations. While these outputs are robust and can be normalized by field of research, additional data are needed. Moreover, efforts to assemble systematic information on researchers, including their biographic information, institutions, support, and networks, are in a fledgling stage. We discuss requirements around data linkages, data standards, and data privacy in creating a distributed data infrastructure to support quantitative analysis of the research workforce.