The DataSHaPER (DataSchema and Harmonization Platform for Epidemiological Research) is both a scientific approach and a suite of practical tools. Its primary aims are to facilitate the prospective harmonization of emerging biobanks, provide a template for retrospective synthesis, and support the development of questionnaires and information-collection devices, even when pooling of data with other biobanks is not foreseen.
Its basic structure reflects a four-step approach to harmonization:
In the context of the DataSHaPER, the term "variables" refers to the primary units of interest in a statistical analysis (e.g. current smoker [yes/no], or body mass index as a quantitative trait). An important distinction is drawn between such variables and the specific "assessment items" that are collected by a particular study (e.g. questions in a questionnaire or physical measures collected by a study). Crucially, it is variables that are harmonized between studies and it is this that provides for flexible yet robust harmonization, because a given variable may potentially be built using different assessment items in different studies.
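The distinction between variables and assessment items can be illustrated with a minimal sketch. Everything in it is hypothetical and is not part of the DataSHaPER tools: the study names (`study_a`, `study_b`), the item names (`smokes_now`, `cigarettes_per_day`, `bmi_measured`, `weight_kg`, `height_m`) and the derivation rules are assumptions chosen only to show how one harmonized variable might be built from different assessment items in different studies.

```python
from typing import Any, Callable, Dict, Optional

Record = Dict[str, Any]  # one participant's raw assessment items

# Hypothetical study A asks a direct yes/no question.
def smoker_from_question(record: Record) -> Optional[bool]:
    answer = record.get("smokes_now")
    return None if answer is None else answer == "yes"

# Hypothetical study B records the number of cigarettes smoked per day.
def smoker_from_count(record: Record) -> Optional[bool]:
    per_day = record.get("cigarettes_per_day")
    return None if per_day is None else per_day > 0

# Hypothetical study A stores a measured body mass index directly.
def bmi_from_measure(record: Record) -> Optional[float]:
    return record.get("bmi_measured")

# Hypothetical study B stores weight and height; BMI = kg / m^2.
def bmi_from_weight_height(record: Record) -> Optional[float]:
    weight, height = record.get("weight_kg"), record.get("height_m")
    if weight is None or not height:
        return None
    return weight / height ** 2

# Variable name -> study name -> rule that builds the variable from that
# study's own assessment items (all names are illustrative assumptions).
DERIVATIONS: Dict[str, Dict[str, Callable[[Record], Any]]] = {
    "current_smoker": {"study_a": smoker_from_question,
                       "study_b": smoker_from_count},
    "body_mass_index": {"study_a": bmi_from_measure,
                        "study_b": bmi_from_weight_height},
}

def harmonize(study: str, record: Record) -> Dict[str, Any]:
    """Return the harmonized variables for one record of one study."""
    return {variable: rules[study](record)
            for variable, rules in DERIVATIONS.items()}
```

Under these assumptions, `harmonize("study_a", {"smokes_now": "yes", "bmi_measured": 24.0})` and `harmonize("study_b", {"cigarettes_per_day": 10, "weight_kg": 72.0, "height_m": 2.0})` both return records with the same two variable names, even though the underlying assessment items differ. A missing item yields `None` rather than a guessed value.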
Structurally, the DataSHaPER is a dynamically evolving entity with two primary components: the DataSchema Platform and the Harmonization Platform.
A DataSchema identifies and describes a thematic set of core variables that are of particular value in a specified scientific setting.
The core variables in each DataSchema are grouped under a four-level nested hierarchy:
The DataSchema Platform contains a growing number of such DataSchemas, each with its own scientific purpose. The platform also contains associated support material including variable definitions, links to relevant standard classifications, and access to reference questionnaires and operating procedures that have been selected or developed to reliably generate the variables in each DataSchema.
Each DataSchema in the DataSchema Platform can be partnered with corresponding Harmonization Units that provide a foundation for harmonizing studies relative to that particular schema. Ultimately, the Harmonization Platform will contain a growing number of Harmonization Units.
Access to the Harmonization Platform is limited to collaborative contexts. Please contact us to see how we can work together.