- Step 1: Data Gathering
- Step 2: Data Harmonization
- Step 3: Data Access
- Step 4: Data Analysis
- The new data collection will follow validated data collection guidelines and principles and will take a longitudinal approach using mobile app questionnaires. Weekly mobile app responses will be required from a minimum of 214 people per administrative district for 6 months (24 weeks), i.e. 6,420 persons across Rwanda's 30 districts and 154,080 survey entries in total over the 24 weeks.
- At least 10 persons per district will be contacted twice by the data collectors (at the beginning and at the end of the study) for validation, by phone call or, if the COVID-19 situation in Rwanda allows, by face-to-face questionnaire.
- A sub-group of patients who have recovered from COVID-19 will be followed specifically. If a followed subject has a medical record in a participating hospital, the two datasets will be linked, with the possibility of future requests for the linked data.
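The sample sizes above follow from simple multiplication. A small sketch that reproduces them, assuming Rwanda's 30 districts (which is what 6,420 / 214 implies), one survey entry per respondent per week, and exactly two validation contacts per validation subject:

```python
def sample_size_plan(districts=30, per_district=214, weeks=24,
                     validation_per_district=10, validation_rounds=2):
    """Reproduce the sample-size arithmetic of the data-gathering plan."""
    participants = districts * per_district            # weekly mobile app respondents
    survey_entries = participants * weeks              # one entry per respondent per week
    validation_persons = districts * validation_per_district
    validation_contacts = validation_persons * validation_rounds  # start and end of study
    return {
        "participants": participants,
        "survey_entries": survey_entries,
        "validation_persons": validation_persons,
        "validation_contacts": validation_contacts,
    }


plan = sample_size_plan()
print(plan)
```

Changing `per_district` or `weeks` shows directly how the total number of survey entries (and hence the airtime and bundle budget) scales.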
The respondents will be randomly sampled from the national population registry of each district, with the support of the National Institute of Statistics of Rwanda (NISR). The sample will include males and females in proportion to their share of the district's inhabitants. There is a risk that too few respondents take part and/or that they do not report regularly; therefore, a team of data collectors will call subjects once a week to complete missing data and improve the response rate. Each participant will receive mobile connection credit and an internet bundle each week to enable data collection. To mitigate the expected gender digital divide, as well as the case of selected persons who no longer have a mobile phone, the consortium will establish mitigation measures including, but not limited to, leveraging community health workers (CHWs). Each village in Rwanda has a CHW who participates in various Ministry of Health (MoH) programs, and all CHWs have received mobile phones from the MoH. If a selected respondent has no mobile phone, we will liaise with the nearest CHW to reach them. The budget includes the cost of connecting the involved CHWs. Other measures will be specified and tested in the sampling plans and practical data collection plans, which will be developed at the beginning of the project. The questionnaires (translated into three languages in the mobile application: Kinyarwanda, English and French) include 10 modules, at least 8 of which must be covered by the project:
- Demographics;
- Face mask use;
- Hand hygiene;
- Respect of social distancing measures and risk minimization measures;
- Recent risk situations exposures and COVID-19 measures.
On the outcome side, the collected data will include: 6. COVID-19-like signs and symptoms; 7. mental health indicators based on the Generalized Anxiety Disorder (GAD) scale; 8. socio-economic impact (based on loss of income, or income categories); 9. COVID-19 test results; and 10. geofencing data, if available (no personal data will be collected): only anonymous phone tracking approved by the ethics committee and enabled voluntarily on the individual's device. The sampling and data collection plans will help to overcome biases, in particular by integrating a gender dimension to address the gender digital divide, which is known worldwide and also in Rwanda.
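The sampling rule described above (random selection from each district's registry, with males and females in proportion to the population) amounts to stratified random sampling with proportional allocation. A minimal sketch, assuming the registry extract is available as records with hypothetical `id`, `district` and `sex` fields; the largest-remainder rounding, which keeps each district's total at exactly the target size, is our choice rather than something the plan specifies:

```python
import random
from collections import defaultdict


def proportional_allocation(stratum_sizes, n):
    """Split n across strata in proportion to their sizes (largest-remainder rounding)."""
    total = sum(stratum_sizes.values())
    if n > total:
        raise ValueError("sample size exceeds population size")
    quotas = {k: n * size / total for k, size in stratum_sizes.items()}
    alloc = {k: int(q) for k, q in quotas.items()}
    shortfall = n - sum(alloc.values())
    # Give the remaining places to the strata with the largest fractional parts.
    for k in sorted(quotas, key=lambda k: quotas[k] - alloc[k], reverse=True)[:shortfall]:
        alloc[k] += 1
    return alloc


def sample_district(registry, district, n, seed=0):
    """Draw n person ids from one district, stratified by sex with proportional allocation."""
    rng = random.Random(seed)  # seeded so the draw can be reproduced and audited
    by_sex = defaultdict(list)
    for person in registry:
        if person["district"] == district:
            by_sex[person["sex"]].append(person["id"])
    alloc = proportional_allocation({s: len(ids) for s, ids in by_sex.items()}, n)
    sample = []
    for sex in sorted(by_sex):  # fixed order keeps results deterministic for a given seed
        sample.extend(rng.sample(by_sex[sex], alloc[sex]))
    return sample
```

For example, a district with 600 women and 400 men in the extract and a target of 214 yields 128 women and 86 men. Running `sample_district` once per district with its own recorded seed would make the draw reproducible, which also helps when replacement respondents have to be documented.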
- Mapping workshop: this is a face-to-face (in person or via video conference) workshop, usually lasting a full day, where the initial mapping from the source data to the OMOP CDM is discussed in detail.
- Structure mapping + final mapping doc: Based on the mapping workshop, documentation and notes, the structure mapping is finalized and documented in the mapping document. This forms the basis of the ETL design.
- Code mapping: depending on which source terminologies are used in the data source, mapping the local codes to the standard vocabularies used in OMOP CDM (LOINC, SNOMED, RxNorm, etc.) can be either a short, easy process or a long, involved one with multiple iterations.
- Implementation of ETL (Extract, Transform, Load: database functions combined into one tool that pulls data out of one database and places it into another): the ETL script(s) that transform the source data into the OMOP CDM database instance, normally written in Python.
- ETL testing: the ETL scripts are tested on development data and, ideally, also on the data source’s test data.
- ETL deployment: once the ETL scripts have been tested successfully, they are packaged and deployed using GitHub and Docker.
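The code mapping, implementation and testing steps above can be illustrated with a toy transform. A minimal sketch, assuming a hypothetical source EHR with `SourcePatient` and `SourceLab` records and only a subset of the OMOP PERSON and MEASUREMENT columns. The gender concepts 8507 (MALE) and 8532 (FEMALE) and the use of concept 0 for unmapped codes follow OMOP conventions; the local codes and the 9000001-style concept IDs are placeholders, not real vocabulary entries:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# OMOP standard gender concepts; concept_id 0 is the CDM convention for "no matching concept".
GENDER_MAP = {"M": 8507, "F": 8532}

# HYPOTHETICAL result of the code-mapping step: local lab codes -> standard concept IDs.
# These codes and IDs are placeholders, not real vocabulary entries.
LOCAL_TO_STANDARD = {"LAB_COVID_PCR": 9000001, "LAB_TEMP": 9000002}


@dataclass
class SourcePatient:  # hypothetical source EHR patient record
    patient_no: int
    sex: str
    birth_date: date


@dataclass
class SourceLab:  # hypothetical source EHR lab result
    patient_no: int
    local_code: str
    result_date: date
    value: Optional[float]


def to_person(p: SourcePatient) -> dict:
    """Transform one source patient into an OMOP PERSON row (subset of columns)."""
    return {
        "person_id": p.patient_no,  # reuses the source number here for simplicity
        "gender_concept_id": GENDER_MAP.get(p.sex, 0),
        "year_of_birth": p.birth_date.year,
        "month_of_birth": p.birth_date.month,
        "day_of_birth": p.birth_date.day,
        "gender_source_value": p.sex,
    }


def to_measurement(measurement_id: int, lab: SourceLab) -> dict:
    """Transform one lab result into an OMOP MEASUREMENT row (subset of columns)."""
    return {
        "measurement_id": measurement_id,
        "person_id": lab.patient_no,
        "measurement_concept_id": LOCAL_TO_STANDARD.get(lab.local_code, 0),
        "measurement_date": lab.result_date.isoformat(),
        "value_as_number": lab.value,
        "measurement_source_value": lab.local_code,
    }


def run_etl(patients, labs):
    """Extract (in-memory lists here), transform, and return rows ready to load."""
    persons = [to_person(p) for p in patients]
    measurements = [to_measurement(i, lab) for i, lab in enumerate(labs, start=1)]
    return persons, measurements


def unmapped_codes(measurements) -> set:
    """ETL test helper: source codes still mapped to concept 0 need another mapping iteration."""
    return {m["measurement_source_value"] for m in measurements
            if m["measurement_concept_id"] == 0}
```

Running `run_etl` on development data and checking `unmapped_codes` is one simple way to see which local codes still need another code-mapping iteration before the scripts are packaged for deployment.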
The data harmonization process will differ substantially between data sources. For the architecture design, we propose the following conceptual framework:
- New techniques exist for creating synthetic data and for using data to help automate harmonization processes and train models. These approaches will also be used in our project from the very beginning, when harmonized data from hospital EHRs are not yet available, in particular by leveraging mock-up data available in the OHDSI community (such as Synthea) to train algorithms and models before applying them to real data.
- The OMOP CDM schema will use the same OMOP CDM vocabulary version as the participating sites, allowing studies to be prepared and tested. If needed, a synthetic data set (e.g. Synthea) or an available local data set can be loaded into it.
- There will also be results schemas holding the Achilles output for each data source site, giving a central view of the descriptive statistics for each site.
- The database will also be the place to gather aggregated results from the data source sites as part of defined studies.
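The framework above keeps row-level data at each site and brings only descriptive statistics and aggregated results to the central database. A minimal sketch of that flow, assuming toy random PERSON rows as a stand-in for a synthetic data set such as Synthea output (this is not Synthea itself, and the summaries are Achilles-style counts written for illustration, not the Achilles analyses):

```python
import random
from collections import Counter

MALE, FEMALE = 8507, 8532  # OMOP standard gender concept IDs


def mock_persons(n: int, seed: int) -> list:
    """Toy random PERSON rows standing in for a synthetic data set (not real patients)."""
    rng = random.Random(seed)
    return [
        {"person_id": i,
         "gender_concept_id": rng.choice([MALE, FEMALE]),
         "year_of_birth": rng.randint(1940, 2005)}
        for i in range(1, n + 1)
    ]


def site_summary(persons: list) -> dict:
    """Descriptive counts computed locally at a site; only these aggregates leave the site."""
    return {
        "person_count": len(persons),
        "by_gender": Counter(p["gender_concept_id"] for p in persons),
        "by_birth_decade": Counter(p["year_of_birth"] // 10 * 10 for p in persons),
    }


def central_view(site_results: dict) -> dict:
    """Keep each site's results separately and also combine them into network totals."""
    network_gender = Counter()
    for summary in site_results.values():
        network_gender.update(summary["by_gender"])  # Counter.update adds counts
    return {
        "per_site_person_count": {s: r["person_count"] for s, r in site_results.items()},
        "network_person_count": sum(r["person_count"] for r in site_results.values()),
        "network_by_gender": dict(network_gender),
    }


site_results = {
    "site_a": site_summary(mock_persons(100, seed=1)),
    "site_b": site_summary(mock_persons(50, seed=2)),
}
view = central_view(site_results)
print(view["per_site_person_count"], view["network_person_count"])
```

Keeping the per-site results next to the network totals mirrors the idea of one results schema per data source site alongside a central view.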
The OHDSI Atlas instance is integrated with the PostgreSQL database (open source, already in use). As mentioned above, the central Atlas instance will allow cohort definitions and studies to be prepared and the descriptive statistics of each participating site to be viewed. The RStudio and Jupyter instances will allow R scripts to be developed and tested as part of a study design, and data collected from the data source sites to be analyzed as part of studies. The central Arachne server will allow network studies to be managed centrally, with tight integration with OHDSI tools such as Atlas.
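Atlas translates cohort definitions into SQL that runs against the CDM tables. A minimal, hand-written sketch of the kind of query a simple cohort definition reduces to (first occurrence of a condition per person, then counts by gender); it is not Atlas-generated SQL. Python's built-in `sqlite3` stands in for PostgreSQL so the example runs without a server, only a subset of the PERSON and CONDITION_OCCURRENCE columns is created, and the condition concept IDs are hypothetical placeholders:

```python
import sqlite3

COVID_CONCEPT_ID = 9000003  # HYPOTHETICAL placeholder, not a real OMOP concept ID
OTHER_CONCEPT_ID = 9000004  # HYPOTHETICAL placeholder


def build_demo_cdm() -> sqlite3.Connection:
    """In-memory stand-in for the CDM database: two tables, a few columns, toy rows."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE person (
            person_id INTEGER PRIMARY KEY,
            gender_concept_id INTEGER,
            year_of_birth INTEGER);
        CREATE TABLE condition_occurrence (
            condition_occurrence_id INTEGER PRIMARY KEY,
            person_id INTEGER REFERENCES person(person_id),
            condition_concept_id INTEGER,
            condition_start_date TEXT);
    """)
    con.executemany("INSERT INTO person VALUES (?, ?, ?)",
                    [(1, 8507, 1980), (2, 8532, 1975), (3, 8532, 2001)])
    con.executemany("INSERT INTO condition_occurrence VALUES (?, ?, ?, ?)", [
        (1, 1, COVID_CONCEPT_ID, "2020-06-01"),
        (2, 2, COVID_CONCEPT_ID, "2020-07-15"),
        (3, 2, COVID_CONCEPT_ID, "2020-08-20"),  # second record for the same person
        (4, 3, OTHER_CONCEPT_ID, "2020-07-01"),
    ])
    return con


# Cohort entry = first occurrence of the condition per person; then count entrants by gender.
COHORT_BY_GENDER_SQL = """
WITH cohort AS (
    SELECT person_id, MIN(condition_start_date) AS cohort_start_date
    FROM condition_occurrence
    WHERE condition_concept_id = ?
    GROUP BY person_id
)
SELECT p.gender_concept_id, COUNT(*) AS n_persons
FROM cohort c
JOIN person p ON p.person_id = c.person_id
GROUP BY p.gender_concept_id
ORDER BY p.gender_concept_id
"""


def cohort_by_gender(con: sqlite3.Connection, concept_id: int) -> list:
    """Run the cohort query for one condition concept and return (gender, count) rows."""
    return con.execute(COHORT_BY_GENDER_SQL, (concept_id,)).fetchall()


con = build_demo_cdm()
print(cohort_by_gender(con, COVID_CONCEPT_ID))
```

Because person 2 has two qualifying records, the `MIN(...) ... GROUP BY person_id` step counts them once, which is the usual reason a cohort is defined by entry events rather than by raw record counts.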