---
title: "Rationale"
bibliography: references.bib
csl: vancouver.csl
vignette: >
  %\VignetteIndexEntry{Rationale}
  %\VignetteEngine{quarto::html}
  %\VignetteEncoding{UTF-8}
---

This document explains the rationale behind the development of this
algorithm. Many of these text were taken from Anders Aasted Isaksen's
[PhD Thesis](https://aastedet.github.io/dissertation/) as well as the
validation paper [@Isaksen2023]. This document is a shorter and more
concise version of those documents. We cover the:

- Current state of how diabetes is identified in Danish healthcare
  registers.
- Challenges faced by researchers in this area, such as the limited
  transparency in how diabetes is exactly classified in these sources
  and how applying or using these approaches isn't very easy.
- How this algorithm and package contributes to discussions in this
  space about how diabetes in classified in Danish register research and
  how it is implemented.

## Identifying type 1 and 2 diabetes cases in Danish healthcare registers

### Danish register data infrastructure

Many individual-level data (e.g. civil registration, public healthcare
contacts, and drug prescriptions) are automatically collected on all
residents in Denmark and stored in nationwide Danish registers by
Statistics Denmark (`www.dst.dk`, URL often hits redirect limits, so we
can't link directly) and the
[Danish Health Data Authority](https://sundhedsdatastyrelsen.dk). These
agencies are legally allowed to give access to the register data for
research purposes, which provides (authorized) researchers a set of
common, extensive data sources to use for studies. Any researcher
associated with an approved Danish research institute (mainly Danish
universities) can apply for access, but fees and conditions apply.

Register data is generally accessed and processed by approved
researchers on remote servers operated by Statistics Denmark and the
Danish Health Data Authority. The same raw data used by all researchers,
coupled with a common virtual working environment, has the potential to
enable reproducible research. This means that any data processing
workflow could be transferable and reusable between research projects if
the underlying code is designed with reproducibility in mind and the
code is shared ("open-sourced") [@Marszalek2016]. While reproducibility
in research relates to transparent reporting of methods to enable others
to reproduce analyses and experiments, this also applies to a diabetes
classification program, which - if reproducible - could be reused by any
researcher with access to the necessary register data to dynamically
identify a study population of individuals with diabetes for their
research needs [@Dima2017].

### Current Danish register-based diabetes classifiers

In Denmark, the National Diabetes Register, established in 2006, was the
first resource readily available to researchers to use for identifying
diabetes cases through register data [@Carstensen2011]. However, it was
discontinued in 2012.

The next resource is the
[Register of Selected Chronic Diseases](https://www.esundhed.dk/Dokumentation/DocumentationExtended?id=29)
(RSCD), which was launched in 2014. It is currently the only publicly
available resource to identify diabetes cases through Danish register
data (by application to the Danish Health Data Authority).

## Challenges in current classifiers

General-purpose registers and other administrative databases often
provide the basis of diabetes epidemiology, but they rarely contain
validated diabetes-specific data, which may introduce bias in studies
using this data. It is important to have an accurate tool to identify
individuals with diabetes in the registers, as findings may differ with
various diabetes definitions [@Nielsen2014; @Rawshani2014]. Considerable
efforts have been made towards establishing such a tool for diabetes
research in several countries, including Denmark
[@Bak2021; @HallgrenElfgren2016; @Cooper2013].

In a general population, classification algorithms (classifiers) need to
not only identify type 1 diabetes as well as type 2 diabetes, but also
account for events that might lead to inclusion of non-cases, such as
the use of glucose-lowering drugs in the treatment of other conditions.
Currently, no type-specific diabetes classifier has been validated in a
general population, which leaves register-based studies in this area
vulnerable to biases.

In Denmark, a limitation (or flaw) of the RSCD is that it has not been
publicly validated and the source code behind the algorithm has not been
made publicly available. Notably, the algorithm lacks inclusion based on
elevated HbA1c levels [@DHDA2016]. Likewise, the National Diabetes
Register, since discontinued in 2012, had a validation study question
its validity and called for future registers to adopt inclusion based on
elevated HbA1c levels [@Green2014].

Since the launch of the RSCD, nationwide laboratory data on HbA1c
testing has become available in the Danish register ecosystem
[@DHDA2018], but this data is yet to be incorporated into available
diabetes classifiers.

## Diabetes classification algorithms

The currently available register-based diabetes classifiers have yet to
incorporate the emerging register data on routine HbA1c testing. Wishing
to take advantage of this data, we developed the Open Source Diabetes
Classifier (OSDC). Detailed discussion of the advantages and
disadvantages of it's design is found in Anders Aasted Isaksen's thesis,
in the chapter on
[discussing the methods](https://aastedet.github.io/dissertation/5-discussion-methods.html).

We aimed on developing this algorithm to:

1. Stimulate discussion within Denmark on the openness and ease of use
   of existing classifiers or diabetes registers, and on the need for an
   official process for updating or contributing to existing data
   sources on diabetes status. This algorithm and package may end up not
   being used by official institutions, but it can serve as a starting
   point on how to improve the current state of diabetes classification
   in Denmark or as an inspiration for how they might be designed.
2. Provide an open-source, code-based algorithm as an R package to
   classify type 1 and type 2 diabetes based on data from Danish
   registers. We implemented it as an R package so that researchers can
   easily build their own database of individuals with diabetes more
   quickly than waiting for an official source to be implemented.

## References