| Title: | Open Source Diabetes Classifier for Danish Registers |
|---|---|
| Description: | The algorithm first identifies a population of individuals from Danish register data with any type of diabetes as individuals with two or more inclusion events. Then, it splits this population into individuals with either type 1 diabetes or type 2 diabetes by identifying individuals with type 1 diabetes and classifying the remainder of the diabetes population as having type 2 diabetes. |
| Authors: | Signe Kirk Brødbæk [aut] (ORCID: <https://orcid.org/0009-0000-2208-7088>), Anders Aasted Isaksen [aut] (ORCID: <https://orcid.org/0000-0001-8457-5466>), Luke William Johnston [aut, cre] (ORCID: <https://orcid.org/0000-0003-4169-2616>), Steno Diabetes Center Aarhus [cph], Aarhus University [cph] |
| Maintainer: | Luke William Johnston <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.11.2 |
| Built: | 2026-06-02 10:31:59 UTC |
| Source: | https://github.com/steno-aarhus/osdc |
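
The two-step idea in the description above can be sketched in base R on toy data. This is only an illustration, not the package's implementation: the data frame, the `meets_t1d_criteria` flag, and the choice to use the second event as the inclusion date are all assumptions made for the example.

```r
# Toy inclusion events; all values are made up for illustration.
events <- data.frame(
  pnr = c("001", "001", "002", "003", "003", "003"),
  date = as.Date(c(
    "2001-03-01", "2002-06-15", "2005-01-10",
    "2010-02-02", "2010-09-30", "2012-12-24"
  )),
  # Hypothetical flag standing in for the package's type 1 criteria.
  meets_t1d_criteria = c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Order events chronologically within each person.
events <- events[order(events$pnr, events$date), ]

# Number each event within a person (1 = first event, 2 = second, ...).
events$event_number <- ave(seq_along(events$pnr), events$pnr, FUN = seq_along)

# Step 1: keep people with two or more events. Assumption for this sketch:
# the second event is taken as the inclusion date.
diabetes_population <- events[
  events$event_number == 2,
  c("pnr", "date", "meets_t1d_criteria")
]
names(diabetes_population)[names(diabetes_population) == "date"] <- "inclusion_date"

# Step 2: identify type 1 first, then label the remainder as type 2.
diabetes_population$diabetes_type <- ifelse(
  diabetes_population$meets_t1d_criteria, "T1D", "T2D"
)

diabetes_population
```

Person `002` has a single event and therefore never enters the diabetes population in this sketch.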
This nested list contains the logic details of the algorithm.
algorithm()
A list with nested lists that have these named elements:
register: Optional. The register used for this logic.
title: The title to use when displaying the logic in tables.
logic: The logic itself.
comments: Some additional comments on the logic.
A nested list with the algorithmic logic. Contains
fields register, title, logic, and comments.
See the vignette("algorithm") for the logic used to filter these
patients.
algorithm()$is_hba1c_over_threshold
algorithm()$is_gld_code$logic
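
To make the nested shape concrete, here is a minimal sketch of a list with the four documented fields and one way to flatten it into a table. The entry names, register names, and logic strings are hypothetical placeholders, and representing a missing optional register as `NA` is an assumption of this example.

```r
# Hypothetical entries that mimic the documented shape; none of these
# names or logic strings come from the package itself.
example_logic <- list(
  is_example_a = list(
    register = "register_a",
    title = "Example rule A",
    logic = "value >= 10",
    comments = "Placeholder comment."
  ),
  is_example_b = list(
    register = NA_character_, # register is optional, so it is missing here
    title = "Example rule B",
    logic = "is_example_a & flag",
    comments = "Builds on rule A."
  )
)

# Access one field with the same `$` pattern used on algorithm().
example_logic$is_example_a$logic

# Flatten the nested list into one row per rule for display in a table.
logic_table <- do.call(rbind, lapply(names(example_logic), function(name) {
  data.frame(name = name, example_logic[[name]], stringsAsFactors = FALSE)
}))

logic_table
```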
This function requires that each source of register data is represented as a single DuckDB object in R (e.g. a connection to Parquet files). Each DuckDB object must contain a single table covering all years of that data source, or at least the years you have and are interested in.
classify_diabetes(
  lpr,
  hsr,
  lab_forsker,
  bef,
  lmdb,
  stable_inclusion_start_date = "1998-01-01"
)
lpr |
The unified LPR register, see |
hsr |
The unified health services registers (SYSI and SSSY), see
|
lab_forsker |
The register for laboratory results for research |
bef |
The BEF table from the civil register |
lmdb |
The LMDB table from the prescription register |
stable_inclusion_start_date |
Cutoff date after which inclusion events are considered true incident diabetes cases. Defaults to "1998-01-01", which is one year after the data on pregnancy events from the Patient Register are considered valid for dropping gestational diabetes-related purchases of glucose-lowering drugs. This default assumes that the user is using LPR and LMDB data from at least Jan 1 1997 onward. If the user only has access to LPR and LMDB data from a later date, this parameter should be set to one year after the beginning of the user's data coverage. |
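
As a small worked example of the rule above, the cutoff can be derived from the first date covered by your data. The coverage start date below is hypothetical, and the variable names are chosen for illustration only.

```r
# Hypothetical first date covered by the user's LPR and LMDB extracts.
data_coverage_start <- as.Date("2005-01-01")

# One year after the beginning of the data coverage.
cutoff_date <- seq(data_coverage_start, by = "1 year", length.out = 2)[2]

# The documented default is a character string, so format the date the same way.
stable_inclusion_start_date <- format(cutoff_date, "%Y-%m-%d")

stable_inclusion_start_date
```

Applying the same calculation to a coverage start of 1 January 1997 reproduces the documented default of "1998-01-01".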
The same object type as the input data, i.e. a
duckplyr::duckdb_tibble() object.
See the osdc vignette for a detailed description of the internal implementation of this classification function.
# Can't run this multiple times, will cause an error as the table
# has already been created in the DuckDB connection.
registers <- registers() |>
  names() |>
  simulate_registers() |>
  purrr::map(duckplyr::as_duckdb_tibble) |>
  purrr::map(duckplyr::as_tbl)

lpr <- list(
  prepare_lpr2(registers$lpr_adm, registers$lpr_diag),
  prepare_lpr3f(registers$lpr3f_kontakter, registers$lpr3f_diagnoser),
  prepare_lpr3a(registers$lpr3a_kontakt, registers$lpr3a_diagnose)
) |>
  join_registers()

hsr <- list(registers$sssy, registers$sysi) |>
  join_registers()

classify_diabetes(
  lpr = lpr,
  hsr = hsr,
  lab_forsker = registers$lab_forsker,
  bef = registers$bef,
  lmdb = registers$lmdb
)
This function generates a list of tibbles representing the Danish health
registers and the data necessary to run the algorithm. The dataset contains
23 individual cases (pnrs), each designed to test a specific logical branch
of the diabetes classification algorithm, including inclusion, exclusion,
censoring, and type classification rules.
The generated data is used in testthat tests to ensure the algorithm
behaves as expected under a wide range of conditions, but it is also intended
to be explored by users to better understand how the algorithm logic works.
edge_cases()
A named list of 11 tibble::tibble() objects, each representing a
different health register: bef, lmdb, lpr_adm, lpr_diag,
lpr3a_kontakt, lpr3a_diagnose, lpr3f_kontakter, lpr3f_diagnoser,
sysi, sssy, and lab_forsker.
edge_cases()
Join prepared registers
join_registers(register_list)
register_list |
A list of the prepared registers, from e.g.
|
A single object with all rows from each register in register_list.
register_data <- simulate_registers(c(
  "lpr_adm",
  "lpr_diag",
  "lpr3f_kontakter",
  "lpr3f_diagnoser",
  "sssy",
  "sysi"
))

join_registers(list(
  prepare_lpr2(register_data$lpr_adm, register_data$lpr_diag),
  prepare_lpr3f(
    register_data$lpr3f_kontakter,
    register_data$lpr3f_diagnoser
  )
))

join_registers(list(register_data$sysi, register_data$sssy))
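
Conceptually, "all rows from each register" resembles row-binding tables that share the same columns. The base R sketch below illustrates that idea on two made-up in-memory data frames. It is not how join_registers() is implemented, since the package works on DuckDB objects.

```r
# Two made-up tables with identical columns, standing in for prepared registers.
lpr2_like <- data.frame(
  pnr = c("001", "002"),
  date = as.Date(c("2001-01-01", "2003-05-05")),
  is_diabetes_code = c(TRUE, FALSE),
  stringsAsFactors = FALSE
)
lpr3_like <- data.frame(
  pnr = "003",
  date = as.Date("2020-02-02"),
  is_diabetes_code = TRUE,
  stringsAsFactors = FALSE
)

# Stack every table in the list on top of each other, keeping all rows.
combined <- do.call(rbind, list(lpr2_like, lpr3_like))

nrow(combined)
```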
This function generates a list of tibbles representing the Danish health registers and the data necessary to run the algorithm. The dataset contains individuals who should not be included in the final classified cohort.
non_cases()
The generated data is used in testthat tests to ensure the algorithm
behaves as expected under a wide range of conditions, but it is also intended
to be explored by users to better understand how the algorithm logic works
and to be shown in the documentation.
A named list of 11 tibble::tibble() objects, each representing a
different health register: bef, lmdb, lpr_adm, lpr_diag,
lpr3a_kontakt, lpr3a_diagnose, lpr3f_kontakter, lpr3f_diagnoser, sysi, sssy, and lab_forsker.
non_cases()
non_cases()
Each non-case would be classified as having diabetes were it not for the exclusion reason described in its metadata entry.
non_cases_metadata()
A named list of character strings, where each name corresponds to a
non-case PNR in the dataset generated by non_cases().
non_cases_metadata()
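
A named list of strings keyed by PNR is easy to look up or to turn into a table for reading alongside the data. The sketch below uses a hypothetical list with placeholder PNRs and descriptions, not the package's actual metadata.

```r
# Hypothetical metadata in the documented shape: names are PNRs,
# values are descriptions of why the person is a non-case.
example_metadata <- list(
  "nc_01" = "Placeholder: reason this person is not included.",
  "nc_02" = "Placeholder: another reason for not being included."
)

# Look up the description for a single PNR.
example_metadata[["nc_02"]]

# Convert to a two-column table, one row per non-case.
metadata_table <- data.frame(
  pnr = names(example_metadata),
  description = unlist(example_metadata, use.names = FALSE),
  stringsAsFactors = FALSE
)

metadata_table
```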
Prepare and join the two LPR2 registers to extract diabetes and pregnancy diagnoses.
prepare_lpr2(lpr_adm, lpr_diag)
lpr_adm |
The LPR2 register containing hospital admissions. |
lpr_diag |
The LPR2 register containing diabetes diagnoses. |
The same type as the input data, as a duckplyr::duckdb_tibble(),
with the following columns:
pnr: The personal identification variable.
date: The date of each recorded diagnosis (renamed from
d_inddto or dato_start).
is_primary_diagnosis: Whether the diagnosis was a primary diagnosis.
is_diabetes_code: Whether the diagnosis was any type of diabetes.
is_t1d_code: Whether the diagnosis was T1D-specific.
is_t2d_code: Whether the diagnosis was T2D-specific.
is_pregnancy_code: Whether the person has an event related to
pregnancy like giving birth or having a miscarriage at the given date.
is_endocrinology_dept: Whether the diagnosis was made by an
endocrinology medical department.
is_medical_dept: Whether the diagnosis was made by a
non-endocrinology medical department.
See the vignette("algorithm") for the logic used to filter these
patients.
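
The logical columns listed above are meant to be combined in filters. The sketch below builds a small made-up data frame with the documented column names and applies two example filters in base R. The rows and the filters are illustrative assumptions, not the logic the package applies.

```r
# Made-up rows using the documented output columns.
prepared <- data.frame(
  pnr = c("001", "001", "002", "003"),
  date = as.Date(c("2001-01-01", "2004-04-04", "2010-10-10", "2015-05-05")),
  is_primary_diagnosis = c(TRUE, FALSE, TRUE, TRUE),
  is_diabetes_code = c(TRUE, TRUE, FALSE, TRUE),
  is_t1d_code = c(TRUE, FALSE, FALSE, FALSE),
  is_t2d_code = c(FALSE, TRUE, FALSE, TRUE),
  is_pregnancy_code = c(FALSE, FALSE, TRUE, FALSE),
  is_endocrinology_dept = c(TRUE, FALSE, FALSE, FALSE),
  is_medical_dept = c(FALSE, TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Example filter 1: diabetes diagnoses recorded as the primary diagnosis.
primary_diabetes <- subset(prepared, is_diabetes_code & is_primary_diagnosis)

# Example filter 2: dates of pregnancy-related events per person.
pregnancy_dates <- subset(prepared, is_pregnancy_code, select = c(pnr, date))

primary_diabetes$pnr
pregnancy_dates
```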
Prepare and join the two LPR3A registers to extract diabetes and pregnancy diagnoses.
prepare_lpr3a(lpr3a_kontakt, lpr3a_diagnose)
lpr3a_kontakt |
The LPR3A register containing hospital contacts/admissions. |
lpr3a_diagnose |
The LPR3A register containing diabetes diagnoses. |
The same type as the input data, as a duckplyr::duckdb_tibble(),
with the following columns:
pnr: The personal identification variable.
date: The date of each recorded diagnosis (renamed from
d_inddto or dato_start).
is_primary_diagnosis: Whether the diagnosis was a primary diagnosis.
is_diabetes_code: Whether the diagnosis was any type of diabetes.
is_t1d_code: Whether the diagnosis was T1D-specific.
is_t2d_code: Whether the diagnosis was T2D-specific.
is_pregnancy_code: Whether the person has an event related to
pregnancy like giving birth or having a miscarriage at the given date.
is_endocrinology_dept: Whether the diagnosis was made by an
endocrinology medical department.
is_medical_dept: Whether the diagnosis was made by a
non-endocrinology medical department.
See the vignette("algorithm") for the logic used to filter these
patients.
Prepare and join the two LPR3F registers to extract diabetes and pregnancy diagnoses.
prepare_lpr3f(lpr3f_kontakter, lpr3f_diagnoser)
lpr3f_kontakter |
The LPR3F register containing hospital contacts/admissions. |
lpr3f_diagnoser |
The LPR3F register containing diabetes diagnoses. |
The same type as the input data, as a duckplyr::duckdb_tibble(),
with the following columns:
pnr: The personal identification variable.
date: The date of each recorded diagnosis (renamed from
d_inddto or dato_start).
is_primary_diagnosis: Whether the diagnosis was a primary diagnosis.
is_diabetes_code: Whether the diagnosis was any type of diabetes.
is_t1d_code: Whether the diagnosis was T1D-specific.
is_t2d_code: Whether the diagnosis was T2D-specific.
is_pregnancy_code: Whether the person has an event related to
pregnancy like giving birth or having a miscarriage at the given date.
is_endocrinology_dept: Whether the diagnosis was made by an
endocrinology medical department.
is_medical_dept: Whether the diagnosis was made by a
non-endocrinology medical department.
See the vignette("algorithm") for the logic used to filter these
patients.
Register variables (with descriptions) required for the osdc algorithm.
registers()
Outputs a list of registers and variables required by osdc. Each list item contains the official Danish name of the register, the start year, the end year, and the variables with their descriptions. Each register item is a list with 4 items:
The official name of the variable found in the register.
The official Danish description of the variable.
The translated English description of the variable.
The data type of the variable, e.g. "character".
Many of the details within the registers() metadata come
from the full official list of registers from Statistics Denmark (DST):
https://www.dst.dk/extranet/forskningvariabellister/Oversigt%20over%20registre.html
registers()
Simulate a fake data frame of one or more Danish registers
simulate_registers(registers, n = 1000)
registers |
The name of the register, or a character vector of register names, you want to simulate. |
n |
The number of rows to simulate for the resulting register. |
A list of simulated registers, each as a tibble::tibble().
simulate_registers(c("bef", "sysi"))
simulate_registers("bef")