Mining Causal Graphs from Patient Records

We are looking for two smart and motivated students who want to work with us on a research project, starting as soon as possible. Each position is for 8 hours a week for one year.

Project description

Electronic patient records are a rich source of information on current practice in the health domain: practitioners meticulously record how they diagnosed and treated their patients. These records often provide a more up-to-date overview of treatment patterns than medical guidelines, because medical practice frequently deviates, for good reasons, from the idealised guidelines.

In this project, we will create a structured graph representation of medical practitioners’ actions in response to observing particular symptoms, following [Goodwin & Harabagiu, 2014]. This graph will allow us to analyse the types of treatments practitioners choose, and to compare these treatments to those proposed in guidelines. Such a comparison yields either a signal of undesired deviation from the guideline, or a prompt to update the guideline.
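To make the idea of comparing a practice graph with a guideline concrete, here is a minimal sketch. It is not the method of [Goodwin & Harabagiu, 2014]: it simply counts how often a symptom and a treatment occur in the same visit, and the record layout, the labels (`symptom_A`, `treatment_X`, ...) and the function names are all invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy records (invented labels, not real patient data):
# each visit pairs the observed symptoms with the treatments chosen.
records = [
    {"symptoms": {"symptom_A", "symptom_B"}, "treatments": {"treatment_X"}},
    {"symptoms": {"symptom_A"}, "treatments": {"treatment_Y"}},
    {"symptoms": {"symptom_B"}, "treatments": {"treatment_X"}},
    {"symptoms": {"symptom_A"}, "treatments": {"treatment_X"}},
]

# Hypothetical guideline: the treatments recommended for each symptom.
guideline = {"symptom_A": {"treatment_Y"}, "symptom_B": {"treatment_X"}}


def build_graph(records):
    """Map each symptom to a Counter of treatments seen in the same visit."""
    graph = defaultdict(Counter)
    for record in records:
        for symptom in record["symptoms"]:
            for treatment in record["treatments"]:
                graph[symptom][treatment] += 1
    return graph


def deviations(graph, guideline):
    """List (symptom, treatment, count) edges that the guideline does not cover."""
    return sorted(
        (symptom, treatment, count)
        for symptom, edges in graph.items()
        for treatment, count in edges.items()
        if treatment not in guideline.get(symptom, set())
    )


graph = build_graph(records)
print(deviations(graph, guideline))
# → [('symptom_A', 'treatment_X', 2)]
```

Each edge returned by `deviations` is a candidate for exactly the two readings mentioned above: an undesired deviation from the guideline, or a prompt to update it.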

For our experiments we have access to anonymised routine healthcare data from the Julius General Practitioners’ Network Database, covering close to 500,000 patients from more than 60 primary healthcare centers in the Utrecht region. For guidelines, we will use a selection of the publicly available Dutch General Practitioner guidelines (standaarden), with which we are familiar from earlier work.

In the first part of the project, we will apply the graph induction method from [Goodwin & Harabagiu, 2014] to the Dutch patient records. This will yield the first scientific result of the project by reproducing this novel result from the literature on a new corpus in a different language. The results will serve as a baseline for the next step.

In the second part of the project, we will use medical background knowledge available as linked open data (such as drug descriptions, adverse event reports and rare disease reports) to improve the graph from the first step. The graph constructed with the help of medical background knowledge will then be compared with the baseline from the first step.

The assistants will collaborate on two goals:

(1) investigate the applicability of current biomedical text mining approaches to Dutch using the patient records available in the project. The assistants will reuse existing technology developed in the Computational Lexicology and Terminology Lab as well as outside the VU; and

(2) investigate the effect of enriching a purely linguistic analysis with background knowledge. The assistants will use datasets available as linked open data, using technology developed in the KRR Group as well as outside the VU.

How to Apply

This project is a collaboration between the Computational Lexicology and Terminology Lab (Dept. of Language and Communication) and the Knowledge Representation and Reasoning group (Dept. of Computer Science).

The supervising team will consist of Marieke van Erp (CLTL group) and Annette ten Teije (KRR group).

If you are interested, send an email to Marieke van Erp <> and Annette ten Teije <> by Wednesday 1 Oct 2014, listing:

– your undergraduate degree,
– the master courses you have taken and intend to take,
– your list of grades (at least for the master courses, if possible also for your undergraduate courses),
– a brief motivation, and
– an indication of your availability (starting date).


[Goodwin & Harabagiu, 2014] Travis Goodwin and Sanda M. Harabagiu (2014) Clinical Data-Driven Probabilistic Graph Processing. In: Proceedings of the Language Resources and Evaluation Conference (LREC), 2014.
