Thursday, June 4, 2026

Zojuist heb ik onderstaande brief verzonden aan de voorzitter van de Parlementaire Enquêtecommissie Corona. Het document (pdf) met de bijlagen en referenties is hier te downloaden.

OPEN BRIEF

Aan de Parlementaire Enquêtecommissie Corona

Aan:  De heer D. de Kort, voorzitter

      Parlementaire Enquêtecommissie Corona

      Tweede Kamer der Staten-Generaal

      <email>


Van:  Wouter Aukema, data-analist

      www.aukema.org | X: @waukema | <email>

Datum:  4 juni 2026

Betreft:  Voorstel tot heroproeping mevr. Marion Koopmans


Geachte heer De Kort, geachte commissieleden,

Met deze brief doe ik de commissie het voorstel prof. dr. Marion Koopmans opnieuw op te roepen voor verhoor.

Aanleiding zijn een aantal publiek verifieerbare feiten en documenten die naar mijn oordeel tijdens het verhoor van 29 mei 2026 onvoldoende of geheel niet aan de orde zijn gekomen. Deze feiten hebben betrekking op haar betrokkenheid bij de ontwikkeling, verspreiding en wetenschappelijke onderbouwing van het PCR-testprotocol dat in de vroege fase van de COVID-19-pandemie wereldwijd werd ingezet.

De bevindingen zijn afkomstig uit uitsluitend publiek toegankelijke en volledig verifieerbare bronnen: officiële WHO-documenten, PDF-metadata, wetenschappelijke publicaties, de ondertekende OMT-belangenverklaring en mijn eigen peer-reviewed data-analyse van alle 1.595 Eurosurveillance-publicaties over 2015–2020. De volledige onderbouwing, tijdlijn, verhoorvragen en bronnenlijst zijn opgenomen in de bijlagen bij deze brief.

De kernvraag is niet of deze bevindingen reeds tot conclusies leiden, maar of zij voldoende aanleiding vormen om aanvullende vragen onder ede te stellen. Naar mijn oordeel is dat het geval.

Vijf kernobservaties die tijdens het verhoor onbesproken bleven

1.  PCR-assays ontworpen vóór de officiële genoomsequentie

Het WHO-testprotocol van 13 januari 2020 — waarop mevrouw Koopmans expliciet als co-auteur staat — vermeldt letterlijk: "We designed candidate diagnostic RT-PCR assays before release of the first sequence of the Wuhan virus." De eerste officiële genoomsequentie werd gepubliceerd op 10 januari 2020. Deze passage roept vragen op over de informatiebasis van het protocol en over wat dit netwerk wist vóór die datum. Dit is tijdens het verhoor niet behandeld.

2.  Commerciële uitrol startte vóór enige peer review

Op 16 januari 2020 kondigde Drosten publiekelijk aan dat zijn team reeds orders ontving voor het commerciële controlemolecuul van TIB-Molbiol — vijf dagen vóór indiening van de Eurosurveillance-paper. De tijdlijn is: protocol v1 op 13 januari gedeeld via WHO, eerste orders op 16 januari ontvangen, protocol v2 aangemaakt op 17 januari, paper ingediend op 21 januari, gepubliceerd op 23 januari. Dit roept de vraag op in welke fase de commerciële en operationele uitrol zich bevond voordat de wetenschappelijke publicatie was beoordeeld — en wat een afwijzing van de paper voor die uitrol zou hebben betekend.

3.  Reviewduur van één dag: uitzonderlijk kort

Mijn analyse van alle 1.595 Eurosurveillance-publicaties over 2015–2020 toont dat de Corman-Drosten paper — een volwaardig Research-artikel met IMRAD-structuur — werd ingediend op 21 januari en geaccepteerd op 22 januari: één dag. De gemiddelde reviewduur voor vergelijkbare Research-artikelen bedraagt 86–200 dagen; zelfs Rapid Communications kennen gemiddeld circa 20 dagen. Dit is de enige publicatie van de 1.595 die binnen 24 uur werd geaccepteerd. Deze uitzonderlijke doorlooptijd rechtvaardigt nadere vragen over de gevolgde reviewprocedure.

4.  Verwijzing naar een WHO-document dat gelijktijdig werd aangemaakt

De Corman-Drosten paper verwijst als referentie 5 naar WHO Situation Report 1, gedateerd 21 januari 2020. Analyse van de PDF-metadata van dit WHO-document toont aan dat het op diezelfde dag om 21:30 uur werd aangemaakt. De paper citeert derhalve een document dat op het moment van indiening nog niet publiek beschikbaar was. Dit roept de vraag op hoe de auteurs reeds toegang hadden tot dit WHO-document, en of er sprake was van coördinatie tussen de wetenschappelijke publicatie en de WHO-beleidscommunicatie.

5.  Niet-vermelde betrokkenheid in de OMT-belangenverklaring

De OMT-belangenverklaring van mevrouw Koopmans, ondertekend op 23 januari 2020 en publiek gearchiveerd via de Wayback Machine, vermeldt bij persoonlijke financiële belangen, extern gefinancierd onderzoek en intellectuele belangen telkens: n.v.t. Niet vermeld zijn haar co-auteurschap van WHO-protocol v1 (13 januari), WHO-protocol v2 (17 januari), haar lopende bijdrage aan de Corman-Drosten paper, en de relatie met TIB-Molbiol via co-auteur Olfert Landt. Het RIVM-Addendum vereist uitdrukkelijk dat alle relevante belangen worden vermeld. Op basis van de verstrekte verklaring oordeelde het RIVM: "geen belemmeringen voor deelname aan commissie." Niet onaannemelijk is dat het RIVM anders zou hebben geoordeeld indien deze informatie wél was vermeld.

Waarom een aanvullend verhoor zinvol is

De hierboven genoemde punten raken aan drie onderwerpen die centraal staan in het werk van de commissie:

  • De totstandkoming van wetenschappelijk advies in de vroege fase van de pandemie;
  • De onafhankelijkheid van betrokken deskundigen ten opzichte van de protocollen waarover zij adviseerden;
  • De relatie tussen wetenschappelijke publicatie, commerciële uitrol en institutionele validatie van het diagnostische protocol.

Juist omdat mevrouw Koopmans een prominente rol heeft gespeeld in zowel nationale als internationale adviesstructuren, is het van belang dat eventuele onduidelijkheden volledig worden opgehelderd. De commissie heeft de unieke bevoegdheid en de maatschappelijke plicht om getuigen onder ede te horen en deze vragen te stellen.

En omdat gebleken is dat mevrouw Koopmans zo'n prominente rol heeft gespeeld in de pandemische respons, dient buiten twijfel te worden gesteld dat de verklaringen die zij onder ede heeft afgelegd op waarheid berusten. Vandaar dit voorstel om mevrouw Koopmans opnieuw onder ede te horen.

Bijlagen

Bij deze brief zijn de volgende bijlagen gevoegd:

Bijlage A — Bevindingen en Motivatie (10 bevindingen)

Bijlage B — Voorgestelde vragen

Mijn onderzoeksdata, visualisaties en bronnen zijn publiek toegankelijk via www.aukema.org, X (@waukema) en Tableau Public. Ik ben graag beschikbaar voor nadere toelichting en bereid de commissie te ondersteunen bij de interpretatie van de data.

Deze brief is bewust als open brief opgesteld en wordt gelijktijdig gepubliceerd op aukema.org en X. De reden is dat de bevindingen publiek verifieerbaar zijn en het algemeen belang dienen. Transparantie over zowel de inhoud als het bestaan van dit voorstel acht ik in het belang van de waarheidsvinding.

Hoogachtend,

Wouter Aukema

Data-analist en onafhankelijk onderzoeker

www.aukema.org | X: @waukema | <email>

<adres> Hoenderloo



Tuesday, May 20, 2025

Contacts for EMA and National Competent Authories

Requests to EMA can be entered here: https://www.ema.europa.eu/en/about-us/contacts-european-medicines-agency/send-question-european-medicines-agency
Based on this EMA source: https://www.ema.europa.eu/en/documents/other/detailed-guide-regarding-eudravigilance-data-management-activities-european-medicines-agency_en.pdf
Member State Sender Typical first two sections of Identifier ICSR WWIDs Contact address for ICSRs
Austria BASGAGES AT-BASGAGES- nebenwirkung@basg.gv.at
Belgium AFIGP BE-FAMHP- adr@fagg-afmps.be
Bulgaria BDA BG-BDA- pharmacovig@bda.bg
Croatia ALMP HR-HALMED- nuspojave@halmed.hr
Cyprus CYPPVPR CY-PPVPR- phv@phs.moh.gov.cy
Czech Republic CZSUKL CZ-CZSUKL- el.icsr@sukl.cz
Denmark DKMAEUDRA DK-DKMA- ICSRquality@dkma.dk
Estonia SAM EE-SAM- pharmacovig@ravimiamet.ee
Finland FINAMW FI-FIMEA- fimea.ev@fimea.fi
France AFSSAPS FR-AFSSAPS- anpv@ansm.sante.fr
Germany (BfArM) BFARM DE-ADRED- uaw@bfarm.de
Germany (BfArM) BFARM DE-AMK- uaw@bfarm.de
Germany (BfArM) BFARM DE-BFARM- uaw@bfarm.de
Germany (BfArM) BFARM DE-CADRBFARM- uaw@bfarm.de
Germany (BfArM) BFARM DE-DCGMA- uaw@bfarm.de
Germany (BfArM) BFARM DE-EMBRYOTOX- uaw@bfarm.de
Germany (PEI) PEI DE-AMK- pharmacovigilance1@pei.de
Germany (PEI) PEI DE-DCGMA- pharmacovigilance1@pei.de
Germany (PEI) PEI DE-PEI- pharmacovigilance1@pei.de
Greece GREOF GR-GREOF- ev@eof.gr
Hungary OGYIP HU-OGYI- adr.box@ogyei.gov.hu
Iceland ADALIMCA01 IS-IMA- Aukaverkun@lyfjastofnun.is
Ireland IMB IE-HPRA- medsafety@hpra.ie
Italy MINISAL02 IT-MINISAL02- farmacovigilanza@aifa.gov.it
Latvia LRZBP2005 LV-SAM- info@zva.gov.lv
Liechtenstein KARZNEI LI-
Lithuania SMCAP LT-SMCA- NepageidaujamaR@vvkt.lt
Luxembourg DPM LU-ALMPS- pharmacovigilance@ms.etat.lu
Malta ADM MT-ADM- postlicensing.medicinesauthority@gov.mt
Netherlands CBGMEB NL-LRB- info@lareb.nl
Norway NOMAADVRE NO- adr@noma.no
Norway NOMAADVRE NOMAADVRE- adr@noma.no
Poland URPLWEBP PL-URPL- ndl@urpl.gov.pl
Portugal INFARMED PT-INFARMED- farmacovigilancia@infarmed.pt
Romania NMA RO-NMA- farmacovigilenta@anm.ro
Slovakia SUKLSK SK-SUKLSK- neziaduce.ucinky@sukl.sk
Slovenia ARSZMP SI-JAZMP- h-farmakovigilanca@jazmp.si
Spain AGEMED ES-AEMPS- fvicsr@aemps.es
Sweden SEMPA SE-MPA- Central.Biv@lakemedelsverket.se

Thursday, May 16, 2024

3 Steps to Navigate all Adverse Events Reports registered at EMA

Three steps to navigate all reported adverse events published by the European Medicines Agency from from 2002 until March 27, 2023 for Case Safety Reports submitted by patients and health care providers from both in- and outside the European Economic Area.

1. Install a 14 day free trial version of Tableau Desktop on your computer.

https://www.tableau.com/products/desktop/download

(Do not use the Tableau Public version as it has limited functionality and cannot handle the EMA Dashboard file)


2. Download the eudravigilance_march-2023.twbx file to your computer via:

https://drive.google.com/file/d/16RRvQJZ6VEyAF_Gm4DDPtE200esXTlSM/view?usp=share_link


3. Start Tableau and open eudravigilance_march-2023.twbx

(This is a big dashboard, please allow some time for loading and processing)



Friday, April 19, 2024

EMA Eudravigilance - Dashboard and collector tool


In the recent interview with Jim Ferguson, we discussed a Tableau dashboard I had generated from pharmacovigilance data from the European Medicines Agency (https://www.adrreports.eu/).



For those who wish to download the data, below is a Python script I developed with Twan van der Poel (https://vanderpoelsoftware.com/).

The tool downloads everything, both serious and non-serious line listings and both products and substances.

Later I will post the shell scripts and SQL statements to process the CSV files into PostgreSQL

If you have questions, pls feel free to leave a comment.

Download python script here: https://drive.google.com/file/d/1Zs8HiB3W-bd77OspdTrnhb8WiXsskfTM/view?usp=sharing

The Tableau dashboard is work in progress but can be downloaded here: https://drive.google.com/file/d/16RRvQJZ6VEyAF_Gm4DDPtE200esXTlSM/view?usp=share_link

(download a 14d trial of Tableau to open the dashboard)

import errno, json, hashlib, logging, pprint, urllib.request, re, requests, os, ssl, sys, time, threading, queue

from alive_progress import alive_bar;
from concurrent.futures import ThreadPoolExecutor
from itertools import product
from pathlib import Path
from multiprocessing.pool import ThreadPool
from urllib.parse import urlparse, unquote, urlencode
from requests.utils import requote_uri
from termcolor import colored
"""
# EMA Crawler
Crawls data from adrreports.eu and converts it into
data CSV and TXT files. The script operates from
the working directory and will create a "temp/", "output/" and "downloads/" subdirectory.
The "temp/" subdirectory will contain remote cache files and
an "output/" directory, the latter will contain the output files.
If downloads are performed the downloads are stored in a "downloads/" directory, this
directory holds hash references to keep track of what it's origins.
* Authors: In alphabetical order:
*          Twan van der Poel <twan@vanderpoelsoftware.com>
*          Wouter Aukema <waukema@gmail.com>
* Usage:   python3 ema-crawler.py
* Version: 1.98
# Changelog
## 1.98
* Used new session for each executor.map iteration
* Further integration of multithreading
* Temporary removal of hash-subdirectory
* Added stopwatch
## 1.97
* Dynamic non-configurable download directory
* Recycled session for each .get call
* Removed last bits of BIN_CHROMEDRIVER
* Dedicated function for remote calls
* Further integration of remote HTTP errors
* Resolved several encoding issues
## 1.96
* Simplified target directory based on files' md5 hash
* Wrapped executor in progress-bar
* Added parameter VERBOSE_DEBUGGING
## 1.95
* Added setting for verbose chromedriver logging
* Removed Selenium approach
* Integrated requests.Session()
## 1.94
* Integrated dynamic download storage
* Feature which moves downloads to their final location
* Added setting USE_DOWNLOADS_CACHE
## 1.93
* Integrated generic fatal_shutdown handler
## 1.92
* Headless support for Chromedriver
* Integrated storage filenames in list of downloads
* Integrated filters to capture downloads of more than 250K of lines properly
* Added parameter "MAX_PRODUCT_COUNT" for testing small chunks
* Dynamic construction of URL (incl. filters)
* Cache integration in download directory
* Dynamic construction of filter combinations for Covid related HLC's
## Todo
* Replace hard-coded ID's with detection of 250k limitation
"""
# Settings
MAX_PRODUCT_COUNT    = None
MAX_THREADS          = 20
USE_DOWNLOADS_CACHE  = False
VERBOSE_DEBUGGING    = False
connect_timeout: int = 40
download_timeout: int = 100

# Variables used to split up large downloads
variables = {
    'Seriousness':                                   'Serious',
    'Regulatory Primary Source Country EEA/Non EEA': 'European Economic Area',
    'Primary Source Qualification':                  'Healthcare Professional',
    'Sex':                                           'Female',
}
# Years used to split up large downloads
years = {
    '2017': '2017',
    '2018': '2018',
    '2019': '2019',
    '2020': '2020',
    '2021': '2021',
    '2022': '2022',
    '2023': '2023',
    '2024': '2024',
}
# Timing
executionStart = time.time()
print("Version 1.98")
# Function which handles fatal shutdowns
def fatal_shutdown(errorMessage, identifier):
    print(colored('+--------------------------------------------------+', 'red'))
    print(colored('| EMA crawler crashed, please read the error below |', 'red'))
    print(colored('+--------------------------------------------------+', 'red'))
    print(colored(errorMessage, 'yellow')+" "+colored("("+identifier+")", "white"))
    sys.exit()
# Function which debugs verbose logging (if enabled)
def verbose_debug(logMessage):
    if not VERBOSE_DEBUGGING:
        return
    print(logMessage)
# Helper to exit execution
def exit():
    sys.exit()
# Determine downloads directory
downloadsPath = ("%s%sdl" % (os.getcwd(), os.sep))
# Default EMA DPA base URL
emaBaseUrl = 'https://dap.ema.europa.eu/analytics/saw.dll?Go'
# Prepare thread storage
threadStorage = threading.local()
# Pre-perform validation
if MAX_PRODUCT_COUNT is not None and not isinstance(MAX_PRODUCT_COUNT, int):
    fatal_shutdown("MAX_PRODUCT_COUNT must be one of; numeric, None", "V01")
if not isinstance(MAX_THREADS, int) or MAX_THREADS < 1:
    fatal_shutdown("MAX_THREADS must be numeric, at least 1", "V02")
if not os.path.exists(downloadsPath):
    os.mkdir(downloadsPath)
if not os.path.exists(downloadsPath):
    fatal_shutdown("Downloads directory does not exist (and could not be created)", "V03")
# Ensure working directories
basepath = os.path.dirname(os.path.abspath(sys.argv[0]))+os.sep
workdirs = ['temp'+os.sep, 'output'+os.sep, 'downloads'+os.sep]
for workdir in workdirs:
    fullpath = basepath+workdir
    if not os.path.exists(fullpath):
        os.mkdir(fullpath)
    if not os.access(fullpath, os.W_OK):
        fatal_shutdown(("Working directory %s is not writable" % fullpath), "P01")
# Define important directories
CACHEDIR = basepath+workdirs[0]
OUTPUTDIR = basepath+workdirs[1]
# Function which creates a directory
def create_directory(directory):
    try:
        os.mkdir(directory)
    except OSError as exc:
        if exc.errno != errno.EEXIST:
            raise
        pass
# Function which initiates the thread executor
def initiate_thread_executor(resources):
    # Determine index
    threadStorage.threadIndex = resources.get(False)
    # Open initial session and authenticate
    threadStorage.localSession = requests.Session()
    # Prepare params
    params = {
        'Path':      '/shared/PHV DAP/DAP/Run Line Listing Report',
        'Format':    'txt',
        'Extension': '.csv',
        'Action':    'Extract',
        'P0':         0
    }
    # Initial request
    ema_get_remote(threadStorage.localSession, params)
# Function which fetches an URL and caches it by a hash
def fetch_remote_url(url):
    hash = hashlib.md5(url.encode()).hexdigest()
    cacheFile = CACHEDIR+hash+".html"
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    if os.path.exists(cacheFile):
        verbose_debug("Fetching local from %s" % url)
        contents = Path(cacheFile).read_text()
        return contents.encode('utf-8')
    else:
        verbose_debug("Fetching remote from %s" % url)
        try:
            with urllib.request.urlopen(url, context=ctx) as response:
                content = response.read()
            if not content:
                fatal_shutdown("Unable to fetch remote data from '%s'" % url, "R01")
        except:
            verbose_debug("Remote data is empty '%s'" % url)
            content = "".encode('utf-8')
        f = open(cacheFile, "wb")
        f.write(content)
        f.close()
        return fetch_remote_url(url)
# Function which creates BI filtered URL's
def create_filtered_url(combination):
    filters = combination['filters']
    params = {}
    params['Path']      = '/shared/PHV DAP/DAP/Run Line Listing Report'
    params['Format']    = 'txt'
    params['Extension'] = '.csv'
    params['Action']    = 'Extract'
    paramCounter = 0
    if len(filters) > 0:
        keyName = ('P%d' % paramCounter)
        params[keyName] = len(filters)
        paramCounter = paramCounter + 1
        for filter in filters:
            keyName = ('P%d' % paramCounter)
            params[keyName] = filter[0]
            paramCounter = paramCounter + 1
            keyName = ('P%d' % paramCounter)
            familyName = ('"Line Listing Objects"."%s"' % filter[1])
            params[keyName] = familyName
            paramCounter = paramCounter + 1
            keyName = ('P%d' % paramCounter)
            params[keyName] = (("1 %s" % filter[2]))
            paramCounter = paramCounter + 1
    baseUrl = emaBaseUrl
    url = baseUrl
    for property in params:
        value = params[property]
        params[property] = value
        url += ('&%s=%s' % (property, params[property]))
    return {
        'baseUrl': baseUrl,
        'params': params,
        'url': url
    }
# Function which calls remote with the right session and headers
def ema_get_remote(localSession, localParams):
    # Prepare statics
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
    }
    errorPattern = r'<div class="ErrorMessage">(.*?)</div>'
    # Perform request
    response = localSession.get(emaBaseUrl, params=localParams, headers=headers, timeout=(connect_timeout, download_timeout))
    # Non 200 code?
    if response.status_code != 200:
        print(localParams)
        fatal_shutdown(f"Non remote 200 HTTP code, got {response.status_code}", "EGR01")
    # Explicitly specify the encoding if needed
    responseText = response.content.decode('utf-16', errors='replace')
    # Extract remote error?
    match = re.search(errorPattern, responseText, re.DOTALL)
    if match:
        print(localParams)
        fatal_shutdown(("Remote URL got error-response: %s" % match.group(1)), "EGR02")
    # Here you go, sir.
    return responseText
# Function which process compiled (downloadable) items
def process_compiled_item(item):
    # Prepare target
    hash = hashlib.md5(item['DownloadUrl'].encode()).hexdigest()
    #targetDirectory = "%s%s%s" % (downloadsPath, os.sep, hash)
    targetDirectory = "%s%s" % (downloadsPath, os.sep)
    targetFilename = "%s%s%s" % (targetDirectory, os.sep, item['OutputFile'])
    # Apply cache?
    if USE_DOWNLOADS_CACHE and os.path.exists(targetFilename):
        if not VERBOSE_DEBUGGING:
            bar()
        return
    # Create target directory
    create_directory(targetDirectory)
    # Debug
    verbose_debug("\nProcessing download")
    verbose_debug("-------------------")
    verbose_debug("-> HLC:  "+item['HighLevelCode'])
    verbose_debug("-> URL:  "+item['DownloadUrl'])
    verbose_debug("-> Into: "+targetDirectory)
    verbose_debug("-> File: "+targetFilename)
    # Perform the request
    responseText = ema_get_remote(threadStorage.localSession, item['Params'])
    # Write response body to file
    with open(targetFilename, 'w', encoding='utf-8') as file:
        file.write(responseText)
    verbose_debug("-> READY")
    if not VERBOSE_DEBUGGING:
        bar()
# Function which generates a matrix
def generate_matrix(variables):
    options = [[]]
    for name, value in variables.items():
        newOptions = []
        for option in options:
            newOptions.append(option + [f"{name}={value}"])
            newOptions.append(option + [f"{name}!={value}"])
        options = newOptions
    return options
# Fetch product- and substances list
collection = []
productCounter = 0
breakIteration = False
baseUrls = {}
baseUrls['products']   = 'https://www.adrreports.eu/tables/product/'
baseUrls['substances'] = 'https://www.adrreports.eu/tables/substance/'
chars = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '0-9']
for type in baseUrls:
    baseUrl = baseUrls[type]
    baseType = type[:-1].capitalize()
    for char in chars:
        indexUrl = "%s%s.html" % (baseUrl, char)
        content = fetch_remote_url(indexUrl)
        pattern = r"<a\s+.*?</a>"
        matches = re.findall(pattern, str(content))
        for match in matches:
            # Extract CA
            centrallyAuthorized = "1"
            if type == 'substances':
                centrallyAuthorized = "0"
                if re.search(r"\*", match):
                    centrallyAuthorized = "1"
            # Extract URL
            words = match.strip().split('"')
            url = words[1]
            # Extract code
            parsed = urlparse(unquote(url))
            words = parsed.query.split('+')
            code = words[-1]
            # Extract name
            parsed = match.split(">")
            parsed = parsed[1].split("<")
            name = parsed[0]
            # Construct filters:
            #   First parameter is the total count of parameters
            #   Then, groups of 3 parameters are added
            #   Parameters above are respectively: operator, var1, var2 or value
            combinations = []
            # Default definition
            defaultFilters = []
            defaultFilters.append(['eq', baseType+' High Level Code', code])
            defaultFilters.append(['eq', 'Seriousness', 'Serious'])
            # Prepare outputfile
            filterString = ""
            # For Covid related HLC's we chop up the result because of a 250K line listing limit)
            if type == 'substances' and code in ['40983312', '40995439', '42325700', '21755']:
                # Generate matrix
                matrix = generate_matrix(variables)
                # Construct combo
                comboCounter = 0
                for year in years:
                    for square in matrix:
                        filterString = ""
                        filters = defaultFilters.copy()
                        filters.append(['eq', 'Gateway Year', year])
                        for point in square:
                            if re.search('!=', point):
                                variable, value = point.split('!=')
                                operator = 'neq'
                                filterString += "_Not"
                            else:
                                variable, value = point.split('=')
                                operator = 'eq'
                            filters.append([operator, variable, value])
                            filterString += ("_%s" % value)
                        comboCounter = comboCounter + 1
                        outputFile = ("%s_%s_%s%s.csv" % (type, code, year, filterString))
                        targetfile = downloadsPath + '/' + outputFile
                        if not os.path.exists(targetfile):
                            #print(f"{outputFile} exists")
                            #print(f"Working on {outputFile}")
                            combinations.append({'filters': filters, 'outputFile': outputFile, 'code': code, 'type': type})
            else:
                # Default ID match filter
                filters = defaultFilters.copy()
                outputFile = ("%s_%s.csv" % (type, code))
                targetfile = downloadsPath + '/' + outputFile
                if not os.path.exists(targetfile):
                    #print(f"Working on {outputFile}")
                    combinations.append({'filters': filters, 'outputFile': outputFile, 'code': code, 'type': type})
            # Raise product-counter
            productCounter += 1
            # Run every combination
            for combination in combinations:
                # Create URL
                result = create_filtered_url(combination)
                # Replace placeholders
                downloadUrl = result['url']
                downloadUrl = downloadUrl.replace('$hlc', code)
                downloadUrl = downloadUrl.replace('$type', type)
                # Collection storage
                item = {}
                item['HighLevelCode'] = code
                item['ProductName'] = name
                item['CentrallyAuthorized'] = centrallyAuthorized
                item['Type'] = type
                item['BaseUrl'] = result['baseUrl']
                item['IndexUrl'] = indexUrl
                item['DownloadUrl'] = downloadUrl
                item['OutputFile'] = combination['outputFile']
                item['Filters'] = combination['filters']
                item['Params'] = result['params']
                collection.append(item)
                #print(f"HLC: {item['HighLevelCode']}")
                #hier wordt dus de lijst met HLCs opgetuigd
            # Max product counter
            if MAX_PRODUCT_COUNT and productCounter == MAX_PRODUCT_COUNT:
                verbose_debug(("--> Warning: MAX_PRODUCT_COUNT reached = %d" % MAX_PRODUCT_COUNT))
                breakIteration = True
                break
        if breakIteration:
            break
    if breakIteration:
        break
# Preformat output
output = {}
for item in collection:
    type = item['Type']
    if not type in output.keys():
        output[type] = []
    output[type].append([str(item['HighLevelCode']), str(item['ProductName']), str(item['CentrallyAuthorized'])])
# Generate indexed output files
for type in output:
    resultFile = "%s%s.txt" % (OUTPUTDIR, type)
    verbose_debug("Generating output file %s" % resultFile)
    if os.path.exists(resultFile):
        os.remove(resultFile)
    lines = output[type]
    content = ""
    for line in lines:
        content += "\t".join(line)+"\n"
    f = open(resultFile, "wb")
    f.write(content.encode('utf-8'))
    f.close()
# Debug
resultFile = "%surls.txt" % (OUTPUTDIR)
if os.path.exists(resultFile):
    os.remove(resultFile)
verbose_debug("Generating URL output file %s" % resultFile)
content = ""
for item in collection:
    content += "\n"+item['OutputFile']+"\n"+item['DownloadUrl']+"\n"
f = open(resultFile, "wb")
f.write(content.encode('utf-8'))
f.close()
# Run executor
if VERBOSE_DEBUGGING:
    resources = queue.Queue(MAX_THREADS)
    for threadIndex in range(MAX_THREADS):
        resources.put(threadIndex, False)
    with ThreadPool(MAX_THREADS, initiate_thread_executor, [resources]) as pool:
        pool.map(process_compiled_item, collection)
else:
    collectionLength = len(collection)
    with alive_bar(collectionLength) as bar:
        resources = queue.Queue(MAX_THREADS)
        for threadIndex in range(MAX_THREADS):
            resources.put(threadIndex, False)
        with ThreadPool(MAX_THREADS, initiate_thread_executor, [resources]) as pool:
            pool.map(process_compiled_item, collection)
# Fin
if MAX_PRODUCT_COUNT != None:
    print(f"Requested {MAX_PRODUCT_COUNT} files")
runtime = round(time.time() - executionStart)
print(f"Execution finished in ~ {runtime} seconds")

Friday, June 25, 2021

Loading VAERS data into Postgres

For those who want to load VAERS data into Postgres, here's how I did it.

In case you have technical questions -> twitter: @waukema. 

/*

DROP TABLE vaers.symptoms;


CREATE TABLE vaers.symptoms

(

  id serial NOT NULL,

  vaers_id char(7),

  symptom1 character varying,

  symptomversion1 numeric,

  symptom2 character varying,

  symptomversion2 numeric,

  symptom3 character varying,

  symptomversion3 numeric,

  symptom4 character varying,

  symptomversion4 numeric,

  symptom5 character varying,

  symptomversion5 numeric

)

WITH (

  OIDS=FALSE

);



--msdos windows server 2008 postgres 9.3 using gnuwin32 utils:

--cat 2021VAERSSYMPTOMS.csv|psql -h localhost -d postgres -a -c "set client_encoding=latin1; copy vaers.symptoms (VAERS_ID, SYMPTOM1, SYMPTOMVERSION1, SYMPTOM2, SYMPTOMVERSION2, SYMPTOM3, SYMPTOMVERSION3, SYMPTOM4, SYMPTOMVERSION4, SYMPTOM5, SYMPTOMVERSION5) from stdin delimiter ',' CSV HEADER"


create table vaers.allsymptoms as

(

select 

vaers_id,

symptom1 as symptom,

symptomversion1 as symptomversion

from 

vaers.symptoms

where

symptom1 is not null

union all

select 

vaers_id,

symptom2 as symptom,

symptomversion2 as symptomversion

from 

vaers.symptoms

where

symptom2 is not null

union all

select 

vaers_id,

symptom3 as symptom,

symptomversion3 as symptomversion

from 

vaers.symptoms

where

symptom3 is not null

union all

select 

vaers_id,

symptom4 as symptom,

symptomversion4 as symptomversion

from 

vaers.symptoms

where

symptom4 is not null

union all

select 

vaers_id,

symptom5 as symptom,

symptomversion5 as symptomversion

from 

vaers.symptoms

where

symptom5 is not null

);


create index ix_allsymptoms_vaers_id on vaers.allsymptoms using btree (vaers_id);

create index ix_allsymptoms_symptom on vaers.allsymptoms using btree (symptom);


drop table vaers.vax ;

create table vaers.vax 

(

id serial,

VAERS_ID varchar,

VAX_TYPE varchar,

VAX_MANU varchar,

VAX_LOT varchar,

VAX_DOSE_SERIES varchar,

VAX_ROUTE varchar,

VAX_SITE varchar,

VAX_NAME varchar

)

WITH (

  OIDS=FALSE

);


--msdos postgres 9.3

--cat 2021VAERSSYMPTOMS.csv|psql -h localhost -d postgres -a -c "set client_encoding=latin1; copy vaers.symptoms (VAERS_ID, SYMPTOM1, SYMPTOMVERSION1, SYMPTOM2, SYMPTOMVERSION2, SYMPTOM3, SYMPTOMVERSION3, SYMPTOM4, SYMPTOMVERSION4, SYMPTOM5, SYMPTOMVERSION5) from stdin delimiter ',' CSV HEADER"

--cat 2021VAERSVAX.csv|psql -h localhost -d postgres -a -c "copy vaers.vax (VAERS_ID, VAX_TYPE, VAX_MANU, VAX_LOT, VAX_DOSE_SERIES, VAX_ROUTE, VAX_SITE, VAX_NAME) from stdin delimiter ',' CSV HEADER"


create index ix_vax_vaers_id on vaers.vax using btree (vaers_id);


drop table if exists vaers.data;

create table vaers.data

(

id serial,

VAERS_ID varchar,

RECVDATE varchar,

STATE varchar,

AGE_YRS varchar,

CAGE_YR varchar,

CAGE_MO varchar,

SEX varchar,

RPT_DATE varchar,

SYMPTOM_TEXT varchar,

DIED varchar,

DATEDIED varchar,

L_THREAT varchar,

ER_VISIT varchar,

HOSPITAL varchar,

HOSPDAYS varchar,

X_STAY varchar,

DISABLE varchar,

RECOVD varchar,

VAX_DATE varchar,

ONSET_DATE varchar,

NUMDAYS varchar,

LAB_DATA varchar,

V_ADMINBY varchar,

V_FUNDBY varchar,

OTHER_MEDS varchar,

CUR_ILL varchar,

HISTORY varchar,

PRIOR_VAX varchar,

SPLTTYPE varchar,

FORM_VERS varchar,

TODAYS_DATE varchar,

BIRTH_DEFECT varchar,

OFC_VISIT varchar,

ER_ED_VISIT varchar,

ALLERGIES varchar

)

WITH (

  OIDS=FALSE

);


--msdos postgres 9.3

--cat 2021VAERSDATA.csv|psql -h localhost -d postgres -a -c "set client_encoding=latin1; copy vaers.data (VAERS_ID,RECVDATE,STATE,AGE_YRS,CAGE_YR,CAGE_MO,SEX,RPT_DATE,SYMPTOM_TEXT,DIED,DATEDIED,L_THREAT,ER_VISIT,HOSPITAL,HOSPDAYS,X_STAY,DISABLE,RECOVD,VAX_DATE,ONSET_DATE,NUMDAYS,LAB_DATA,V_ADMINBY,V_FUNDBY,OTHER_MEDS,CUR_ILL,HISTORY,PRIOR_VAX,SPLTTYPE,FORM_VERS,TODAYS_DATE,BIRTH_DEFECT,OFC_VISIT,ER_ED_VISIT,ALLERGIES) from stdin delimiter ',' CSV HEADER"

create index ix_data_vaers_id on vaers.data using btree (vaers_id);

*/



select 

vv.VAX_MANU as manufaturer,

vv.VAX_NAME as vax_name,

VAX_TYPE,

RECVDATE,

count(distinct vd.vaers_id) as num_records

from 

vaers.data vd

left join vaers.allsymptoms vs on vs.vaers_id = vd.vaers_id

left join vaers.vax vv on vv.vaers_id = vd.vaers_id

where 

vs.symptom = 'Myocard%' 

--and not vd.died = 'Y'

group by

vv.VAX_MANU,

vv.VAX_NAME,

VAX_TYPE,

RECVDATE

;


-- example:


WITH D as

(

select 

vd.recvdate,

vd.state,

vd.age_yrs,

vd.sex,

vd.recovd,

vd.died,

vd.l_threat,

vd.hospital,

vd.disable,

vd.ofc_visit,

vs.symptom,

vv.vax_manu,

vv.vax_type,

vd.vaers_id

from 

vaers.data vd

left join vaers.allsymptoms vs on vs.vaers_id = vd.vaers_id

left join vaers.vax vv on vv.vaers_id = vd.vaers_id

where

vs.symptom_text ilike '%myocardi%'

)


select 

*

from D

limit 10