Pages

Tuesday, April 20, 2021

Analysis of ICSR reports at ema.europa.eu

Technical documentation to the article titled:

"The Costs of Covid-19 Vaccinations – Should we Rethink the Policy?"

This post provides documention for the analysis process that was performed with the abovementioned article (this link), as well as some additional visualisations produced from the data downloaded from EMA and ECDC.

Vaccination Volume Data from ECDC

The data for COVID-19 vaccination volumes can be found here:

https://www.ecdc.europa.eu/en/publications-data/data-covid-19-vaccination-eu-eea   


Individual Case Safety Reports from EMA

1. To download all ICSR reports, follow the steps below.

Open http://www.adrreports.eu/en/search_subst.html, navigate to 'C' and search for COVID-19.




 





To download ICSR reports for ALL forms of medication, one would need to visis the 4,000+ entries listed. As a workaround, I have a shortcut described below.

Since EMA does not allow the public to download all ICSR reports for all forms of medication in one go, I came up with a work around, by manipulating the url for entering their Oracle back end. By simply not specifying the identifier for a parrticular substance, all ICSR reports on all forms of medication get shown.

Shortcut to all Substances (incl. COVID19-vaccines)

Generate and download the ICSR Line listings for COVID, Measles and Influenza vaccines as well as the shortcut for all forms of medication, and repeat steps 2, 3 and 4 explained below, for each of the vaccines:

2. Export the ICSR reports to TAB-delimited text files as follows:

Navigate to the tab labelled 'Line Listing'.







For Geographic Region, select 'European Economic Area'.

Next, click on 'Run Line Listing Report'. This will open a new window and present you with the ICSR reports in tabular format. This may take a while.

Navigate all the way to the bottom and start the export as shown in the image below:











The export will be saved to the default Downloads folder of your web-browser.

3. Convert and rename all exported files to UTF-8 encoding (e.g. using Notepad++ for Windows).

In the instructions below, I used a naming convention as follows: MMMDD_export_[substance].csv

4. Load the exported file into PostgreSQL

Create a schema and table in PostgreSQL (I use an old version 9.1 on an old Windows Server 2008)

CREATE SCHEMA ema;
CREATE TABLE ema.sta_icsr
(
  source varchar,
  elocnr character varying,
  reporttype character varying,
  evgatewayrecdate character varying,
  primarysourcequalification character varying,
  primarysourcecountryforregulatorypurposes character varying,
  literaturereference character varying,
  patientagegroup character varying,
  patientagegroupperreporter character varying,
  parentchildreport character varying,
  patientsex character varying,
  reactionlist character varying,
  suspectdruglist character varying,
  concomitant character varying,
  icsrform character varying
)
WITH (
  OIDS=FALSE);

Next, load all ICSR exports into the table. I used the following DOS command:

for /F %f in ('dir/b APR13_*.csv') do tail -n +2 %f|sed -e s#^^#%f\t#|sed -e s#\xED\xAF\x80#t#g -e s#\xED\xB6\xA9#t#g -e s#\xed\xb6\x9f#i#g -e s#\\#\\\\#g|psql -h localhost -U postgres -a -c "copy ema.sta_icsr from stdin"

Description of the DOS command: For each export file (%f), skip header, add a column identifying the filename, replace some unreadable characters, prepend backslashes with a backslash and tell psql to execute the copy command.

5. De-duplication of records

To de-duplicate, we need a table that holds only unique entries. Also, we need to give priority / precedence to specific substances, before we load the rest (all substances thru that manipulated url).

create table ema.dwh_geneloc (
id serial,
source varchar,
elocnr varchar
)
without oids;

insert into ema.dwh_geneloc (source, elocnr)
select
source, elocnr 
from 
ema.sta_icsr
where 
not (elocnr) in (select elocnr from ema.dwh_geneloc group by elocnr)
and source like '%pfizer%'
;
insert into ema.dwh_geneloc (source, elocnr)
select
source, elocnr 
from 
ema.sta_icsr
where 
not (elocnr) in (select elocnr from ema.dwh_geneloc group by elocnr)
and source like '%astrazeneca%'
;
insert into ema.dwh_geneloc (source, elocnr)
select
source, elocnr 
from 
ema.sta_icsr
where 
not (elocnr) in (select elocnr from ema.dwh_geneloc group by elocnr)
and source like '%moderna%'
;
insert into ema.dwh_geneloc (source, elocnr)
select
source, elocnr 
from 
ema.sta_icsr
where 
not (elocnr) in (select elocnr from ema.dwh_geneloc group by elocnr)
and source like '%janssen%'
;
insert into ema.dwh_geneloc (source, elocnr)
select
source, elocnr 
from 
ema.sta_icsr
where 
not (elocnr) in (select elocnr from ema.dwh_geneloc group by elocnr)
and source like '%MMR%'
;
insert into ema.dwh_geneloc (source, elocnr)
select
source, elocnr 
from 
ema.sta_icsr
where 
not (elocnr) in (select elocnr from ema.dwh_geneloc group by elocnr)
and source like '%Influenza-split-virion-inactivated%'
;
insert into ema.dwh_geneloc (source, elocnr)
select
source, elocnr 
from 
ema.sta_icsr
where 
not (elocnr) in (select elocnr from ema.dwh_geneloc group by elocnr)
and source like '%influenza-live-nasal%'
;
insert into ema.dwh_geneloc (source, elocnr)
select
source, elocnr 
from 
ema.sta_icsr
where 
not (elocnr) in (select elocnr from ema.dwh_geneloc group by elocnr)
and source like '%all_all%'
;
6. Create some indexes to speed up query performance

create index ix_staicsr_source on ema.sta_icsr using btree (source);
create index ix_staicsr_elocnr on ema.sta_icsr using btree (elocnr);
create index ix_geneloc_source on ema.dwh_geneloc using btree (source);
create index ix_geneloc_elocnr on ema.dwh_geneloc using btree (elocnr);

7. Run query and export results to a text file
select
i.source,
split_part(split_part(i.source, 'apr13_', 2), '_', 1) as product,
i.elocnr,
date(to_timestamp(EVGwayRecDate, 'yyyy-mm-dd')) as recvd_date,
patientagegroup,
patientsex,
case 
when primarysourcequalification like 'Non Health%' then false
else true
end as hcp_reported,
replace(split_part(unnest(string_to_array(reactionlist, ',<BR><BR>')), E'\x20\(', 1), '"', '') as Adr_Text,
split_part(replace(split_part(unnest(string_to_array(reactionlist, ',<BR><BR>')), E'\x20\(', 2), '"', ''), ' - ', 2) as Adr_Outcome
from
ema.dwh_geneloc e
join ema.sta_icsr i on e.source = i.source and e.elocnr = i.elocnr

Save the query output to disk, so that you can pick ip up with a BI-tool, such as Tableau.

8. To select only Fatal reactions, this query will do the job:
WITH all_icsr AS
(
select
i.source,
split_part(split_part(i.source, 'apr13_', 2), '_', 1) as product,
i.elocnr,
date(to_timestamp(EVGwayRecDate, 'yyyy-mm-dd')) as recvd_date,
patientagegroup,
patientsex,
case 
when primarysourcequalification like 'Non Health%' then false
else true
end as hcp_reported,
replace(split_part(unnest(string_to_array(reactionlist, ',<BR><BR>')), E'\x20\(', 1), '"', '') as Adr_Text,
split_part(replace(split_part(unnest(string_to_array(reactionlist, ',<BR><BR>')), E'\x20\(', 2), '"', ''), ' - ', 2) as Adr_Outcome
from
ema.dwh_geneloc e
join ema.sta_icsr i on e.source = i.source and e.elocnr = i.elocnr
)

select
    *
from
    all_icsr
where
    adr_outcome ilike '%fatal%'

9. Beware of difference between ICSR and ADR

Within EMA, an ICSR report relates to a patient / person.
An ICSR report can record multiple reactions and corresponding outcomes.

In counting the number of deaths per 100K vaccines, you should therefore not count number of reactions with fatal outcome, as this could result in multiple fatal reactions per patient.

In the article, we have used the distinct count of unique ICSR reports. 
The query below illustrates the difference between the two measures:

with all_reports as 
(
select
i.elocnr,
split_part(split_part(i.source, 'apr13_', 2), '_', 1) as product,
split_part(replace(split_part(unnest(string_to_array(reactionlist, ',<BR><BR>')), E'\x20\(', 2), '"', ''), ' - ', 2) as adr_outcome
from
ema.dwh_geneloc e
join ema.sta_icsr i on e.source = i.source and e.elocnr = i.elocnr
)
select 
product,
count(*) as num_adr,
count(distinct elocnr) as num_icsr
from 
all_reports
where 
not product = 'all'
adr_outcome ilike '%fatal%'
group by
product