Research Data

SHAPE-ID participates in the Horizon 2020 Open Research Data Pilot and is committed to making research data openly available under the FAIR data principles (findable, accessible, interoperable and reusable) where possible. Project data published so far is described below and is accessible through our Zenodo community.

SHAPE-ID Literature Review query strings for Web of Science and Scopus

Authors: Wciślik, Piotr, Maryl, Maciej, Vienni Baptista, Bianca, & Schriber, Lucien

Work Package: 2

Published: 7 September 2020

DOWNLOAD LINK

Background and methodology:

The document contains query strings which were used to generate bibliographic datasets in Scopus and Web of Science (WoS) databases for the qualitative and quantitative literature review in the framework of the SHAPE-ID project.

Given that WoS and Scopus offer homologous search and filtering functionalities, the same method was used to query both databases. The method consisted of four steps, each defining constraints that filtered down the resources in order to arrive at an optimal dataset.

STEP 1. First, we created the long list of literature. To do that, we combined project-relevant sets of keywords (Sets A-D) into doubles and triplets using proximity search operators in both databases and searching in title, abstract and keyword fields, according to the following schema:

DOUBLES:

B NEAR/1 A

C NEAR/1 A

D NEAR/1 A

TRIPLETS:

(B NEAR/3 C) NEAR/3 A

(D NEAR/3 C) NEAR/3 A

(B NEAR/3 D) NEAR/3 A

This method enabled us to query the databases more precisely, using key phrases consisting of two interrelated words, or “doubles” (e.g. “transdisciplinary approach”), and three interrelated words, or “triplets” (e.g. “interdisciplinary research policy”), instead of keywords that can appear without any specific interrelation (e.g. with the Boolean operator AND instead of a proximity operator, we would get results even if the words “interdisciplinary”, “research” and “policy” appeared in unrelated places in the same abstract, title and keywords).
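The schema above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the keyword sets are hypothetical placeholders (the actual Sets A-D are in the published query file), the function names are invented for this example, and the output uses the schema's NEAR/n notation, whereas each database's own proximity operator syntax may differ.

```python
def or_group(terms):
    """Join the keywords of one set into a parenthesised OR group."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def near(left, right, distance):
    """Combine two expressions with the schema's NEAR/n proximity notation."""
    return f"({left} NEAR/{distance} {right})"


def build_long_list_query(sets):
    """Build the Step 1 query: three doubles and three triplets, OR-ed together."""
    a = or_group(sets["A"])
    # Doubles: B NEAR/1 A, C NEAR/1 A, D NEAR/1 A
    doubles = [near(or_group(sets[k]), a, 1) for k in ("B", "C", "D")]
    # Triplets: (B NEAR/3 C) NEAR/3 A, (D NEAR/3 C) NEAR/3 A, (B NEAR/3 D) NEAR/3 A
    pairs = [("B", "C"), ("D", "C"), ("B", "D")]
    triplets = [
        near(near(or_group(sets[x]), or_group(sets[y]), 3), a, 3)
        for x, y in pairs
    ]
    return " OR ".join(doubles + triplets)


# Hypothetical placeholder keyword sets, NOT the project's actual sets.
example_sets = {
    "A": ["interdisciplinar*", "transdisciplinar*"],
    "B": ["research"],
    "C": ["policy"],
    "D": ["approach"],
}
query = build_long_list_query(example_sets)
```

The resulting string would still need to be wrapped in each database's field restriction for title, abstract and keywords.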

STEP 2. Next, within that long list, we looked (again in title, abstract and keywords) for items that either discuss understandings of interdisciplinarity and transdisciplinarity (keyword set E), or relate to factors or indicators of success or failure (sets F and G correlated through proximity search), according to the following schema:

E OR (F NEAR/1 G).

STEP 3. We further limited the dataset to only those items that have keywords from set A (interdisciplinarity and transdisciplinarity in different variants) in the title – considered as a marker of strong relationship to the subject matter.

STEP 4. Finally, we added the following constraints:

Temporal scope: 1990-present

Document Language: English

Document Type: Articles, Books and Book Chapters, Editorials
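Steps 3 and 4 amount to a filter over the retrieved records. The sketch below shows that logic on in-memory records; in practice these constraints were applied through the databases' own filters. The record field names, the document type labels and the regular expression standing in for keyword set A are all assumptions made for illustration.

```python
import re

# Assumed stand-in for keyword set A (variants of inter-/transdisciplinarity).
SET_A_PATTERN = re.compile(r"\b(inter|trans)-?disciplinar\w*", re.IGNORECASE)

# Assumed labels for the document types listed in Step 4.
ALLOWED_TYPES = {"article", "book", "book chapter", "editorial"}


def passes_steps_3_and_4(record):
    """Return True if a record meets the title, date, language and type constraints."""
    return (
        SET_A_PATTERN.search(record["title"]) is not None  # Step 3: set A in title
        and record["year"] >= 1990                          # temporal scope
        and record["language"].lower() == "english"         # document language
        and record["type"].lower() in ALLOWED_TYPES         # document type
    )


# Hypothetical records, for illustration only.
records = [
    {"title": "Transdisciplinary research in practice", "year": 2005,
     "language": "English", "type": "Article"},
    {"title": "Research policy in Europe", "year": 2010,
     "language": "English", "type": "Article"},
    {"title": "Interdisciplinarity revisited", "year": 1985,
     "language": "English", "type": "Book"},
]
shortlist = [r for r in records if passes_steps_3_and_4(r)]
```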

Description of the file:

This is a text file containing keyword strings used to query WoS and Scopus databases.

SHAPE-ID Literature Review dataset: bibliography on IDR/TDR

Authors: Wciślik, Piotr, Maryl, Maciej, Vienni Baptista, Bianca, & Schriber, Lucien

Work Package: 2

Published: 7 September 2020

DOWNLOAD LINK

Background and methodology:

The dataset consists of 5040 records of publication metadata (author, abstract, title, keywords, tags etc.), produced for the purposes of the systematic literature review in the framework of the SHAPE-ID project.

In the course of the review, the project team queried the Web of Science (WoS), Scopus and JSTOR databases for records on interdisciplinarity and transdisciplinarity (IDR/TDR). In the case of WoS and Scopus, complex search strings were created to reflect the main research questions of the Literature Review: different understandings of IDR/TDR, and factors and indicators of success or failure of the integration of IDR/TDR in research and research policy. The JSTOR database offers less advanced data-analytical tools, but the project team decided to include items that have interdisciplinarity or transdisciplinarity in the title, to counterbalance the reported biases against Arts, Humanities and Social Sciences in Scopus and WoS. These three data sources were complemented with bibliographies prepared during the preliminary scoping analysis of IDR/TDR literature. The query results were compiled in the reference managers Zotero and EndNote. During data processing the records were normalized and duplicates were removed.
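The description does not specify how normalization and duplicate removal were carried out. One common approach, shown here purely as an assumption and not as the project's actual procedure, is to match records on DOI where available and otherwise on a normalized title plus year. The record fields and function names are hypothetical.

```python
import re
import unicodedata


def normalize_title(title):
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", title).casefold()
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())


def dedup_key(record):
    """Prefer the DOI as identity; fall back to normalized title and year."""
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        return ("doi", doi)
    return ("title", normalize_title(record["title"]), record.get("year"))


def remove_duplicates(records):
    """Keep the first record seen for each key, preserving input order."""
    seen = set()
    unique = []
    for record in records:
        key = dedup_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical records exported from two different databases.
merged = remove_duplicates([
    {"title": "Inter-disciplinarity: A Review", "year": 2012},
    {"title": "Inter-Disciplinarity - a review.", "year": 2012},
    {"title": "Another study", "year": 2012, "doi": "10.1000/XYZ"},
    {"title": "Another Study (reprint)", "year": 2013, "doi": "10.1000/xyz"},
])
```

A record that carries a DOI in one source but not in another would not be merged by this simple key, so a real pipeline would likely need a second pass or manual review.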

Based on the systematic review, a sample of the literature was selected for qualitative analysis. At the same time, the bibliographic metadata was analysed with computationally assisted quantitative methods.

Description of the file:

This is a csv file exported from the Zotero database, and formatted according to the Zotero metadata model. It contains a collection of 5040 bibliographic records compiled for the purpose of the SHAPE-ID Literature Review.

SHAPE-ID Literature Review dataset: subject co-occurrence matrix

Authors: Wciślik, Piotr, Maryl, Maciej, Vienni Baptista, Bianca, & Schriber, Lucien

Work Package: 2

Published: 7 September 2020

DOWNLOAD LINK

Background and methodology:

The subject co-occurrence matrix represents the pairs of All Science Journal Classification (ASJC) disciplines that co-occur in journals represented in the SHAPE-ID Literature Review dataset, prepared for the purposes of quantitative analysis.

We take disciplinary affiliations of journals as a proxy of disciplinary characteristics of the journal articles in the Literature Review dataset, mindful of the fact that a particular article might deviate from the disciplinary affiliation of the journal in which it was published. However, since there was no data readily available on item level, and manual disciplinary encoding of all the items in the bibliography was beyond the scope of this study, the method used is the best approximation of the presence of discourse on interdisciplinarity and transdisciplinarity, in and between disciplines.

In the matrix, each co-occurrence value is weighted by the number of journals that feature the given pair of disciplines, and by the number of articles represented in the dataset that feature in these journals. E.g. if Journals J1 and J2 each featured disciplines D1 and D2, and if 4 articles from J1 and 7 articles from J2 are represented in the SHAPE-ID Literature Review dataset, the co-occurrence value is 11.

The pairings cross-referencing a single discipline (e.g. 1202 History in both first row and first column) correspond to the co-occurrence value of mono-disciplinary journals.
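The weighting can be illustrated with a short Python sketch that reproduces the worked example (journals J1 and J2, disciplines D1 and D2). The input layout, a list of (disciplines, article count) pairs per journal, and the function name are assumptions made for this illustration and do not reflect the project's actual processing code.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(journals):
    """Sum article counts over every discipline pair that shares a journal.

    journals: iterable of (disciplines, article_count) tuples, one per journal.
    Returns a Counter keyed by (discipline, discipline) pairs in sorted order.
    """
    matrix = Counter()
    for disciplines, article_count in journals:
        unique = sorted(set(disciplines))
        if len(unique) == 1:
            # Mono-disciplinary journal: counted on the diagonal.
            matrix[(unique[0], unique[0])] += article_count
        for pair in combinations(unique, 2):
            matrix[pair] += article_count
    return matrix


# The example from the text: 4 articles from J1 and 7 from J2.
example = cooccurrence([
    (["D1", "D2"], 4),  # J1
    (["D1", "D2"], 7),  # J2
])
pair_value = example[("D1", "D2")]
```

Here `pair_value` is 11, matching the example above.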

Description of the file:

This is a csv file containing a 308×308 cell matrix with ASJC disciplines in the first row and first column, and co-occurrence values in the remaining cells.

SHAPE-ID Literature Review dataset: journal occurrences with ASJC codes

Authors: Wciślik, Piotr, Maryl, Maciej, Vienni Baptista, Bianca, & Schriber, Lucien

Work Package: 2

Published: 7 September 2020

DOWNLOAD LINK

Background and methodology:

The dataset consists of a list of 2202 journal titles represented in the SHAPE-ID Literature Review bibliography, prepared for the purposes of quantitative analysis.

The list of journals is based on 3955 journal articles in the bibliography dataset that had an International Standard Serial Number (ISSN). To each journal title the project team attributed:

a weight factor based on how many articles from the given journal featured in the bibliography dataset;
at least one All Science Journal Classification (ASJC) code, representing different scientific disciplines;
a country of publication.

For 1853 of those journal titles, the attribution was automated (we matched the ISSNs of journal titles in our sample against the Scopus Sources list from February 2019). For the remaining 349 titles the attribution was done manually, based on the information available in Scopus, Web of Science, JSTOR, the Information Matrix for the Analysis of Journals (MIAR) and ISSN databases.
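The automated matching step can be pictured as a lookup of normalized ISSNs, with unmatched titles set aside for manual attribution. The sketch below is a hypothetical reconstruction: the data structures, the function names and the example ISSNs are invented for illustration, and the real Scopus Sources list has its own layout.

```python
def normalize_issn(issn):
    """Strip hyphens and whitespace so '1234-567x' and '1234567X' compare equal."""
    return issn.replace("-", "").strip().upper()


def attribute_codes(journal_issns, scopus_sources):
    """Split journals into automatically matched and left for manual attribution.

    journal_issns: iterable of ISSN strings from the bibliography.
    scopus_sources: dict mapping ISSN -> list of ASJC codes (assumed layout).
    """
    index = {normalize_issn(k): v for k, v in scopus_sources.items()}
    matched, unmatched = {}, []
    for issn in journal_issns:
        codes = index.get(normalize_issn(issn))
        if codes:
            matched[issn] = codes
        else:
            unmatched.append(issn)
    return matched, unmatched


# Made-up ISSNs and codes, for illustration only.
sources = {"12345678": [1202, 3300], "8765-4321": [1000]}
matched, unmatched = attribute_codes(
    ["1234-5678", "87654321", "0000-0000"], sources
)
```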

Description of the file:

This is a csv file containing a list of 2202 journal titles represented in the SHAPE-ID Literature Review bibliography, with country of publication and ASJC codes assigned.

The file is formatted as follows:

Column A: ISSN of the journal

Column B: information on how country and ASJC codes were attributed. Value “N” indicates automatic attribution based on a match with the Scopus list of sources. Other values indicate manual attribution. Values WOS, SCOPUS, JSTOR indicate the source of information. Value “Y” indicates that information was compiled based on multiple sources.

Column C: numeric values correspond to the weight factor, i.e. the number of times articles from each journal featured in the SHAPE-ID Literature Review bibliography.

Column D: SHAPE-ID Zotero bibliography identifier.

Column E: Journal title

Column F: The country of publication

Columns G-AD: ASJC codes (numeric and word values) associated with journal entries.


Inter- and transdisciplinary projects in FP7 and H2020 (May 2019)

Authors: Maryl, Maciej & Wciślik, Piotr

Work Package: 2

Published: 7 September 2020

DOWNLOAD LINK

Background and methodology:

The metadata of interdisciplinary (IDR) and transdisciplinary (TDR) projects conducted under the European Union framework programmes (FP7 & Horizon 2020) were collected from the CORDIS database (https://cordis.europa.eu/). The SHAPE-ID research team used periodic data dumps stored in the EU Open Data Portal (https://data.europa.eu/euodp/en/data/dataset/cordisfp7projects and https://data.europa.eu/euodp/en/data/dataset/cordisH2020projects).

The data dump from May 2019 was used, so the FP7 database is complete while H2020 projects were still being added periodically.

CORDIS files were subsequently queried for interdisciplinar* or transdisciplinar*, matched against title or abstract (“objective”). This procedure allowed us to create two subsets:

FP7_projects_May2019_IDR_TDR.csv: 1750 FP7 projects. Out of 1699 IDR projects, interdisciplinar* featured in 40 titles and 1679 abstracts. Out of 56 TDR projects, transdisciplinar* featured in 2 project titles and 54 abstracts.

1912 H2020 projects (as of May 2019). Out of 1837 IDR projects, interdisciplinar* featured in 57 titles and 1820 abstracts. Out of 85 TDR projects, transdisciplinar* featured in 2 project titles and 85 abstracts.

Description of the files:

The CSV files contain the same fields as the CORDIS database data dumps: id, acronym, status, programme, topics, frameworkProgramme, title, startDate, endDate, projectUrl, objective, totalCost, ecMaxContribution, call, fundingScheme, coordinator, coordinatorCountry, participants, participantCountries, subjects.

Additional fields:

IDR – project features interdisciplinary research (1 = yes, 0 = no)

TDR – project features transdisciplinary research (1 = yes, 0 = no)

Title_Interdisciplinar* – frequency of “interdisciplinar*” in the project title

Objective_interdisciplinar* – frequency of “interdisciplinar*” in the project objective

Title_transdisciplinar* – frequency of “transdisciplinar*” in the project title

Obj_transdisiplinar* – frequency of “transdisciplinar*” in the project objective
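The additional fields can be derived from the title and objective of each project with a simple wildcard match. The following sketch assumes that a project is flagged as IDR or TDR when the corresponding stem occurs at least once in either field, and treats the wildcard as a case-insensitive match on the stem; both points are assumptions, as is the function name. Field names are copied as they appear in the list above.

```python
import re

# Case-insensitive equivalents of the interdisciplinar* / transdisciplinar* wildcards.
IDR_PATTERN = re.compile(r"interdisciplinar\w*", re.IGNORECASE)
TDR_PATTERN = re.compile(r"transdisciplinar\w*", re.IGNORECASE)


def annotate(project):
    """Return a copy of a CORDIS project row with the additional fields filled in."""
    title = project.get("title") or ""
    objective = project.get("objective") or ""
    counts = {
        "Title_Interdisciplinar*": len(IDR_PATTERN.findall(title)),
        "Objective_interdisciplinar*": len(IDR_PATTERN.findall(objective)),
        "Title_transdisciplinar*": len(TDR_PATTERN.findall(title)),
        "Obj_transdisiplinar*": len(TDR_PATTERN.findall(objective)),
    }
    idr_hits = counts["Title_Interdisciplinar*"] + counts["Objective_interdisciplinar*"]
    tdr_hits = counts["Title_transdisciplinar*"] + counts["Obj_transdisiplinar*"]
    flags = {"IDR": int(idr_hits > 0), "TDR": int(tdr_hits > 0)}
    return {**project, **flags, **counts}


# Hypothetical project row, for illustration only.
row = annotate({
    "title": "An interdisciplinary study",
    "objective": "Interdisciplinary and transdisciplinary work; "
                 "interdisciplinarity matters.",
})
```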

Reference data (countries, funding schemes/types of action, subjects (SIC codes)) can be found in this dataset: https://data.europa.eu/euodp/en/data/dataset/cordisref-data
