01 May 2021
Analytical advances through open science: Employing a reference dataset to foster best-practice data validation, analysis, and reporting
Today’s study of digital journalism entails a plethora of new forms of research data. For example, traditional survey data can be accompanied by large-scale trace data, full-text availability allows for transnational comparative analyses of news articles, and social-media engagement data opens up sophisticated insights into news consumption patterns around the globe. Together, these data allow researchers to adequately capture diverse online phenomena, such as multi-channel dissemination, segmented audiences, or social-media fragmentation, both between national media markets and within them.
Yet, despite significant developments in data capturing and open-science endeavors (Dienlin et al., 2020; Haim, 2020; van Atteveldt et al., 2020), such new forms of combined and large-scale datasets also pose a series of challenges to our field’s analytics. That is, common procedures, statistics, and coefficients of survey, observational, or content-analytic methods cannot simply be applied to such data, for a wide variety of reasons (e.g., validity, reliability, assumption violations, unknown populations). Moreover, open-science principles of pre-registration and sharing (data, materials, trained models) to foster replication may violate privacy or copyright law, particularly when it comes to trace data or news content, or pose considerable ethical challenges. Consequently, such new forms of data are easily kept behind closed doors. Empirical digital journalism research thus lacks guidance not only for adequate validation and analysis but also for adequate sharing and replicability of modern data.
This special issue of Digital Journalism invites scholars to present, provide, and discuss best practices of analysis while implementing open-science principles for the study of online news. We are open to a broad range of submissions; contributions might, for example:
• investigate ways to analyze and compare multilingual news content
• discuss pathways to empirically describe comparisons of news consumption across time, settings, levels, or media systems
• simulate outcomes of different approaches to measure news-use fragmentation
• introduce or compare validity measures for third-party datasets on online news
• rethink ways to conduct and report combined manual and automated content analyses
• rethink ways to conduct and report combinations of survey and trace data
• propose modes of comparable analyses for the study of engagement data
• enrich or combine various openly available data sources
To make this special issue also a reference example for open science, we provide an extensive reference dataset for the study of online news, specifically put together for this special issue (Puschmann & Haim, 2020). Authors of accepted proposals will be invited to develop their outline and hypotheses to register them with the guest editors. Then, authors submit their complete manuscript which, together with pre-registered materials, will undergo full blind review in accordance with the journal’s peer-review procedure.
Information about dataset
We have put together the unique useNews dataset (Puschmann & Haim, 2020). The dataset can (but does not have to) serve as a reference dataset for submissions. It consists of data from three sources.
First, original survey data from the Reuters Digital News Reports of both 2019 and 2020 (Newman et al., 2019, 2020) provides the online news outlets used by at least 10 percent of respondents in each of 12 countries (i.e., Australia, Austria, Brazil, Germany, Japan, the Netherlands, Norway, Romania, South Korea, Spain, the UK, and the US) in both years. This selection subsumes 9 languages and a broad variety of global regions and media systems, and it is accompanied by well-known variables from the report, such as sociodemographic information, political orientation, or willingness to pay.
Second, 1.74 million news articles published by 76 of these news outlets were collected between August 2018 and August 2019, and a similar amount is currently being collected for August 2019 until August 2020. Articles are collected through the MediaCloud API, an open-source platform for media analysis jointly provided by the MIT Center for Civic Media and the Berkman Klein Center for Internet & Society at Harvard University. The data include the publication date, author, and topical keywords as well as the articles’ textual content as document-term matrices (DTMs; full texts are unavailable due to legal provisions), enabling analyses of the textual content of articles.
Third, for each individual article URL, an array of engagement metrics is included from CrowdTangle, a subsidiary of Facebook. Specifically, useNews includes aggregate numerical engagement data on the number of likes, shares, and comments, as well as reactions of love, wow, haha, sad, and angry.
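To illustrate how the second and third source could be combined, the following is a minimal sketch in Python, not the dataset's actual interface. It assumes that each article URL maps to one DTM row of term counts and to one record of engagement counts. The variable names, the URLs, and all numbers are hypothetical toy values, and the actual file formats of useNews may differ.

```python
from collections import Counter

# Hypothetical DTM rows: article URL -> term counts (word order is not available).
dtm_rows = {
    "https://example.org/a": {"election": 2, "debate": 1},
    "https://example.org/b": {"football": 3, "election": 1},
}

# Hypothetical engagement records keyed by the same URLs, using the
# metric names listed in the dataset description.
engagement = {
    "https://example.org/a": {"likes": 120, "shares": 30, "comments": 12,
                              "love": 4, "wow": 1, "haha": 0, "sad": 2, "angry": 9},
    "https://example.org/b": {"likes": 40, "shares": 5, "comments": 3,
                              "love": 1, "wow": 0, "haha": 6, "sad": 0, "angry": 0},
}


def total_engagement(metrics):
    """Sum all engagement counts of one article into a single score."""
    return sum(metrics.values())


def engagement_by_term(dtm_rows, engagement):
    """Sum the total engagement of all articles in which a term occurs."""
    totals = Counter()
    for url, term_counts in dtm_rows.items():
        if url not in engagement:
            continue  # skip articles without engagement data
        score = total_engagement(engagement[url])
        for term in term_counts:
            totals[term] += score
    return dict(totals)


term_totals = engagement_by_term(dtm_rows, engagement)
```

Summing all metrics into one score is only one of several possible choices; analyses may well want to treat shares, comments, and individual reactions separately.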
Proposals should include an abstract of 500 words (not including references) as well as a full list of author(s) with affiliation(s) and abbreviated bio(s). Please submit your proposal as one file (PDF) with your names clearly stated on the first page. Send your proposal to [email protected] by December 1, 2020. Notifications of proposal acceptance will be sent by December 20, 2020. Authors of accepted proposals are invited to register their outline and hypotheses with the guest editors. This additional step also serves to identify duplicate endeavors, for which the guest editors might suggest cooperation between contributors or ask the respective contributors to take “replicative note” of the other submission. Finally, authors are expected to submit their original article for full blind review in accordance with the journal’s peer-review procedure by April 20, 2021. Article submissions should target a length of 7,000-9,000 words.