publications | Marco A. Stranisci

2025

What Are They Filtering Out? A Survey of Filtering Strategies for Harm Reduction in Pretraining Datasets

Marco Antonio Stranisci and Christian Hardmeier

arXiv preprint arXiv:2503.05721, 2025
Are you sure? Measuring models bias in content moderation through uncertainty

Alessandra Urbinati, Mirko Lai, Simona Frenda, and 1 more author

In Findings of the Association for Computational Linguistics: EMNLP 2025, Nov 2025

Automatic content moderation is crucial to ensuring safety on social media. Language model-based classifiers are increasingly adopted for this task, but they have been shown to perpetuate racial and social biases. Although several resources and benchmark corpora have been developed to address this problem, measuring the fairness of models in content moderation remains an open issue. In this work, we present an unsupervised approach that benchmarks models on the basis of their uncertainty in classifying messages annotated by people belonging to vulnerable groups. We use uncertainty, computed by means of the conformal prediction technique, as a proxy to analyze the bias of 11 models (LMs and LLMs) against women and non-white annotators, and we observe to what extent it diverges from performance-based metrics such as the F1 score. The results show that some pre-trained models predict the labels assigned by minority groups with high accuracy even when their confidence in those predictions is low. Therefore, by measuring model confidence, we can see which groups of annotators are better represented in pre-trained models and guide the debiasing of these models before they are actually used.
That is Unacceptable: the Moral Foundations of Canceling

Soda Marem Lo, Oscar Araque, Rajesh Sharma, and 1 more author

In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Jul 2025

Canceling is a morally driven phenomenon that hinders the development of safe social media platforms and contributes to ideological polarization. To address this issue, we present the Canceling Attitudes Detection (CADE) dataset, an annotated corpus of canceling incidents aimed at exploring the factors behind disagreement in evaluating people’s canceling attitudes on social media. Specifically, we study the impact of annotators’ morality on their perception of canceling, showing that morality is an independent axis for explaining disagreement on this phenomenon. Annotators’ judgments depend heavily on the type of controversial event and on the celebrities involved. This shows the need to develop more event-centric datasets to better understand how harms are perpetrated on social media and to develop more aware technologies for their detection.
Subjectivity in stereotypes against migrants in Italian: An experimental annotation procedure

Soda Marem Lo, Marco A. Stranisci, Alessandra Teresa Cignarella, and 5 more authors

In Proceedings of the 11th Italian Conference on Computational Linguistics (CLiC-it 2025), CEUR Workshop Proceedings, Cagliari, Italy, Jul 2025
Beyond the Metrics: an Investigation into the Reliability of Evaluation Metrics for Domain Specific Graph-based Question Answering

Lia Draetta, Marco Antonio Stranisci, Flaviana Corallo, and 4 more authors

Jul 2025
Curated datasets for literary tourism: a case study in knowledge graph creation

Miriam Begliuomini, Marius Crisan, Enrico Daga, and 5 more authors

Jul 2025

2024

How do we counter dangerous speech in Italy?

Vittoria Tonini, Simona Frenda, Marco Antonio Stranisci, and 2 more authors

In CEUR Workshop Proceedings, Jul 2024
The Vulnerable Identities Recognition Corpus (VIRC) for Hate Speech Analysis

Ibai Guillén-Pacho, Arianna Longo, Marco Antonio Stranisci, and 3 more authors

In CEUR Workshop Proceedings, Jul 2024
Preface to the Eighth Workshop on Natural Language for Artificial Intelligence (NL4AI)

G. Bonetta, C. D. Hromei, L. Siciliani, and 2 more authors

In Preface to the Eighth Workshop on Natural Language for Artificial Intelligence (NL4AI), Jul 2024
NL4AI 2024: Overview of the Eighth Workshop on Natural Language for Artificial Intelligence (NL4AI 2024)

Giovanni Bonetta, Claudiu Daniel Hromei, Lucia Siciliani, and 2 more authors

In CEUR Workshop Proceedings, Jul 2024
Dealing with Controversy: An Emotion and Coping Strategy Corpus Based on Role Playing

Enrica Troiano, Sofie Labat, Marco Antonio Stranisci, and 3 more authors

In Findings of the Association for Computational Linguistics: EMNLP 2024, Nov 2024

There is a mismatch between psychological and computational studies on emotions. Psychological research aims at explaining and documenting the internal mechanisms of these phenomena, while computational work often simplifies them into labels. Many fundamental aspects of emotions remain under-explored in natural language processing, particularly how emotions develop and how people cope with them. To help reduce this gap, we follow theories on coping and treat emotions as strategies to cope with salient situations (i.e., how people deal with emotion-eliciting events). This approach allows us to investigate the link between emotions and behavior, which also emerges in language. We introduce the task of coping identification, together with a corpus for it, constructed via role-playing. We find that coping strategies are realized in text even though they are challenging to recognize, both for humans and for automatic systems trained and prompted on the same task. We thus open up a promising research direction to enhance the capability of models to capture emotion mechanisms from text.
Semantic-Aware Methods for the Analysis of Bias and Underrepresentation in Language Resources

Marco Antonio Stranisci and others

Nov 2024
Dissecting biases in relation extraction: A cross-dataset analysis on people’s gender and origin

Marco Antonio Stranisci, Pere-Lluís Huguet Cabot, Elisa Bassignana, and 1 more author

In Proceedings of the 5th Workshop on Gender Bias in Natural Language Processing (GeBNLP), Nov 2024
Human Robot Interaction through an ontology-based dialogue engine

Alessandro Saracco, Alberto Lillo, Marco Stranisci, and 1 more author

In Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction, Nov 2024
O-dang at HODI and HaSpeeDe3: A knowledge-enhanced approach to homotransphobia and hate speech detection

Chiara Di Bonaventura, Arianna Muti, and Marco Antonio Stranisci

In EVALITA Proceedings of the Eighth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian Final Workshop: Parma, Italy, September 7-8th, 2023, Nov 2024

2023

Debunker Assistant: a support for detecting online misinformation

Arthur Thomas Edward Capozzi Lupi, Alessandra Teresa Cignarella, Simona Frenda, and 3 more authors

In Proceedings of the 9th Italian Conference on Computational Linguistics (CLiC-it 2023), Nov 2023
The World Literature Knowledge Graph

Marco Antonio Stranisci, Eleonora Bernasconi, Viviana Patti, and 3 more authors

In International Semantic Web Conference, Nov 2023
WikiBio: a Semantic Resource for the Intersectional Analysis of Biographical Events

Marco Antonio Stranisci, Rossana Damiano, Enrico Mensa, and 3 more authors

In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Jul 2023

Biographical event detection is a relevant task that allows for the exploration and comparison of the ways in which people’s lives are told and represented. This may support several real-life applications in digital humanities and in works aimed at exploring bias about minoritized groups. Despite this, no corpora or models have been specifically designed for this task. In this paper, we fill this gap by presenting a new corpus annotated for biographical event detection. The corpus, which includes 20 Wikipedia biographies, was aligned with 5 existing corpora in order to train a model for the biographical event detection task. The model was able to detect all mentions of the target entity in a biography with an F-score of 0.808 and the entity-related events with an F-score of 0.859. Finally, the model was used to analyze biases about women and non-Western people in Wikipedia biographies.
An experimental annotation task to investigate annotators’ subjectivity in a Misogyny dataset

Alice Tontodimamma, Elisa Ignazzi, Stefano Anzani, and 3 more authors

In ASA 2022 Data-Driven Decision Making, Jul 2023
User-generated world literatures: a comparison between two social networks of readers

Marco Antonio Stranisci, Viviana Patti, Rossana Damiano, and 1 more author

In CEUR Workshop Proceedings, Jul 2023

2022

Comparison of two annotation schemes to derive offensiveness scores in HurtLex

Alice Tontodimamma, Stefano Anzani, Valerio Basile, and 3 more authors

In JADT 2022: Proceedings of the 16th International Conference on Statistical Analysis of Textual Data, Jul 2022
APPReddit: a Corpus of Reddit Posts Annotated for Appraisal

Marco Antonio Stranisci, Simona Frenda, Eleonora Ceccaldi, and 3 more authors

In Proceedings of the Thirteenth Language Resources and Evaluation Conference, Jun 2022

Despite the large number of computational resources for emotion recognition, there is a lack of datasets relying on appraisal models. According to appraisal theories, emotions are the outcome of a multi-dimensional evaluation of events. In this paper, we present APPReddit, the first corpus of non-experimental data annotated according to this theory. After describing its development, we compare our resource with enISEAR, a corpus of events created in an experimental setting and annotated for appraisal. Results show that the two corpora can be mapped notwithstanding different types of data and annotation schemes. An SVM model trained on APPReddit predicts four appraisal dimensions without significant loss. Merging both corpora into a single training set improves the prediction of 3 out of 4 dimensions. Such findings pave the way to better-performing classification models for appraisal prediction.
Guidelines and a Corpus for Extracting Biographical Events

Marco Antonio Stranisci, Enrico Mensa, Rossana Damiano, and 2 more authors

In Proceedings of the 18th Joint ACL-ISO Workshop on Interoperable Semantic Annotation within LREC2022, Jun 2022
O-Dang! The ontology of dangerous speech messages

Marco Antonio Stranisci, Simona Frenda, Mirko Lai, and 5 more authors

In Proceedings of the 2nd Workshop on Sentiment Analysis and Linguistic Linked Data, Jun 2022
Analysing moral beliefs for detecting hate speech spreaders on Twitter

Mirko Lai, Marco Antonio Stranisci, Cristina Bosco, and 2 more authors

In International Conference of the Cross-Language Evaluation Forum for European Languages, Jun 2022

2021

HaMor at the profiling hate speech spreaders on Twitter

Mirko Lai, Marco Antonio Stranisci, Cristina Bosco, and 3 more authors

In CEUR Workshop Proceedings, Jun 2021
Representing the under-represented: A dataset of post-colonial, and migrant writers

Marco Antonio Stranisci, Viviana Patti, and Rossana Damiano

In 3rd Conference on Language, Data and Knowledge (LDK 2021), Jun 2021
Mapping Biographical events to ODPs through Lexico-Semantic Patterns?

Marco Antonio Stranisci, Valerio Basile, Rossana Damiano, and 2 more authors

In CEUR Workshop Proceedings, Jun 2021
Recognizing hate with NLP: The teaching experience of the #DeactivHate lab in Italian high schools

Simona Frenda, Alessandra Teresa Cignarella, Marco Antonio Stranisci, and 3 more authors

In Proceedings of the Eighth Italian Conference on Computational Linguistics (CLiC-it 2021), Jun 2021
The expression of moral values in the Twitter debate: a corpus of conversations

Marco Stranisci, Michele De Leonardis, Cristina Bosco, and 1 more author

IJCoL. Italian Journal of Computational Linguistics, Jun 2021
Hate speech e dangerous speech in Twitter [Hate speech and dangerous speech on Twitter]

Marco Stranisci, Cristina Bosco, Alessandra Cignarella, and 2 more authors

RILA: Rassegna Italiana di Linguistica Applicata, 3/2021, Jun 2021

2020

HaSpeeDe 2 @ EVALITA2020: Overview of the EVALITA 2020 Hate Speech Detection Task

Manuela Sanguinetti, Gloria Comandini, Elisa Di Nuovo, and 6 more authors

Evaluation Campaign of Natural Language Processing and Speech Tools for Italian, Jun 2020
Overview of the EVALITA 2020 Second Hate Speech Detection Task (HaSpeeDe 2)

Manuela Sanguinetti, Gloria Comandini, Elisa Di Nuovo, and 6 more authors

Proceedings of the 7th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA 2020), Online, CEUR.org, Jun 2020
“Contro l’Odio”: A platform for detecting, monitoring and visualizing hate speech against immigrants in Italian social media

Arthur TE Capozzi, Mirko Lai, Valerio Basile, and 8 more authors

IJCoL. Italian Journal of Computational Linguistics, Jun 2020

2019

HATECHECKER: a tool to automatically detect hater users in online social networks

Cataldo Musto, Angelo Sansonetti, Marco Polignano, and 2 more authors

In Proceedings of the Sixth Italian Conference on Computational Linguistics (CLiC-it 2019), Jun 2019
Computational linguistics against hate: Hate speech detection and visualization on social media in the “Contro L’Odio” project

Arthur TE Capozzi, Mirko Lai, Valerio Basile, and 8 more authors

In Proceedings of the Sixth Italian Conference on Computational Linguistics (CLiC-It 2019), Jun 2019
Annotating hate speech: Three schemes at comparison

Fabio Poletto, Valerio Basile, Cristina Bosco, and 2 more authors

In Proceedings of the Sixth Italian Conference on Computational Linguistics (CLiC-it 2019), Jun 2019

2018

An Italian Twitter corpus of hate speech against immigrants

Manuela Sanguinetti, Fabio Poletto, Cristina Bosco, and 2 more authors

In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Jun 2018

2017

Tools and resources for detecting hate and prejudice against immigrants in social media

Cristina Bosco, Viviana Patti, Marcello Bogetti, and 5 more authors

In Proceedings of AISB Annual Convention 2017, Jun 2017
Hate speech annotation: Analysis of an Italian Twitter corpus

Fabio Poletto, Marco Stranisci, Manuela Sanguinetti, and 3 more authors

In CEUR Workshop Proceedings, Jun 2017

2016

Annotating sentiment and irony in the online Italian political debate on #labuonascuola

Marco Antonio Stranisci, Cristina Bosco, Delia Irazú Hernández Farías, and 1 more author

In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16), Jun 2016

2015

Analyzing and annotating for sentiment analysis the socio-political debate on #labuonascuola

Marco Stranisci, Cristina Bosco, Viviana Patti, and 2 more authors

In Proceedings of the Second Italian Conference on Computational Linguistics CLiC-it 2015, Jun 2015

2011

Le metafore del Partito Democratico (2007-2011) [The metaphors of the Partito Democratico (2007-2011)]

Marco Antonio Stranisci

Jun 2011