Publications

WikiBio: a Semantic Resource for the Intersectional Analysis of Biographical Events

Published in Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

This paper presents a new corpus annotated for biographical event detection. The corpus, which includes 20 Wikipedia biographies, was compared with five existing corpora to train a model for the biographical event detection task. The model detected all mentions of the target entity in a biography with an F-score of 0.808 and the entity-related events with an F-score of 0.859. Finally, the model was used to analyse biases about women and non-Western people in Wikipedia biographies.

Recommended citation: Marco Antonio Stranisci, Rossana Damiano, Enrico Mensa, Viviana Patti, Daniele Radicioni, and Tommaso Caselli. 2023. WikiBio: a Semantic Resource for the Intersectional Analysis of Biographical Events. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Toronto, Canada. Association for Computational Linguistics. In press.

Analysing Moral Beliefs for Detecting Hate Speech Spreaders on Twitter

Published in Proceedings of the 13th International Conference of the CLEF Association, CLEF 2022, 2022

The Hate and Morality (HAMOR) submission for the Profiling Hate Speech Spreaders on Twitter task at PAN 2021 ranked 19th out of 67 participating teams, with an average accuracy of 73% across the two languages: English (62%) and Spanish (84%). The method proposed four types of features for inferring users' attitudes from the text of their messages alone: HS detection, users' morality, named entities, and communicative behaviour.

Recommended citation: Mirko Lai, Marco Antonio Stranisci, Cristina Bosco, Rossana Damiano, and Viviana Patti. 2022. Analysing Moral Beliefs for Detecting Hate Speech Spreaders on Twitter. In Experimental IR Meets Multilinguality, Multimodality, and Interaction: 13th International Conference of the CLEF Association, CLEF 2022, Bologna, Italy, Proceedings (pp. 149-161). Cham: Springer International Publishing. https://link.springer.com/chapter/10.1007/978-3-031-13643-6_12

APPReddit: a Corpus of Reddit Posts Annotated for Appraisal

Published in Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

Despite the large number of computational resources for emotion recognition, there is a lack of data sets relying on appraisal models. According to Appraisal theories, emotions are the outcome of a multi-dimensional evaluation of events. In this paper, we present APPReddit, the first corpus of non-experimental data annotated according to this theory.

Recommended citation: Marco Antonio Stranisci, Simona Frenda, Eleonora Ceccaldi, Valerio Basile, Rossana Damiano, and Viviana Patti. 2022. APPReddit: a Corpus of Reddit Posts Annotated for Appraisal. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 3809–3818, Marseille, France. European Language Resources Association. https://aclanthology.org/2022.lrec-1.406.pdf

O-Dang! The Ontology of Dangerous Speech Messages

Published in Proceedings of the 2nd Workshop on Sentiment Analysis and Linguistic Linked Data, 2022

In this paper we present O-Dang!: The Ontology of Dangerous Speech Messages, a systematic and interoperable Knowledge Graph (KG) for the collection of linguistically annotated data.

Recommended citation: Marco Antonio Stranisci, Mirko Lai, Simona Frenda, Oscar Araque, Alessandra Teresa Cignarella, Valerio Basile, Viviana Patti, and Cristina Bosco. 2022. O-Dang! The Ontology of Dangerous Speech Messages. In Proceedings of the 2nd Workshop on Sentiment Analysis and Linguistic Linked Data, pp. 2-8. European Language Resources Association. http://www.lrec-conf.org/proceedings/lrec2022/workshops/SALLD-2/pdf/2022.salld2-1.2.pdf

Representing the Under-Represented: a Dataset of Post-Colonial, and Migrant Writers

Published in Proceedings of the 3rd Conference on Language, Data and Knowledge (LDK 2021), 2021

In today's media and in the Web of Data, non-Western people still suffer a lack of representation. In our work, we address this issue by presenting a pipeline for collecting and semantically encoding Wikipedia biographies of writers who are under-represented due to their non-Western origins, or their legal status in a country.

Recommended citation: Marco Antonio Stranisci, Viviana Patti, and Rossana Damiano. 2021. Representing the Under-Represented: a Dataset of Post-Colonial, and Migrant Writers. 3rd Conference on Language, Data and Knowledge (LDK 2021). Schloss Dagstuhl-Leibniz-Zentrum für Informatik. https://iris.unito.it/bitstream/2318/1820239/1/OASIcs-LDK-2021-7.pdf

HaSpeeDe 2 @ EVALITA2020: Overview of the EVALITA 2020 Hate Speech Detection Task

Published in Proceedings of the Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2020), 2020

The Hate Speech Detection (HaSpeeDe 2) task is the second edition of a shared task on the detection of hateful content in Italian Twitter messages.

Recommended citation: Manuela Sanguinetti, Gloria Comandini, Elisa Di Nuovo, Simona Frenda, Marco Antonio Stranisci, Cristina Bosco, Tommaso Caselli, Viviana Patti, and Irene Russo. 2020. HaSpeeDe 2 @ EVALITA2020: Overview of the EVALITA 2020 Hate Speech Detection Task. In Proceedings of the Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2020), pp. 1-9. CEUR. https://iris.unito.it/bitstream/2318/1764608/1/paper162.pdf

“Contro L’Odio”: A Platform for Detecting, Monitoring and Visualizing Hate Speech against Immigrants in Italian Social Media

Published in IJCoL 6-1, 2020

The paper describes the Web platform built within the project “Contro l’Odio” for monitoring and countering discrimination and hate speech against immigrants in Italy. It applies a combination of computational linguistics techniques for hate speech detection and data visualization tools to data drawn from Twitter.

Recommended citation: Arthur Thomas Edward Capozzi, Mirko Lai, Valerio Basile, Fabio Poletto, Manuela Sanguinetti, Cristina Bosco, Viviana Patti […] Marco Antonio Stranisci. 2020. “Contro L’Odio”: A Platform for Detecting, Monitoring and Visualizing Hate Speech against Immigrants in Italian Social Media. IJCoL. Italian Journal of Computational Linguistics 6, no. 6-1 (2020): 77-97. https://journals.openedition.org/ijcol/659

Annotating hate speech: Three schemes at comparison

Published in Proceedings of the 6th Italian Conference on Computational Linguistics (CLIC-it 2019), 2019

Annotated data are essential to train and benchmark NLP systems. The reliability of the annotation, i.e. low interannotator disagreement, is a key factor, especially when dealing with highly subjective phenomena occurring in human language. Hate speech (HS), in particular, is intrinsically nuanced and hard to fit into any fixed scale; crisp classification schemes for its annotation therefore often show their limits. We test three annotation schemes on a corpus of HS in order to produce more reliable data.

Recommended citation: Fabio Poletto, Valerio Basile, Cristina Bosco, Viviana Patti, and Marco Antonio Stranisci. 2019. Annotating hate speech: Three schemes at comparison. In CEUR WORKSHOP PROCEEDINGS, vol. 2481, pp. 1-8. CEUR-WS. https://iris.unito.it/bitstream/2318/1716344/1/paper56.pdf

An Italian Twitter Corpus of Hate Speech against Immigrants

Published in Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), 2018

The paper describes a recently created Twitter corpus of about 6,000 tweets, annotated for hate speech against immigrants and developed as a reference dataset for an automatic hate speech monitoring system.

Recommended citation: Manuela Sanguinetti, Fabio Poletto, Cristina Bosco, Viviana Patti, and Marco Antonio Stranisci. 2018. An Italian Twitter Corpus of Hate Speech against Immigrants. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan. European Language Resources Association (ELRA). https://aclanthology.org/L18-1443.pdf

Annotating Sentiment and Irony in the Online Italian Political Debate on #labuonascuola

Published in Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), 2016

In this paper we present the TWitterBuonaScuola corpus (TW-BS), a novel Italian linguistic resource for Sentiment Analysis, developed mainly to analyze the online debate on the controversial Italian political reform “Buona Scuola” (Good School), which aimed at reorganizing the national educational and training systems. We describe the methodologies applied in the collection and annotation of data.

Recommended citation: Marco Antonio Stranisci, Cristina Bosco, Delia Irazú Hernández Farías, and Viviana Patti. 2016. Annotating Sentiment and Irony in the Online Italian Political Debate on #labuonascuola. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 2892–2899, Portorož, Slovenia. European Language Resources Association (ELRA). https://aclanthology.org/L16-1462.pdf