Dear colleagues

We are pleased to announce that the BigData Lab of the University of Isfahan has released several large-scale datasets for Question Answering, Machine Reading Comprehension, and Answer Selection in the Persian language. We are very proud to share these datasets with our colleagues. They are accessible from the links below:

PersianQuAD: The Native Question Answering Dataset for the Persian Language

Jamshid Mozafari, Arefeh Kazemi, & Mohammad Ali Nematbakhsh

Developing Question Answering (QA) systems is one of the main goals in Artificial Intelligence. With the advent of Deep Learning (DL) techniques, QA systems have witnessed significant advances. Although DL performs very well on QA, it requires a considerable amount of annotated data for training. Many annotated datasets have been built for the QA task, but most of them are exclusively in English. To address the need for a high-quality QA dataset in the Persian language, we present PersianQuAD, the native QA dataset for the Persian language.


PerAnSel: A Novel Deep Neural Network-Based System for Persian Question Answering

Jamshid Mozafari, Arefeh Kazemi, Parham Moradi, & Mohammad Ali Nematbakhsh

Question answering (QA) systems have attracted considerable attention in recent years. They receive the user’s questions in natural language and respond to them with precise answers. Most of the works on QA were initially proposed for the English language, but some research studies have recently been performed on non-English languages. Answer selection (AS) is a critical component in QA systems. To the best of our knowledge, there is no research on AS for the Persian language.


ParSQuAD: Persian Question Answering Dataset based on Machine Translation of SQuAD 2.0

Negin Abadani, Jamshid Mozafari, Afsaneh Fatemi, Mohamadali Nematbakhsh, & Arefeh Kazemi

Recent developments in Question Answering (QA) have improved state-of-the-art results, and various datasets have been released for this task. Since substantial English training datasets are available for this task, the majority of works published are for English Question Answering. However, due to the lack of Persian datasets, less research has been done on the latter language, making comparisons difficult. This paper introduces the Persian Question Answering Dataset (ParSQuAD) based on the machine translation of the SQuAD 2.0 dataset.