WikiHowNFQA

WikiHowNFQA is a non-factoid question-answering dataset built from 'how-to' content on WikiHow, featuring 11,746 human-authored answers and 74,527 supporting documents. It offers researchers an opportunity to tackle the challenges of generating comprehensive answers from multiple documents and grounding those answers in the real-world context the supporting documents provide.

About WikiHowNFQA

The WikiHowNFQA dataset is derived from WikiHow, a popular online platform that provides how-to guides on a wide range of topics. The dataset is structured to include a question, a set of related documents, and a human-authored answer. The questions are non-factoid, requiring comprehensive, multi-sentence answers. The related documents provide the necessary information to generate an answer.

Dataset Structure

Each instance in WikiHowNFQA contains a question, a set of related documents, and a human-authored answer. The dataset is divided into two parts:

QA Part

This part contains questions, answers, and links to web archive snapshots of related HTML pages. It is available for download on Hugging Face. Each dataset instance includes:

  • article_id: An integer identifier for the article, corresponding to the article_id field in the WikiHow API.
  • question: The non-factoid instructional question.
  • answer: The human-written answer to the question, corresponding to the article summary on the WikiHow website.
  • related_document_urls_wayback_snapshots: A list of URLs to web archive snapshots of related documents, corresponding to the references in the WikiHow article.
  • split: The split of the dataset that the instance belongs to ('train', 'validation', or 'test').
  • cluster: An integer identifier for the cluster that the instance belongs to.

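To make the record layout concrete, here is a minimal sketch of one QA-part instance as a Python dictionary. The field names follow the list above; every value is an illustrative placeholder, not a real dataset entry.

```python
# Illustrative QA-part instance. Field names match the dataset card;
# all values below are made-up placeholders, not real data.
qa_instance = {
    "article_id": 12345,                       # integer id from the WikiHow API (placeholder)
    "question": "How to grow basil indoors?",  # non-factoid instructional question
    "answer": "Place the pot on a sunny windowsill, water when the soil "
              "feels dry, and pinch back the stems to encourage growth.",
    "related_document_urls_wayback_snapshots": [
        "https://web.archive.org/web/20200101000000/https://example.com/basil-guide",
    ],
    "split": "train",  # one of 'train', 'validation', or 'test'
    "cluster": 7,      # cluster identifier (placeholder)
}
```
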
Document Content Part

This part contains parsed HTML content from the related documents. It is available to research groups after signing a Data Transfer Agreement with RMIT University. Each instance includes:

  • article_id: The unique identifier of the article on the WikiHow website.
  • original_url: The original URL of the web page containing the article.
  • archive_url: The URL of a snapshot of the web page from archive.org. The snapshot is the version closest to when the article was created or modified.
  • parsed_text: The plain text parsed from the URL, as text passages with all HTML markup and page structure removed.
  • parsed_md: The text parsed into Markdown format, which preserves formatting such as tables and lists when extracting text content from the web page.

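A Document Content instance can be sketched the same way. Again, the field names come from the list above and all values are placeholders.

```python
# Illustrative Document Content instance; all values are placeholders.
doc_instance = {
    "article_id": 12345,
    "original_url": "https://example.com/basil-guide",
    "archive_url": "https://web.archive.org/web/20200101000000/https://example.com/basil-guide",
    "parsed_text": "Basil needs six hours of sun per day. Water when the top inch of soil is dry.",
    "parsed_md": "## Growing basil\n\n- Six hours of sun per day\n- Water when the top inch of soil is dry",
}
```
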
Dataset Instances

Leaderboard


Automatic evaluation reports Rouge-1/2/L and BertScore; human evaluation reports how often annotators preferred the model answer, preferred the gold answer, or judged the two a tie.

Model                     Rouge-1   Rouge-2   Rouge-L   BertScore   Prefer Model   Prefer Gold   Tie
DPR + BART                   39.8      12.4      23.0       0.881             13            52    35
text-davinci-003             32.2       8.5      19.7       0.873             18            53    29
DPR + text-davinci-003       35.4       9.2      20.2       0.868             56            15    29

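As a rough illustration of what the Rouge-1 column measures, here is a simplified unigram-overlap F1 in pure Python. This is a pared-down sketch, not the official ROUGE toolkit, which adds stemming and other refinements, so it will not match reported scores exactly.

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap between candidate and reference."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: each candidate word counts at most as often as it appears in the reference.
    overlap = sum(min(count, ref[word]) for word, count in cand.items())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```
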
Download

The Document Content part is accessible after filling out one of the following forms and sending it to lurunchik@gmail.com. If you are able to have the form completed by authorized personnel at your university, please use the Institutional Form (the preferred method). Otherwise, complete the Individual Form yourself. An example of a completed Individual Form is provided below for reference.