diabiz
Differences
This shows you the differences between two versions of the page.
Old revision (-): diabiz [2022/01/19 14:14] – [DiaBiz] madamczyk
New revision (+): diabiz [2023/09/27 09:49] (current) – pezik
Old line 4 / new line 4:
**DiaBiz corpus** is a dialog corpus comprising **recordings** and annotated **transcriptions** of **phone-based customer-agent interactions** in several key business domains.
- The data has been **manually transcribed**,
+ A general overview of the corpus can be found in this paper:
+ * Pęzik, Piotr, Gosia Krawentek, Sylwia Karasińska,
- {{:
- **1. Manual correction of time-aligned transcription**
+
+ Also see the accompanying poster here:
+ * [[https://
=== The corpus comprises: ===
- * 3,766 conversations amounting to 385 hours and over 3 million words
+ * 4,036 conversations amounting to nearly 410 hours and over 3.2 million words
- * dialogues between 5 professional
+ * dialogues between 5 call-center agents and 191 participants as customers
- * data from 8 business domains with high commercial demand for conversational analytics and automation solutions
+ * data from 9 business domains with high commercial demand for conversational analytics and automation solutions
- * dialogues based on 200 real-life interaction scenarios
+ * dialogues based on 251 real-life interaction scenarios
Old line 23 / new line 25:
==== The domains covered: ====
- ^ Domain ^
+ ^ Domain ^
| Banking | 907 | 773,
| Car rental | 246 | 189,
Old line 32 / new line 34:
| Telecommunications | 700 | 416,
| Tourism | 451 | 674,
- | **Total** | **3,766** | **3,091,141** | **385:33:32** |
+ | Retail | 270 | 133,
+ | **Total** | **4,036** | **3,224,843** | **409:57:32** |
+
+
+ The data was automatically **transcribed** and **time-aligned**, then manually **corrected** and **annotated**.
+
+
+ {{:
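The revised totals can be cross-checked against the old ones with a little arithmetic. The Python sketch below is illustrative only: the names `OLD`, `NEW`, `to_seconds`, `to_hms` and `total_delta` are hypothetical, and it uses nothing but the two **Total** rows visible in this diff (the intermediate table rows are hidden). Under that assumption, the difference in conversations (270) equals the new Retail row, and the difference in words (133,702) is consistent with the truncated Retail cell that begins with "133,".

```python
# Totals copied from the old and the new revision of the DiaBiz domain table.
OLD = {"conversations": 3766, "words": 3091141, "duration": "385:33:32"}
NEW = {"conversations": 4036, "words": 3224843, "duration": "409:57:32"}


def to_seconds(hms: str) -> int:
    """Convert an 'hours:minutes:seconds' string (hours may exceed 24) to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def to_hms(seconds: int) -> str:
    """Format a number of seconds back as 'hours:minutes:seconds'."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}"


def total_delta(old: dict, new: dict) -> dict:
    """Return how much each total grew between the two revisions."""
    return {
        "conversations": new["conversations"] - old["conversations"],
        "words": new["words"] - old["words"],
        "duration": to_hms(to_seconds(new["duration"]) - to_seconds(old["duration"])),
    }


delta = total_delta(OLD, NEW)
print(delta)  # {'conversations': 270, 'words': 133702, 'duration': '24:24:00'}
```

Durations are converted to seconds before subtracting because the hour field exceeds 24, which rules out treating them as clock times.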
Old line 53 / new line 62:
=====Availability=====
- Click [[https://
+ All the samples and supplementary materials available on this webpage are copyrighted. They are only included
- The current version of the recording catalog is available
+ Click [[https://
+
+ The current version of the recording catalog is available [[https://
+
+ For more information about the DiaBiz license for both commercial and scientific use, please contact piotr.pezik@uni.lodz.pl.
- For more information,
=====Project Team====
* Piotr Pęzik
Old line 69 / new line 81:
* Angelika Peljak-Łapińska
* Anna Cichosz
+ * Anna Kwiatkowska
* Mikołaj Deckert
* Paulina Rybińska
Old line 79 / new line 92:
* Zuzanna Deckert
* Piotr Górniak
+ * Konrad Kaczyński
+ * Łukasz Jałowiecki
+
+
+ =====DiaBiz EN=====
+
+ [[https://
+
=====Acknowledgments====
diabiz.1642598045.txt.gz · Last modified: 2022/01/19 14:14 by madamczyk