diabiz
Differences
This shows you the differences between two versions of the page.
diabiz [2022/01/19 14:14] – [DiaBiz] madamczyk | diabiz [2023/07/10 10:38] – pezik | ||
---|---|---|---|
Line 4: | Line 4: | ||
**DiaBiz corpus** is a dialog corpus comprising **recordings** and annotated **transcriptions** of **phone-based customer-agent interactions** in several key business domains. | **DiaBiz corpus** is a dialog corpus comprising **recordings** and annotated **transcriptions** of **phone-based customer-agent interactions** in several key business domains. | ||
- | The data has been **manually transcribed**, | + | A general overview of the corpus can be found in this paper: |
+ | * Pęzik, Piotr, Gosia Krawentek, Sylwia Karasińska, | ||
- | {{: | ||
- | |||
- | **1. Manual correction of time-aligned transcription** | ||
+ | Also see the accompanying poster here: | ||
+ | * [[https:// | ||
=== The corpus comprises: === | === The corpus comprises: === | ||
- | * 3,766 conversations amounting to 385 hours and over 3 million words | + | * 4,036 conversations amounting to nearly 410 hours and over 3.2 million words |
- | * dialogues between 5 professional | + | * dialogues between 5 call-center agents and 191 participants acting as customers |
- | * data from 8 business domains with high commercial demand for conversational analytics and automation solutions | + | * data from 9 business domains with high commercial demand for conversational analytics and automation solutions |
- | * dialogues based on 200 real-life interaction scenarios | + | * dialogues based on 251 real-life interaction scenarios |
Line 23: | Line 23: | ||
==== The domains covered: ==== | ==== The domains covered: ==== | ||
- | ^ Domain ^ | + | ^ Domain ^ |
| Banking | 907 | 773, | | Banking | 907 | 773, | ||
| Car rental | 246 | 189, | | Car rental | 246 | 189, | ||
Line 32: | Line 32: | ||
| Telecommunications | 700 | 416, | | Telecommunications | 700 | 416, | ||
| Tourism | 451 | 674, | | Tourism | 451 | 674, | ||
- | | **Total** | **3,766** | **3,091,141** | **385:33:32** | | + | | Retail | 270 | 133, |
+ | | **Total** | **4,036** | **3,224,843** | **409:57:32** | | ||
+ | |||
+ | |||
+ | The data was automatically **transcribed** and **time-aligned** and subsequently manually **corrected** and **annotated**. | ||
+ | |||
+ | |||
+ | {{: | ||
Line 53: | Line 60: | ||
=====Availability===== | =====Availability===== | ||
- | Click [[https:// | + | All the samples and supplementary materials available on this webpage are copyrighted. They are only included to illustrate the content of the DiaBiz database and should not be used for any other purposes without explicit permission from the University of Lodz representatives. |
+ | |||
+ | Click [[https:// | ||
+ | |||
+ | The current version of the recording catalog is available [[https:// | ||
- | The current version of the recording catalog is available [[https:// | + | For more information about the DiaBiz license for both commercial and scientific use, please contact piotr.pezik@uni.lodz.pl. |
- | For more information, | ||
=====Project Team===== | =====Project Team===== | ||
* Piotr Pęzik | * Piotr Pęzik | ||
Line 69: | Line 79: | ||
* Angelika Peljak-Łapińska | * Angelika Peljak-Łapińska | ||
* Anna Cichosz | * Anna Cichosz | ||
+ | * Anna Kwiatkowska | ||
* Mikołaj Deckert | * Mikołaj Deckert | ||
* Paulina Rybińska | * Paulina Rybińska | ||
Line 79: | Line 90: | ||
* Zuzanna Deckert | * Zuzanna Deckert | ||
* Piotr Górniak | * Piotr Górniak | ||
+ | * Konrad Kaczyński | ||
+ | * Łukasz Jałowiecki | ||
=====Acknowledgments===== |
diabiz.txt · Last modified: 2023/09/27 09:49 by pezik