http://ridows.irisa.fr/ojs/index.php/ridows/issue/feed Information Retrieval, Document and Semantic Web 2019-03-22T12:56:23+00:00 Vincent Claveau vincent.claveau@irisa.fr Open Journal Systems <p>The diversity in forms of documents (multimedia, multilingual, with or without a structure) and in their uses encourages different communities to mingle more and more.</p> <p><em>Information Retrieval, Document and Semantic Web</em> is a meeting point for the scientific and industrial communities that are interested in information retrieval, the semantic web, the analysis of documents (texts, images, sounds, videos, etc.) or the collection of documents.</p> http://ridows.irisa.fr/ojs/index.php/ridows/article/view/14 Construction(s) et contradictions des données de recherche en SHS 2019-03-22T11:54:09+00:00 Marie-Laure Malingre marie-laure.malingre@univ-rennes2.fr Morgane Mignon morgane.mignon@mshb.fr Cécile Sebban cecile.pierre@univ-rennes2.fr Alexandre Serres alexandre.serres@univ-rennes2.fr <p>In the last decade, political injunctions to curate and share research data have increased significantly. A survey conducted in 2017 at Rennes 2, a French humanities and social sciences university, enabled us to question the habits and representations of researchers in this matter, as well as the term “data” itself. Contrary to the idea that data are given, which is implicit in the French word “données”, the notion of “data” is far from self-evident and proves to be complex and multifaceted. This article aims to show that a triple redefinition and construction of research data is at stake in the discourses of researchers and institutional stakeholders: it operates at epistemological, intellectual and political levels. 
These concepts of data conflict with existing practices in the field.</p> 2019-02-25T00:00:00+00:00 Copyright (c) 2019 Information Retrieval, Document and Semantic Web http://ridows.irisa.fr/ojs/index.php/ridows/article/view/16 Automatic analysis of old documents: taking advantage of an incomplete, heterogeneous and noisy corpus 2019-03-22T12:51:22+00:00 Gaël Lejeune gael.lejeune@paris-sorbonne.fr Karine Abiven karine.abiven@paris-sorbonne.fr <p>In this article we tackle some problems arising with noisy and heterogeneous data in the domain of digital humanities. We investigate a corpus known as the mazarinades corpus, which gathers around 5,500 documents in French from the 17th century. First of all, we show that this set of documents is not, strictly speaking, a corpus, since its coverage has not been thoroughly defined. Then, we argue that it is possible to get interesting results even with such an incomplete, heterogeneous and noisy dataset by strictly limiting the amount of preprocessing necessary for processing texts. Finally, we present some results from a case study on document dating, where we aim to complete missing metadata in the mazarinades corpus. We exploit a method based on character-string analysis which is robust to noisy data and can even take advantage of this noise to improve the quality of the results.</p> 2019-02-25T00:00:00+00:00 Copyright (c) 2019 Information Retrieval, Document and Semantic Web http://ridows.irisa.fr/ojs/index.php/ridows/article/view/12 Value and Variety Driven Approach for Extended Data Warehouses Design 2019-02-25T17:12:57+00:00 Nabila Berkani n_berkani@esi.dz Ladjel Bellatreche ladjel.bellatreche@ensma.fr Selma Khouri s_khouri@esi.dz <p>In a very short time (1999-present), data warehouse (DW) technology has gone through all the phases of a technological product’s life: introduction on the market, growth, maturity and decline, signaled by the appearance of Big Data. 
In the Big Data landscape, the arrival of Linked Open Data (LOD) transforms the Big Data threat into an opportunity for DWs, because LOD brings added value and knowledge that we do not find in the internal sources feeding a DW. However, taking LOD into account increases the variety of sources, which must be managed effectively. In this paper, we present a new value and variety driven approach for DW design that we apply to a case study in the SHS domain.</p> 2019-02-25T11:17:43+00:00 Copyright (c) 2019 Information Retrieval, Document and Semantic Web http://ridows.irisa.fr/ojs/index.php/ridows/article/view/15 Data correction for transcription in crowdsourcing. A feedback from RECITAL platform 2019-03-22T10:55:40+00:00 Benjamin Hervy benjamin.hervy@univ-nantes.fr Pierre Pétillon pierre.petillon@etu.univ-nantes.fr Hugo Pigeon hugo.pigeon@etu.univ-nantes.fr Guillaume Raschia guillaume.raschia@univ-nantes.fr <p>Crowdsourcing has been widely deployed to address some challenges in digital humanities, such as the transcription of old handwritten documents. Such an approach is especially useful for overcoming existing limits in automatic handwriting recognition techniques. Crowdsourcing allows workers to help experts in the extraction and classification of information when the workload is daunting. Yet it raises some specific challenges related to the quality of the data produced. In this paper, we discuss data quality in a research project called CIRESFI, which aims at transcribing Italian Comedy financial archives through the RECITAL web platform. 
Finally, we propose some leads to tackle these issues.</p> 2019-03-22T10:55:40+00:00 Copyright (c) 2019 Information Retrieval, Document and Semantic Web http://ridows.irisa.fr/ojs/index.php/ridows/article/view/17 Harness the heterogeneity in textual data 2019-03-22T12:53:22+00:00 Jacques Fize jacques.fize@cirad.fr Mathieu Roche mathieu.roche@cirad.fr Maguelonne Teisseire maguelonne.teisseire@irstea.fr <p>Over the last decades, there has been an increasing use of information systems, resulting in an exponential increase in textual data. Although the volumetric dimension of these textual data has been resolved, their heterogeneous dimension remains a challenge for the scientific community. Managing the heterogeneity in data offers many opportunities through access to richer information. In our work, we design a process for mapping heterogeneous textual data, based on their spatiality. In this article, we present the results returned by this process on data produced in Madagascar as part of the BVLAC project, led by CIRAD. Based on a set of 4 quality criteria, we obtain good spatial correspondence between these documents.</p> 2019-02-25T00:00:00+00:00 Copyright (c) 2019 Information Retrieval, Document and Semantic Web http://ridows.irisa.fr/ojs/index.php/ridows/article/view/11 Earth Observation Datasets for Change Detection in Forests 2019-03-22T12:56:23+00:00 Julius Akinyemi akinyemi@media.mit.edu Josiane Mothe josiane.mothe@irit.fr Nathalie Neptune nathalie.neptune@irit.fr <p>Various datasets can be used to automatically detect changes occurring in forests. This article reviews datasets, both global and local, that may be used to automatically classify land cover and detect changes, particularly in forests. These same datasets may be used to evaluate image segmentation and annotation methods. 
This contribution focuses on data allowing the analysis of deforestation and reforestation phenomena.</p> 2019-02-25T00:00:00+00:00 Copyright (c) 2019 Information Retrieval, Document and Semantic Web