Document vector embeddings for bibliographic records indexing

Morgane Marchand; Geoffroy Fouquier; Emmanuel Marchand; Guillaume Pitel

eXenSa

Abstract

This article presents eXenSa's contribution to the 2016 DEFT workshop. The task consists of indexing bibliographic records with keywords chosen by professional indexers. We propose a statistical system that combines a graphical (surface-form) approach with a semantic one. The first approach selects as a document's keywords the thesaurus terms that are graphically similar to terms in the document's title or abstract. The second approach assigns to a document the keywords associated with semantically similar documents in the training corpora. Both approaches use models built with NC-ISC, a stochastic matrix factorisation algorithm. Our system obtains the best F-score on two of the four test corpora and ranks second on the other two.
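The abstract does not specify how either approach is computed. As a purely illustrative sketch under stated assumptions, the Python below shows one possible reading of each. The graphical approach is approximated with Jaccard similarity over character trigrams, and the semantic approach with k-nearest-neighbour keyword voting using cosine similarity over document vectors. Both measures are our assumptions, not necessarily the ones used by the authors. The document vectors are assumed to be precomputed; NC-ISC itself is not reproduced here. The function names and the `threshold`, `k` and `top` parameters are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Set of character n-grams of a lower-cased, space-padded string."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def graphical_similarity(a, b, n=3):
    """Jaccard similarity of character n-gram sets (assumed stand-in measure)."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)


def graphical_keywords(doc_terms, thesaurus, threshold=0.6):
    """Thesaurus terms graphically similar to at least one term of the document.

    `threshold` is a hypothetical cut-off, not a value from the paper.
    """
    return sorted({
        t for t in thesaurus for d in doc_terms
        if graphical_similarity(t, d) >= threshold
    })


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is null)."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def semantic_keywords(query_vec, train_vecs, train_keywords, k=2, top=3):
    """Keywords most often attached to the k training documents closest to query_vec.

    The vectors are assumed to be precomputed by some embedding model;
    this sketch does not implement NC-ISC.
    """
    ranked = sorted(range(len(train_vecs)),
                    key=lambda i: cosine(query_vec, train_vecs[i]),
                    reverse=True)
    votes = Counter()
    for i in ranked[:k]:
        votes.update(train_keywords[i])
    # Ties keep first-encountered order (Counter.most_common behaviour).
    return [kw for kw, _ in votes.most_common(top)]
```

For example, `graphical_keywords(["indexation", "bibliographies"], ["indexation", "bibliographie", "chimie"])` returns `["bibliographie", "indexation"]`, since "bibliographie" and "bibliographies" share 12 of their 15 distinct trigrams. Combining the two candidate lists (and weighting them) is left open here, as the abstract gives no detail on that step.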

Author Biography

Morgane Marchand, eXenSa

eXenSa, 41 rue Périer, Montrouge, France
