Document vector embeddings for bibliographic records indexing

  • Morgane Marchand eXenSa
  • Geoffroy Fouquier eXenSa
  • Emmanuel Marchand eXenSa
  • Guillaume Pitel eXenSa

Abstract

This article presents the eXenSa contribution to the 2016 DEFT Workshop. The proposed task consists of indexing bibliographic records with keywords chosen by professional indexers. We propose a statistical approach which combines graphical and semantic approaches. The first approach defines the keywords of a document as the thesaurus terms that are graphically similar to terms contained in the title or the abstract of this document. The second approach assigns to a document the keywords associated with semantically similar documents in the training corpora. Both approaches use models generated with NC-ISC, a stochastic matrix factorisation algorithm. Our system obtains the best F-score on two of the four test corpora and ranks second on the other two.
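
The two approaches described above can be illustrated with a minimal sketch. It does not reproduce NC-ISC or the eXenSa system: character-trigram cosine similarity stands in for graphical similarity, document vectors are assumed to come from some embedding model, and every function name, parameter, and threshold below is hypothetical.

```python
from collections import Counter
from math import sqrt


def char_ngrams(term, n=3):
    """Character n-gram profile of a term (padded with spaces, lowercased)."""
    padded = f" {term.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def graphical_keywords(text_terms, thesaurus, threshold=0.6):
    """Approach 1 (stand-in): keep thesaurus terms whose trigram profile is
    close to at least one term of the title or abstract.
    The threshold is an arbitrary illustrative value."""
    selected = []
    for entry in thesaurus:
        best = max(
            (cosine(char_ngrams(entry), char_ngrams(term)) for term in text_terms),
            default=0.0,
        )
        if best >= threshold:
            selected.append(entry)
    return selected


def neighbour_keywords(doc_vector, training, k=2):
    """Approach 2 (stand-in): collect the keywords of the k training documents
    whose vectors are most similar to doc_vector, most frequent first.
    `training` is a list of (vector, keywords) pairs."""
    ranked = sorted(training, key=lambda item: cosine(doc_vector, item[0]), reverse=True)
    votes = Counter(kw for _, keywords in ranked[:k] for kw in keywords)
    return [kw for kw, _ in votes.most_common()]


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    print(graphical_keywords(["indexing", "records"], ["indexation", "record", "biology"]))
    toy_training = [
        ({"a": 1.0, "b": 0.0}, ["x", "y"]),
        ({"a": 0.9, "b": 0.1}, ["x"]),
        ({"a": 0.0, "b": 1.0}, ["z"]),
    ]
    print(neighbour_keywords({"a": 1.0}, toy_training, k=2))
```

In such a sketch, the two keyword lists would then be merged or ranked to form the final index terms; how the actual system combines them is not specified in this abstract.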

Author Biography

Morgane Marchand, eXenSa

eXenSa, 41 rue Périer, Montrouge, France

Published
2018-07-18