Graph-based Word Embeddings
António Branco is the Head of NLX - Natural Language and Speech Group of the University of Lisbon, Faculty of Sciences, Department of Informatics, where he is an Associate Professor with Habilitation.
He is the Director General of PORTULAN CLARIN - Research Infrastructure for the Science and Technology of Language, belonging to the Portuguese national Roadmap for Research Infrastructures of Strategic Relevance, and part of the European Research Consortium (ERIC) of the CLARIN infrastructure.
He is a member of the Executive Board of META-NET, the European Network of Excellence for Language Technology. He was Executive Director (2015-2016) of the Mind-Brain College of the University of Lisbon, and Chair (2016-2018) of the Steering Committee of the international network PROPOR Computational Processing of the Portuguese Language.
He works in the fields of Artificial Intelligence and Cognitive Science, with a special focus on the Science and Technology of Language. He has published over 180 papers, with more than 80 coauthors, and has participated in over 25 research projects, coordinating 10 of them, including 2 European research consortia.
Vectorial representations of meaning can be supported by empirical data from diverse sources and obtained with diverse embedding approaches.
This paper aims at screening this experimental space and reports on an assessment of word embeddings supported (i) by data in raw texts vs. in lexical graphs, (ii) by lexical information encoded in association vs. inference-based graphs, and obtained (iii) by edge reconstruction vs. matrix factorisation vs. random walk-based graph embedding methods. The results observed with these experiments indicate that the best solutions with graph-based word embeddings are very competitive, consistently outperforming mainstream text-based ones.
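
To give a concrete feel for what a matrix factorisation-based graph embedding looks like, the following is a minimal sketch and not the method assessed in the paper. It assumes a tiny, made-up lexical graph (the `EDGES` list), and the helper names `embed_graph` and `cosine` are hypothetical. Word vectors are read off a truncated singular value decomposition of the graph's adjacency matrix.

```python
import numpy as np

# Hypothetical toy lexical graph: each edge links two related words.
EDGES = [
    ("cat", "dog"), ("cat", "pet"), ("dog", "pet"),
    ("car", "truck"), ("car", "vehicle"), ("truck", "vehicle"),
]


def embed_graph(edges, dim=2):
    """Return (words, vectors) from a truncated SVD of the adjacency matrix."""
    words = sorted({w for edge in edges for w in edge})
    index = {w: i for i, w in enumerate(words)}
    # Start from the identity so every word is also related to itself.
    adj = np.eye(len(words))
    for a, b in edges:
        adj[index[a], index[b]] = 1.0
        adj[index[b], index[a]] = 1.0  # undirected graph
    u, s, _ = np.linalg.svd(adj)
    # Keep the top `dim` directions, scaled by the square root of
    # their singular values.
    return words, u[:, :dim] * np.sqrt(s[:dim])


def cosine(x, y):
    """Cosine similarity between two vectors."""
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))


words, vectors = embed_graph(EDGES)
vec = dict(zip(words, vectors))
same_cluster = cosine(vec["cat"], vec["dog"])
other_cluster = cosine(vec["cat"], vec["car"])
print(same_cluster, other_cluster)
```

In this toy graph, words that share all their neighbours end up with nearly identical vectors, while words from the two disconnected clusters end up nearly orthogonal. Edge reconstruction and random walk-based methods pursue the same goal through different objectives and are not shown here.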