Title: New methods for metadata extraction from scientific literature
Title variant: Nowe metody wydobywania metadanych z literatury naukowej
Authors: Dominika Beata Tkaczyk
Partner: Instytut Badań Systemowych PAN w Warszawie (Systems Research Institute, Polish Academy of Sciences, Warsaw)
Description: In the scientific world, ideas are spread and new discoveries and findings are announced mainly by publishing and reading scientific literature. Over the past few decades we have witnessed a digital revolution that moved scholarly communication to electronic media and substantially increased its volume. Keeping track of the latest scientific achievements is now a major challenge for researchers. Scientific information overload is a severe problem that slows down scholarly communication and knowledge propagation across academia. Modern research infrastructures make it easier to study scientific literature. They provide intelligent search tools, recommend similar and related documents, build and visualize interactive citation and author networks, and assess the quality and impact of articles using citation-based statistics. To provide such high-quality services, these systems need access not only to the text content of stored documents but also to their machine-readable metadata. In practice, good-quality metadata is not always available, so there is strong demand for a reliable method of automatically extracting machine-readable metadata directly from source documents. Our research addresses these problems. We propose an automatic, accurate and flexible algorithm for extracting a wide range of metadata directly from born-digital scientific articles. The extracted information includes basic document metadata, the structured full text and the bibliography section. The algorithm is designed as a universal solution. It handles a wide variety of publication layouts with high precision and is therefore well suited to analyzing heterogeneous document collections.
This was achieved by employing supervised and unsupervised machine-learning algorithms trained on large, diverse datasets. Our evaluation showed good performance of the proposed metadata extraction algorithm. A comparison with similar solutions also showed that our algorithm outperforms competing tools for most metadata types. The proposed method is a reliable and accurate solution to the problem of extracting metadata from documents. It allows modern research infrastructures to provide intelligent tools and services that help readers cope with the growing volume of scientific literature. This facilitates communication among scientists and improves both knowledge propagation and the quality of research in the scientific world.
Keywords: data mining, document analysis, metadata extraction, machine learning
Resource type: thesis
Scientific discipline: technical sciences / computer science (2011)
Target audience: researchers, students, entrepreneurs
Harmful content: No
Supervisor: Marek Antoni Niezgódka (10195)
Resource language: Polish
Date of creation: 2015
Location: Warsaw
Place of creation: Warsaw
Number of pages: 180
Rights/license: CC BY-SA 4.0
Depositor: Anna Wasilewska
Date made available: 15-10-2018
Resource link (portal): https://zasobynauki.pl/zasoby/new-methods-for-metadata-extraction-from-scientific-literature,21567/
Resource link (repository): https://id.e-science.pl/records/21567