
Resource type: thesis

New methods for metadata extraction from scientific literature

Resource metadata

Title: New methods for metadata extraction from scientific literature
Title variant: Nowe metody wydobywania metadanych z literatury naukowej
People Authors: Dominika Beata Tkaczyk
Partner: Instytut Badań Systemowych PAN w Warszawie
Description: Spreading ideas and announcing new discoveries and findings in the scientific world is typically realized by publishing and reading scientific literature. Within the past few decades we have witnessed a digital revolution, which moved scholarly communication to electronic media and also resulted in a substantial increase in its volume. Nowadays, keeping track of the latest scientific achievements poses a major challenge for researchers.
Scientific information overload is a severe problem that slows down scholarly communication and knowledge propagation across academia.
Modern research infrastructures facilitate studying scientific literature by providing intelligent search tools, proposing similar and related documents, building and visualizing interactive citation and author networks, assessing the quality and impact of articles using citation-based statistics, and so on. To provide such high-quality services, a system requires access not only to the text content of stored documents but also to their machine-readable metadata. Since in practice good-quality metadata is not always available, there is a strong demand for a reliable automatic method of extracting machine-readable metadata directly from source documents.
Our research addresses these problems by proposing an automatic, accurate and flexible algorithm for extracting a wide range of metadata directly from scientific articles in born-digital form. The extracted information includes basic document metadata, structured full text and the bibliography section.
Designed as a universal solution, the proposed algorithm is able to handle a vast variety of publication layouts with high precision and is thus well suited for analyzing heterogeneous document collections. This was achieved by employing supervised and unsupervised machine-learning algorithms trained on large, diverse datasets. The evaluation we conducted showed good performance of the proposed metadata extraction algorithm. The comparison with other similar solutions also proved that our algorithm performs better than the competition for most metadata types.
The proposed method is a reliable and accurate solution to the problem of extracting metadata from documents.
It allows modern research infrastructures to provide intelligent tools and services that support readers in consuming the growing volume of scientific literature, which facilitates communication among scientists and improves knowledge propagation and the quality of research in the scientific world. (English)
Keywords: "wydobywanie informacji"@pl, "Machine Learning"@en, "uczenie maszynowe"@pl, "data mining"@en, "metadata mining"@en, "document analysis"@en, "wydobywanie metadanych"@pl, "analiza dokumentów"@pl
Classification Resource type: thesis
Scientific discipline: technical sciences / computer science
Target audience: researchers, students, entrepreneurs
Harmful content information: No
Characteristics Place of creation: Warsaw
Date of creation: 2015
Number of pages: 180
Supervisor: Marek Antoni Niezgódka
Resource language: Polish
Location: Warsaw
License: CC BY-SA 4.0
Technical information Depositor: Anna Wasilewska
Date made available: 15-10-2018
Collections Collection of the Instytut Badań Systemowych PAN w Warszawie


Citing the resource


Dominika Beata Tkaczyk. New methods for metadata extraction from scientific literature. [thesis] Available in Atlas Zasobów Otwartej Nauki, . License: CC BY-SA 4.0, https://creativecommons.org/licenses/by-sa/4.0/legalcode.pl. Access date: DD.MM.YYYY.

Files (2)