A magyar webtér aratásával kapcsolatos kurátori feladatok = Curatorial Tasks Related to the Harvesting of the Hungarian Web Domain
| Content: | https://real.mtak.hu/229680/ |
|---|---|
| Archive: | REAL |
| Collection: |
Status = Published
Subject = Z Bibliography. Library Science. Information Resources / könyvtártudomány: Z665 Library Science. Information Science / könyvtártudomány, információtudomány
Type = Book Section
Subject = Q Science / természettudomány: QA Mathematics / matematika: QA76.625 Internet Science / internettudomány |
| Title: |
A magyar webtér aratásával kapcsolatos kurátori feladatok = Curatorial Tasks Related to the Harvesting of the Hungarian Web Domain
|
| Creator: |
Kalcsó, Gyula
|
| Publisher: |
Hungarnet Egyesület
|
| Date: |
2025
|
| Subject: |
QA76.625 Internet Science / internettudomány
Z665 Library Science. Information Science / könyvtártudomány, információtudomány
|
| Description: |
According to a government decree, one of the national library's core tasks is to harvest the
Hungarian web domain as completely as possible twice a year and to keep a register of the
known sites. This complex task is carried out by the web archiving team of the Digital
Philology and Web Archiving Department of the Digital Humanities Centre of the Hungarian
National Széchényi Library.
This paper describes the most important curatorial activities related to this mandated task.
It outlines the process of registering websites and the methodology for collecting seed URLs.
Since the launch of the Hungarian Web Archive in 2017, the number of registered sites has
grown significantly. New URLs have been identified in our own harvests and through
recommendations, but the main source of new URLs has been our cooperation with the
Internet Archive.
The seed URL lists must be updated before each of the two annual harvests, a complex process
involving many steps. The first step is to extract the URLs from the previous captures and
identify those that are not yet in the register. For these, we automatically retrieve the HTTP
status code to determine which sites are live, extract the content of the title element in the
HTML head, and check whether the site has a robots.txt file. Based on the structure of each URL
and the information obtained, the new URLs are assigned to the appropriate seed list. The same
checks (status code, title, robots.txt) are also run on the previously harvested URLs: inactive
sites are removed from the lists, and the remaining URLs are sorted into the appropriate seed list.
|
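The seed-list maintenance steps in the description can be illustrated with a minimal Python sketch. It is not the library's actual tooling, and every name in it is hypothetical: the helpers `unknown_urls`, `check_site`, `classify` and `build_seed_lists`, the host normalisation rule (lower-case host, leading `www.` dropped), the two list names `domain` and `subsite`, the rule that a final status of 2xx or 3xx counts as live, and the 64 KiB read limit are all assumptions. The title is only taken from a `<title>` inside an explicit `<head>`, and network access is passed in as a `fetch` function so that the logic can be exercised offline.

```python
import urllib.request
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Callable, Iterable, Optional
from urllib.error import HTTPError, URLError
from urllib.parse import urlsplit

# A fetcher returns (HTTP status, decoded body); status 0 means "no response".
Fetch = Callable[[str], tuple[int, str]]

class _TitleParser(HTMLParser):
    """Collects the text of the first <title> element found inside <head>."""

    def __init__(self) -> None:
        super().__init__()
        self.in_head = self.in_title = False
        self.parts: list[str] = []
        self.title: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "title" and self.in_head and self.title is None:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.title = " ".join("".join(self.parts).split())  # collapse whitespace
        elif tag == "head":
            self.in_head = False

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)

def extract_title(html: str) -> Optional[str]:
    parser = _TitleParser()
    parser.feed(html)
    return parser.title or None

def site_key(url: str) -> str:
    """Assumed normalisation: lower-case host with a leading 'www.' removed."""
    return (urlsplit(url).hostname or "").removeprefix("www.")

def unknown_urls(captured: Iterable[str], registered: Iterable[str]) -> list[str]:
    """Return the first URL of each host that is not yet in the register."""
    seen = {site_key(u) for u in registered}
    fresh = []
    for url in captured:
        key = site_key(url)
        if key and key not in seen:
            seen.add(key)
            fresh.append(url)
    return fresh

def http_fetch(url: str, timeout: float = 10.0) -> tuple[int, str]:
    """Network fetcher; urllib follows redirects, so the final status is returned."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # Assumption: the first 64 KiB is enough to contain the <head>.
            return resp.status, resp.read(65536).decode("utf-8", errors="replace")
    except HTTPError as err:  # 4xx/5xx responses
        return err.code, ""
    except (URLError, OSError, ValueError):  # DNS failure, timeout, malformed URL
        return 0, ""

@dataclass
class SiteCheck:
    url: str
    status: int
    title: Optional[str]
    has_robots: bool

    @property
    def live(self) -> bool:
        return 200 <= self.status < 400

def check_site(url: str, fetch: Fetch = http_fetch) -> SiteCheck:
    """Fetch the page and its host's /robots.txt, recording status and title."""
    status, body = fetch(url)
    parts = urlsplit(url)
    robots_status, _ = fetch(f"{parts.scheme}://{parts.netloc}/robots.txt")
    return SiteCheck(url, status, extract_title(body), robots_status == 200)

def classify(check: SiteCheck) -> Optional[str]:
    """Hypothetical rules: drop inactive sites, else choose a list by URL shape."""
    if not check.live:
        return None
    return "subsite" if urlsplit(check.url).path.strip("/") else "domain"

def build_seed_lists(urls: Iterable[str], fetch: Fetch = http_fetch) -> dict[str, list[str]]:
    """Check every URL and group the live ones by their assigned seed list."""
    lists: dict[str, list[str]] = {}
    for url in urls:
        name = classify(check_site(url, fetch))
        if name is not None:
            lists.setdefault(name, []).append(url)
    return lists
```

With a stub `fetch` such as a dictionary lookup, the filtering and classification can be checked without network access. The `classify` rules stand in for the team's real list criteria, which the description does not specify.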
| Language: |
Hungarian
|
| Type: |
Book Section
PeerReviewed
info:eu-repo/semantics/bookPart
|
| Format: |
text
|
| Identifier: |
Kalcsó, Gyula (2025) A magyar webtér aratásával kapcsolatos kurátori feladatok = Curatorial Tasks Related to the Harvesting of the Hungarian Web Domain. In: Oktatási, kutatási és közgyűjteményi infrastruktúrák és tartalmak: digitális transzformáció felsőfokon : NETWORKSHOP 2025 : 34. Országos Informatikai Konferencia : 2025. május 13–15. Széchenyi István Egyetem, Győr. Hungarnet Egyesület, Budapest, pp. 194-201. ISBN 978-615-6792-15-0
|
| Relation: |
MTMT:36450065
10.31915/NWS.2025.21
|
| Rights: |
cc_by
|