A magyar webtér aratásával kapcsolatos kurátori feladatok = Curatorial Tasks Related to the Harvesting of the Hungarian Web Domain - TUdományos DOkumentumok Közös Keresője

A magyar webtér aratásával kapcsolatos kurátori feladatok = Curatorial Tasks Related to the Harvesting of the Hungarian Web Domain

Metadata

Content:	https://real.mtak.hu/229680/
Archive:	REAL
Collection:	Status = Published; Subject = Z Bibliography. Library Science. Information Resources / könyvtártudomány: Z665 Library Science. Information Science / könyvtártudomány, információtudomány; Type = Book Section; Subject = Q Science / természettudomány: QA Mathematics / matematika: QA76.625 Internet Science / internettudomány
Title:	A magyar webtér aratásával kapcsolatos kurátori feladatok = Curatorial Tasks Related to the Harvesting of the Hungarian Web Domain
Creator:	Kalcsó, Gyula
Publisher:	Hungarnet Egyesület
Date:	2025
Subject:	QA76.625 Internet Science / internettudomány; Z665 Library Science. Information Science / könyvtártudomány, információtudomány
Description:	Under a government decree, an essential task of the national library is to harvest the Hungarian web domain as completely as possible twice a year and to keep a register of the known sites. This complex task is carried out by the web archiving team of the Digital Philology and Web Archiving Department of the Digital Humanities Centre of the Hungarian National Széchényi Library. This paper describes the most important curatorial activities related to this mandated task, and illustrates the process of registering websites and the methodology for collecting seed URLs. Since the launch of the Hungarian Web Archive in 2017, the number of registered sites has grown significantly: new URLs have been identified from our own harvests, recommendations have been received, and cooperation has been established with the Internet Archive, which is the main source of new URLs. The seed URL lists need to be maintained before each of the two annual harvests, a complex process involving many steps. The first step is to extract the URLs from the previous captures and select those that are not yet known. For each of these we automatically retrieve the HTTP status code to determine which sites are live, then read the value of the title tag in the HTML head, and check whether the site has a robots.txt file. Based on the structure of the URLs and the information obtained, we classify the new URLs into the appropriate list. The status codes, title data and robots.txt files are checked for the previously harvested URLs too; inactive sites are removed from the lists, and the remaining URLs are classified into the appropriate seed list.
Language:	Hungarian
Type:	Book Section; PeerReviewed; info:eu-repo/semantics/bookPart
Format:	text
Identifier:	https://real.mtak.hu/229680/1/NETWORKSHOP_2025_Kalcso_v2.pdf; Kalcsó, Gyula (2025) A magyar webtér aratásával kapcsolatos kurátori feladatok = Curatorial Tasks Related to the Harvesting of the Hungarian Web Domain. In: Oktatási, kutatási és közgyűjteményi infrastruktúrák és tartalmak: digitális transzformáció felsőfokon : NETWORKSHOP 2025 : 34. Országos Informatikai Konferencia : 2025. május 13–15. Széchenyi István Egyetem, Győr. Hungarnet Egyesület, Budapest, pp. 194-201. ISBN 978-615-6792-15-0
Relation:	https://real.mtak.hu/229680/; https://doi.org/10.31915/NWS.2025.21; MTMT:36450065; 10.31915/NWS.2025.21
Rights:	cc_by
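
The seed-list maintenance steps summarised in the description (finding hosts not yet registered, checking the HTTP status, reading the title tag, looking for robots.txt, then sorting each site into a list) can be sketched in Python. This is only an illustration under stated assumptions, not the web archiving team's actual tooling: every function name, the plain http:// probing scheme and the list names returned by classify are hypothetical, since the record does not specify them.

```python
import http.client
import urllib.error
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urlsplit


class TitleParser(HTMLParser):
    """Collects the text of the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.parts.append(data)


def extract_title(html):
    """Return the page title with whitespace collapsed, or '' if absent."""
    parser = TitleParser()
    parser.feed(html)
    return " ".join("".join(parser.parts).split())


def new_hosts(captured_urls, known_hosts):
    """Hosts seen in earlier captures that are not yet in the register."""
    seen = set()
    for url in captured_urls:
        host = urlsplit(url).hostname  # None for strings without a host
        if host:
            seen.add(host.lower())
    return sorted(seen - {h.lower() for h in known_hosts})


def fetch(url, timeout=10):
    """Return (status, body); status is None if the site is unreachable.

    urlopen follows redirects, so the status is that of the final response.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read(200_000).decode("utf-8", "replace")
            return resp.status, body
    except urllib.error.HTTPError as err:
        return err.code, ""
    except (urllib.error.URLError, OSError, http.client.HTTPException):
        return None, ""


def probe(host):
    """Gather the three signals named in the description for one host."""
    status, body = fetch(f"http://{host}/")
    robots_status, _ = fetch(f"http://{host}/robots.txt")
    return {
        "host": host,
        "status": status,
        "title": extract_title(body),
        "has_robots": robots_status == 200,
    }


def classify(record):
    """Hypothetical rule; the real seed lists are not named in the record."""
    if record["status"] is None or record["status"] >= 400:
        return "inactive"
    if record["host"].endswith(".hu"):
        return "hu_domain"
    return "other"
```

In this sketch a maintenance run would call new_hosts on the URLs extracted from the previous captures, pass each resulting host to probe, and route the record with classify. Keeping the network access inside fetch and probe leaves the title parsing and classification logic checkable without touching any live site.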