Collecting images and metadata from a social image sharing site using scraping technology - TUdományos DOkumentumok Közös Keresője

Collecting images and metadata from a social image sharing site using scraping technology

Metadata

Content:	http://ocs.mtak.hu/index.php/nws/nws2024/paper/view/178
Archive:	NETWORKSHOP
Collection:	Papers
Title:	Collecting images and metadata from a social image sharing site using scraping technology (original Hungarian title: Képek és metaadataik gyűjteményezése scraping technológiával közösségi képmegosztó oldalról)
Creator:	Gyula Kalcsó; Országos Széchényi Könyvtár Digitális Bölcsészeti Központ (National Széchényi Library, Digital Humanities Centre)
Publisher:	NETWORKSHOP
Date:	2024-12-19 15:37:41
Description:	The paper presents a pilot project in which our web archiving team saved nearly half a million digital photographs and their metadata from a social image sharing site and uploaded most of them to the database of the library's Digital Image Archive (DIA; in Hungarian, Digitális Képarchívum, DKA). We chose scraping because saving the original site with web archiving methods was too challenging due to the technologies the site uses, and because we wanted to archive the images and their metadata in the DIA, where the site's appearance and functionality do not need to be preserved.

The paper covers the entire preservation process, from clarifying legal issues and selecting the metadata to be preserved to the technical implementation steps and the loading of the saved content into the database.

The paper describes how we selected and saved the relevant metadata using scraping technology, and which data set formats we chose for storing them. The resulting JSON files were used to import the data into the DIA database. To do this, the saved metadata had to be mapped to the subject headings of Köztaurusz, the thesaurus of the National Széchényi Library (NSZL), and the data sets had to be converted accordingly.

Keywords: image collecting, web archiving, scraping, capturing social media

DOI: https://doi.org/10.31915/NWS.2024.20
Language:	Hungarian
Type:	Peer-reviewed Paper
Format:	application/pdf
Identifier:	http://ocs.mtak.hu/index.php/nws/nws2024/paper/view/178
Source:	NETWORKSHOP; Networkshop 2024
Rights:	Authors who submit to this conference agree to the following terms:
a) Authors retain copyright over their work, while allowing the conference to place this unpublished work under a Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which allows others to freely access, use, and share the work, with an acknowledgement of the work's authorship and its initial presentation at this conference.
b) Authors are able to waive the terms of the CC license and enter into separate, additional contractual arrangements for the non-exclusive distribution and subsequent publication of this work (e.g., publish a revised version in a journal, post it to an institutional repository or publish it in a book), with an acknowledgement of its initial presentation at this conference.
c) In addition, authors are encouraged to post and share their work online (e.g., in institutional repositories or on their website) at any point before and after the conference.
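
Illustrative sketch (not part of the original record): the abstract mentions that scraped metadata stored as JSON had to be mapped to thesaurus subject headings before loading into the database. The paper's actual code and field names are not given here, so the following Python sketch only shows what such a conversion step might look like. The field names (`title`, `url`, `tags`), the lookup table `TAG_TO_SUBJECT`, and all of its entries are hypothetical.

```python
import json

# Hypothetical lookup from free-text tags to controlled subject headings.
# These entries are illustrative only and are not taken from the real thesaurus.
TAG_TO_SUBJECT = {
    "bridge": "bridges",
    "bridges": "bridges",
    "tram": "trams",
    "church": "churches",
}


def normalize(tag: str) -> str:
    """Trim and lowercase a tag so that lookups ignore case and padding."""
    return tag.strip().lower()


def convert_record(record: dict) -> dict:
    """Map one scraped record (assumed shape) to controlled subject headings.

    Tags without a mapping are kept in 'unmapped_tags' for manual review.
    """
    subjects = []
    unmapped = []
    for tag in record.get("tags", []):
        subject = TAG_TO_SUBJECT.get(normalize(tag))
        if subject is None:
            unmapped.append(tag)
        elif subject not in subjects:
            # Several tags may map to the same heading; keep it only once.
            subjects.append(subject)
    return {
        "title": record.get("title", ""),
        "source_url": record.get("url", ""),
        "subjects": subjects,
        "unmapped_tags": unmapped,
    }


def convert_dataset(json_text: str) -> str:
    """Convert a JSON array of scraped records into a JSON array of mapped records."""
    records = json.loads(json_text)
    converted = [convert_record(r) for r in records]
    return json.dumps(converted, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    sample = json.dumps([
        {
            "title": "Example photo",
            "url": "https://example.org/photo/1",
            "tags": ["Bridge", " bridges ", "sunset"],
        }
    ])
    print(convert_dataset(sample))
```

Keeping unmapped tags in a separate field, rather than dropping them, is one possible design choice: it lets the tags that have no thesaurus equivalent be reviewed by hand before import.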