{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "lightweight-detroit",
   "metadata": {},
   "source": [
    "## Mapping *Pan-Latin Textile Fibres Vocabulary* from spreadsheet to SKOS resources\n",
    "\n",
    "This notebook implements a simple parser that transforms the Pan-Latin Textile Fibres Vocabulary, developed within the Realiter network and published as spreadsheets, into SKOS resources. The parser reads the spreadsheets and converts their content into SKOS data following a set of mapping rules; the result is stored in two Turtle files.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "modified-vegetarian",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import rdflib\n",
    "import itertools\n",
    "import yaml\n",
    "import datetime"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "hundred-singles",
   "metadata": {},
   "source": [
    "The file *config-lessico.yaml* contains the external settings used by the parser, including the location of the spreadsheets. Set the correct values before running the notebook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "stupid-lewis",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load the parser configuration (namespaces, source URLs, metadata texts)\n",
    "try:\n",
    "    with open(\"config-lessico.yaml\", 'r') as stream:\n",
    "        try:\n",
    "            conf = yaml.safe_load(stream)\n",
    "        except yaml.YAMLError as exc:\n",
    "            print(exc)\n",
    "except FileNotFoundError:\n",
    "    print('Warning: config-lessico.yaml not found! Please store it in the same directory as the notebook')\n",
    "#print (conf)"
   ]
  },
|
||
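  {
   "cell_type": "markdown",
   "id": "example-config-sketch",
   "metadata": {},
   "source": [
    "As a rough orientation, the sketch below parses a hypothetical configuration with the same layout. The key names are the ones read by the cells of this notebook; every value is a placeholder assumption, not the real project setting. The `LESSICOMANICHE*` text keys used for the second vocabulary are assumed to follow the same pattern and are omitted here."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "example-config-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import yaml\n",
    "\n",
    "# Hypothetical example: key names as read by this notebook, placeholder values only\n",
    "example_yaml = \"\"\"\n",
    "Namespaces:\n",
    "  TEXTILETERM: 'http://example.org/pltextile/'\n",
    "Source:\n",
    "  LESSICOSOURCE: 'https://example.org/lessico.csv'\n",
    "  LESSICOMANICHESOURCE: 'https://example.org/lessico_maniche.csv'\n",
    "Texts:\n",
    "  LANG: 'en'\n",
    "  LESSICOTITLE: 'Example title'\n",
    "  LESSICODESCRIPTION: 'Example description'\n",
    "  LESSICODESCRIPTION_IT: 'Descrizione di esempio'\n",
    "  LESSICOID: 'example-id'\n",
    "  LESSICOCREATEDATE: '2021-01-01'\n",
    "  LESSICOVERSION: '1.0'\n",
    "\"\"\"\n",
    "example_conf = yaml.safe_load(example_yaml)\n",
    "# Top-level sections expected by the parser\n",
    "print(sorted(example_conf))"
   ]
  },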
  {
   "cell_type": "markdown",
   "id": "generic-thong",
   "metadata": {},
   "source": [
    "The following cell defines the namespaces used in the parsing."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "oriental-structure",
   "metadata": {},
   "outputs": [],
   "source": [
    "from rdflib.namespace import DC, DCAT, DCTERMS, OWL, \\\n",
    "                            RDF, RDFS, SKOS, \\\n",
    "                            XMLNS, XSD\n",
    "from rdflib import Namespace\n",
    "from rdflib import URIRef, BNode, Literal\n",
    "\n",
    "pltextile = Namespace(conf['Namespaces']['TEXTILETERM'])\n",
    "dc11 = Namespace(\"http://purl.org/dc/elements/1.1/\")\n",
    "dct = Namespace(\"http://purl.org/dc/terms/\")\n",
    "# Trailing slash so that, e.g., iso369.eng expands to .../iso639-3/eng\n",
    "iso369 = Namespace(\"http://id.loc.gov/vocabulary/iso639-3/\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "sacred-shopper",
   "metadata": {},
   "source": [
    "Download the **Lessico** spreadsheet and display its first rows to check that it was loaded correctly."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "systematic-saudi",
   "metadata": {},
   "outputs": [],
   "source": [
    "url = conf['Source']['LESSICOSOURCE']\n",
    "df_data = pd.read_csv(url)"
   ]
  },
{
|
||
"cell_type": "code",
|
||
"execution_count": 5,
|
||
"id": "sunrise-reunion",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>it</th>\n",
|
||
" <th>DEF</th>\n",
|
||
" <th>ca</th>\n",
|
||
" <th>es</th>\n",
|
||
" <th>es [ARG]</th>\n",
|
||
" <th>es [ARG/MEX]</th>\n",
|
||
" <th>es [MEX]</th>\n",
|
||
" <th>fr</th>\n",
|
||
" <th>fr [CA]</th>\n",
|
||
" <th>gl</th>\n",
|
||
" <th>pt</th>\n",
|
||
" <th>ro</th>\n",
|
||
" <th>en</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>0</th>\n",
|
||
" <td>abaca (s.m.)\\nfibra di abaca (s.f.)\\ncanapa di...</td>\n",
|
||
" <td>Fibra ottenuta dalle foglie della Musa textilis.</td>\n",
|
||
" <td>abacà (n.m.)\\nfibra d’abacà (n.f.)\\ncànem de M...</td>\n",
|
||
" <td>abacá (s.m.)\\nfibra de abacá (s.f.)\\ncáñamo de...</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>abacá de Manila (s.m.)</td>\n",
|
||
" <td>abaca (n.m.)\\nchanvre de Manille (n.m.)\\ntagal...</td>\n",
|
||
" <td>fibre d’abaca (n.f.)\\nmanille (n.f.)</td>\n",
|
||
" <td>abacá (s.m.)\\ncánabo de Manila (s.m.)</td>\n",
|
||
" <td>abacá (s.m.)\\nmanila (s.f.)\\ncânhamo-de-manila...</td>\n",
|
||
" <td>abaca (s.f.)</td>\n",
|
||
" <td>abaca\\nabaca fibre\\nManila hemp</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1</th>\n",
|
||
" <td>acetato (s.m.)\\nfibra di acetato (s.f.)</td>\n",
|
||
" <td>Fibra prodotta a partire dall’acetato di cellu...</td>\n",
|
||
" <td>raió (n.m.)\\nfibra d’acetat (n.f.)</td>\n",
|
||
" <td>acetato (s.m.) \\nrayón acetato (s.m.)</td>\n",
|
||
" <td>rayón (s.m.)\\nviscosa (s.f.)</td>\n",
|
||
" <td>fibra de acetato (s.f.)</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>acétate (n.m.) \\nfibre d’acétate (n.f.)</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>acetato (s.m.)\\nfibra de acetato (s.f.)</td>\n",
|
||
" <td>acetato (s.m.)\\nfibra de acetato (s.f.) \\nraio...</td>\n",
|
||
" <td>acetat (s.m.)</td>\n",
|
||
" <td>acetate\\nacetate fibre</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>acrilico (s.m.)\\nfibra acrilica (s.f.)\\nacrili...</td>\n",
|
||
" <td>Fibra costituita da macromolecole lineari cont...</td>\n",
|
||
" <td>acrílic, -a (adj.)\\nfibra acrílica (n.f.)</td>\n",
|
||
" <td>acrílica (s.f.)\\nfibra acrílica (s.f.)</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>acrílico (s.m.)</td>\n",
|
||
" <td>fibra de acrílico (s.f.)</td>\n",
|
||
" <td>acrylique (n.m.)\\nfibre acrylique (n.f.)</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>acrílico (s.m.)\\nfibra acrílica (s.f.)</td>\n",
|
||
" <td>acrílico (s.m.)\\nfibra acrílica (s.f.)</td>\n",
|
||
" <td>acrilic (s.m.)</td>\n",
|
||
" <td>acrylic\\nacrylic fibre</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>3</th>\n",
|
||
" <td>alfa (s.f.)\\nfibra d’alfa (s.f.)</td>\n",
|
||
" <td>Fibra ricavata dalle foglie della Stipa tenaci...</td>\n",
|
||
" <td>espart (n.m.)\\nfibra d’espart (n.f.)</td>\n",
|
||
" <td>esparto (s.m.) \\nfibra de esparto (s.f.)</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>fibra alfa (s.f.)</td>\n",
|
||
" <td>alfa (s.m.)</td>\n",
|
||
" <td>alfa (n.m.)</td>\n",
|
||
" <td>sparte (n.m.)\\nspart (n.m.)</td>\n",
|
||
" <td>alfa (s.f.)\\nesparto (s.m.)</td>\n",
|
||
" <td>alfa (s.f.)\\nfibra de alfa (s.f.)</td>\n",
|
||
" <td>alfa (s.m.)\\nfibră alfa (s.f.)</td>\n",
|
||
" <td>alfa\\nalfa fibre</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>alginica (s.f.)\\nfibra alginica (s.f.)</td>\n",
|
||
" <td>Fibra prodotta a partire dai sali metallici de...</td>\n",
|
||
" <td>fibra d’alginat (n.f.)</td>\n",
|
||
" <td>fibra algínica (s.f.)\\nfibra de alginato (s.f.)</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>alginato (s.m.)</td>\n",
|
||
" <td>fibre d’alginate (n.f.)\\nalginate (n.m.)</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>alxinato (s.m.)\\nfibra de alxinato (s.f.)</td>\n",
|
||
" <td>alginato (s.m.)\\nfibra algínica (s.f.)\\nfibra ...</td>\n",
|
||
" <td>alginat (s.n.)\\nfibră alginică (s.f.)</td>\n",
|
||
" <td>alginate\\nalginic fibre\\nalginate fibre</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" it \\\n",
|
||
"0 abaca (s.m.)\\nfibra di abaca (s.f.)\\ncanapa di... \n",
|
||
"1 acetato (s.m.)\\nfibra di acetato (s.f.) \n",
|
||
"2 acrilico (s.m.)\\nfibra acrilica (s.f.)\\nacrili... \n",
|
||
"3 alfa (s.f.)\\nfibra d’alfa (s.f.) \n",
|
||
"4 alginica (s.f.)\\nfibra alginica (s.f.) \n",
|
||
"\n",
|
||
" DEF \\\n",
|
||
"0 Fibra ottenuta dalle foglie della Musa textilis. \n",
|
||
"1 Fibra prodotta a partire dall’acetato di cellu... \n",
|
||
"2 Fibra costituita da macromolecole lineari cont... \n",
|
||
"3 Fibra ricavata dalle foglie della Stipa tenaci... \n",
|
||
"4 Fibra prodotta a partire dai sali metallici de... \n",
|
||
"\n",
|
||
" ca \\\n",
|
||
"0 abacà (n.m.)\\nfibra d’abacà (n.f.)\\ncànem de M... \n",
|
||
"1 raió (n.m.)\\nfibra d’acetat (n.f.) \n",
|
||
"2 acrílic, -a (adj.)\\nfibra acrílica (n.f.) \n",
|
||
"3 espart (n.m.)\\nfibra d’espart (n.f.) \n",
|
||
"4 fibra d’alginat (n.f.) \n",
|
||
"\n",
|
||
" es \\\n",
|
||
"0 abacá (s.m.)\\nfibra de abacá (s.f.)\\ncáñamo de... \n",
|
||
"1 acetato (s.m.) \\nrayón acetato (s.m.) \n",
|
||
"2 acrílica (s.f.)\\nfibra acrílica (s.f.) \n",
|
||
"3 esparto (s.m.) \\nfibra de esparto (s.f.) \n",
|
||
"4 fibra algínica (s.f.)\\nfibra de alginato (s.f.) \n",
|
||
"\n",
|
||
" es [ARG] es [ARG/MEX] \\\n",
|
||
"0 NaN NaN \n",
|
||
"1 rayón (s.m.)\\nviscosa (s.f.) fibra de acetato (s.f.) \n",
|
||
"2 NaN acrílico (s.m.) \n",
|
||
"3 NaN fibra alfa (s.f.) \n",
|
||
"4 NaN NaN \n",
|
||
"\n",
|
||
" es [MEX] \\\n",
|
||
"0 abacá de Manila (s.m.) \n",
|
||
"1 NaN \n",
|
||
"2 fibra de acrílico (s.f.) \n",
|
||
"3 alfa (s.m.) \n",
|
||
"4 alginato (s.m.) \n",
|
||
"\n",
|
||
" fr \\\n",
|
||
"0 abaca (n.m.)\\nchanvre de Manille (n.m.)\\ntagal... \n",
|
||
"1 acétate (n.m.) \\nfibre d’acétate (n.f.) \n",
|
||
"2 acrylique (n.m.)\\nfibre acrylique (n.f.) \n",
|
||
"3 alfa (n.m.) \n",
|
||
"4 fibre d’alginate (n.f.)\\nalginate (n.m.) \n",
|
||
"\n",
|
||
" fr [CA] \\\n",
|
||
"0 fibre d’abaca (n.f.)\\nmanille (n.f.) \n",
|
||
"1 NaN \n",
|
||
"2 NaN \n",
|
||
"3 sparte (n.m.)\\nspart (n.m.) \n",
|
||
"4 NaN \n",
|
||
"\n",
|
||
" gl \\\n",
|
||
"0 abacá (s.m.)\\ncánabo de Manila (s.m.) \n",
|
||
"1 acetato (s.m.)\\nfibra de acetato (s.f.) \n",
|
||
"2 acrílico (s.m.)\\nfibra acrílica (s.f.) \n",
|
||
"3 alfa (s.f.)\\nesparto (s.m.) \n",
|
||
"4 alxinato (s.m.)\\nfibra de alxinato (s.f.) \n",
|
||
"\n",
|
||
" pt \\\n",
|
||
"0 abacá (s.m.)\\nmanila (s.f.)\\ncânhamo-de-manila... \n",
|
||
"1 acetato (s.m.)\\nfibra de acetato (s.f.) \\nraio... \n",
|
||
"2 acrílico (s.m.)\\nfibra acrílica (s.f.) \n",
|
||
"3 alfa (s.f.)\\nfibra de alfa (s.f.) \n",
|
||
"4 alginato (s.m.)\\nfibra algínica (s.f.)\\nfibra ... \n",
|
||
"\n",
|
||
" ro \\\n",
|
||
"0 abaca (s.f.) \n",
|
||
"1 acetat (s.m.) \n",
|
||
"2 acrilic (s.m.) \n",
|
||
"3 alfa (s.m.)\\nfibră alfa (s.f.) \n",
|
||
"4 alginat (s.n.)\\nfibră alginică (s.f.) \n",
|
||
"\n",
|
||
" en \n",
|
||
"0 abaca\\nabaca fibre\\nManila hemp \n",
|
||
"1 acetate\\nacetate fibre \n",
|
||
"2 acrylic\\nacrylic fibre \n",
|
||
"3 alfa\\nalfa fibre \n",
|
||
"4 alginate\\nalginic fibre\\nalginate fibre "
|
||
]
|
||
},
|
||
"execution_count": 5,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df_data.head()"
|
||
]
|
||
},
|
||
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "native-judges",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Normalise the regional column names; 'es [ARG/MEX]' keeps its original name and is not mapped below\n",
    "df_data.rename(columns = {'es [ARG]': 'es-arg', 'es [MEX]': 'es-mex', 'fr [CA]': 'fr-ca'}, inplace = True)\n",
    "#df_data.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "united-samoa",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'abaca'"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_data.iloc[0].it.split('\\n')[0].split(' ')[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "indonesian-curtis",
   "metadata": {},
   "source": [
    "Create a graph for the SKOS data and bind the namespaces to it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "parallel-bible",
   "metadata": {},
   "outputs": [],
   "source": [
    "c1rdf = rdflib.Graph()\n",
    "c1rdf.bind(\"pltextile\", pltextile)\n",
    "c1rdf.bind(\"dc11\", dc11)\n",
    "c1rdf.bind(\"dct\", dct)\n",
    "c1rdf.bind(\"iso369-3\", iso369)\n",
    "c1rdf.bind(\"skos\", SKOS)\n",
    "c1rdf.bind(\"dc\", DC)\n",
    "c1rdf.bind(\"rdf\", RDF)\n",
    "c1rdf.bind(\"owl\", OWL)\n",
    "c1rdf.bind(\"xsd\", XSD)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "quantitative-integer",
   "metadata": {},
   "source": [
    "Add the *SKOS.ConceptScheme* and its metadata to the graph."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "protective-anxiety",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<Graph identifier=N72688dca2b42426587f4eb0e0dac3bfe (<class 'rdflib.graph.Graph'>)>"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "now = datetime.datetime.today()\n",
    "today_date = now.date()\n",
    "title = Literal(conf['Texts']['LESSICOTITLE'], lang=conf['Texts']['LANG'])\n",
    "description = Literal(conf['Texts']['LESSICODESCRIPTION'], lang=conf['Texts']['LANG'])\n",
    "description_it = Literal(conf['Texts']['LESSICODESCRIPTION_IT'], lang='it')\n",
    "identifier = Literal(conf['Texts']['LESSICOID'], lang=conf['Texts']['LANG'])\n",
    "#identifier=URIRef(conf['Texts']['VOCABULARYID'])\n",
    "createddate = Literal(conf['Texts']['LESSICOCREATEDATE'], datatype=XSD.date)\n",
    "moddate = Literal(today_date, datatype=XSD.date)\n",
    "version = Literal(conf['Texts']['LESSICOVERSION'], datatype=XSD.string)\n",
    "\n",
    "c1rdf.add((pltextile[''], RDF.type, SKOS.ConceptScheme))\n",
    "c1rdf.add((pltextile[''], DC.title, title))\n",
    "c1rdf.add((pltextile[''], DC.identifier, identifier))\n",
    "c1rdf.add((pltextile[''], DC.description, description))\n",
    "c1rdf.add((pltextile[''], DC.description, description_it))\n",
    "c1rdf.add((pltextile[''], dct.created, createddate))\n",
    "c1rdf.add((pltextile[''], dct.modified, moddate))\n",
    "c1rdf.add((pltextile[''], OWL.versionInfo, version))\n",
    "# Languages of the vocabulary, as three-letter ISO 639-3 codes\n",
    "c1rdf.add((pltextile[''], dct.language, iso369.eng))\n",
    "c1rdf.add((pltextile[''], dct.language, iso369.spa))\n",
    "c1rdf.add((pltextile[''], dct.language, iso369.fra))\n",
    "c1rdf.add((pltextile[''], dct.language, iso369.glg))\n",
    "c1rdf.add((pltextile[''], dct.language, iso369.ita))\n",
    "c1rdf.add((pltextile[''], dct.language, iso369.ron))\n",
    "c1rdf.add((pltextile[''], dct.language, iso369.por))\n",
    "c1rdf.add((pltextile[''], dct.language, iso369.cat))"
   ]
  },
{
|
||
"cell_type": "code",
|
||
"execution_count": 25,
|
||
"id": "vertical-election",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>it</th>\n",
|
||
" <th>DEF</th>\n",
|
||
" <th>ca</th>\n",
|
||
" <th>es</th>\n",
|
||
" <th>es-arg</th>\n",
|
||
" <th>es [ARG/MEX]</th>\n",
|
||
" <th>es-mex</th>\n",
|
||
" <th>fr</th>\n",
|
||
" <th>fr-ca</th>\n",
|
||
" <th>gl</th>\n",
|
||
" <th>pt</th>\n",
|
||
" <th>ro</th>\n",
|
||
" <th>en</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>0</th>\n",
|
||
" <td>abaca (s.m.)\\nfibra di abaca (s.f.)\\ncanapa di...</td>\n",
|
||
" <td>Fibra ottenuta dalle foglie della Musa textilis.</td>\n",
|
||
" <td>abacà (n.m.)\\nfibra d’abacà (n.f.)\\ncànem de M...</td>\n",
|
||
" <td>abacá (s.m.)\\nfibra de abacá (s.f.)\\ncáñamo de...</td>\n",
|
||
" <td></td>\n",
|
||
" <td></td>\n",
|
||
" <td>abacá de Manila (s.m.)</td>\n",
|
||
" <td>abaca (n.m.)\\nchanvre de Manille (n.m.)\\ntagal...</td>\n",
|
||
" <td>fibre d’abaca (n.f.)\\nmanille (n.f.)</td>\n",
|
||
" <td>abacá (s.m.)\\ncánabo de Manila (s.m.)</td>\n",
|
||
" <td>abacá (s.m.)\\nmanila (s.f.)\\ncânhamo-de-manila...</td>\n",
|
||
" <td>abaca (s.f.)</td>\n",
|
||
" <td>abaca\\nabaca fibre\\nManila hemp</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1</th>\n",
|
||
" <td>acetato (s.m.)\\nfibra di acetato (s.f.)</td>\n",
|
||
" <td>Fibra prodotta a partire dall’acetato di cellu...</td>\n",
|
||
" <td>raió (n.m.)\\nfibra d’acetat (n.f.)</td>\n",
|
||
" <td>acetato (s.m.) \\nrayón acetato (s.m.)</td>\n",
|
||
" <td>rayón (s.m.)\\nviscosa (s.f.)</td>\n",
|
||
" <td>fibra de acetato (s.f.)</td>\n",
|
||
" <td></td>\n",
|
||
" <td>acétate (n.m.) \\nfibre d’acétate (n.f.)</td>\n",
|
||
" <td></td>\n",
|
||
" <td>acetato (s.m.)\\nfibra de acetato (s.f.)</td>\n",
|
||
" <td>acetato (s.m.)\\nfibra de acetato (s.f.) \\nraio...</td>\n",
|
||
" <td>acetat (s.m.)</td>\n",
|
||
" <td>acetate\\nacetate fibre</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>acrilico (s.m.)\\nfibra acrilica (s.f.)\\nacrili...</td>\n",
|
||
" <td>Fibra costituita da macromolecole lineari cont...</td>\n",
|
||
" <td>acrílic, -a (adj.)\\nfibra acrílica (n.f.)</td>\n",
|
||
" <td>acrílica (s.f.)\\nfibra acrílica (s.f.)</td>\n",
|
||
" <td></td>\n",
|
||
" <td>acrílico (s.m.)</td>\n",
|
||
" <td>fibra de acrílico (s.f.)</td>\n",
|
||
" <td>acrylique (n.m.)\\nfibre acrylique (n.f.)</td>\n",
|
||
" <td></td>\n",
|
||
" <td>acrílico (s.m.)\\nfibra acrílica (s.f.)</td>\n",
|
||
" <td>acrílico (s.m.)\\nfibra acrílica (s.f.)</td>\n",
|
||
" <td>acrilic (s.m.)</td>\n",
|
||
" <td>acrylic\\nacrylic fibre</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>3</th>\n",
|
||
" <td>alfa (s.f.)\\nfibra d’alfa (s.f.)</td>\n",
|
||
" <td>Fibra ricavata dalle foglie della Stipa tenaci...</td>\n",
|
||
" <td>espart (n.m.)\\nfibra d’espart (n.f.)</td>\n",
|
||
" <td>esparto (s.m.) \\nfibra de esparto (s.f.)</td>\n",
|
||
" <td></td>\n",
|
||
" <td>fibra alfa (s.f.)</td>\n",
|
||
" <td>alfa (s.m.)</td>\n",
|
||
" <td>alfa (n.m.)</td>\n",
|
||
" <td>sparte (n.m.)\\nspart (n.m.)</td>\n",
|
||
" <td>alfa (s.f.)\\nesparto (s.m.)</td>\n",
|
||
" <td>alfa (s.f.)\\nfibra de alfa (s.f.)</td>\n",
|
||
" <td>alfa (s.m.)\\nfibră alfa (s.f.)</td>\n",
|
||
" <td>alfa\\nalfa fibre</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>alginica (s.f.)\\nfibra alginica (s.f.)</td>\n",
|
||
" <td>Fibra prodotta a partire dai sali metallici de...</td>\n",
|
||
" <td>fibra d’alginat (n.f.)</td>\n",
|
||
" <td>fibra algínica (s.f.)\\nfibra de alginato (s.f.)</td>\n",
|
||
" <td></td>\n",
|
||
" <td></td>\n",
|
||
" <td>alginato (s.m.)</td>\n",
|
||
" <td>fibre d’alginate (n.f.)\\nalginate (n.m.)</td>\n",
|
||
" <td></td>\n",
|
||
" <td>alxinato (s.m.)\\nfibra de alxinato (s.f.)</td>\n",
|
||
" <td>alginato (s.m.)\\nfibra algínica (s.f.)\\nfibra ...</td>\n",
|
||
" <td>alginat (s.n.)\\nfibră alginică (s.f.)</td>\n",
|
||
" <td>alginate\\nalginic fibre\\nalginate fibre</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" it \\\n",
|
||
"0 abaca (s.m.)\\nfibra di abaca (s.f.)\\ncanapa di... \n",
|
||
"1 acetato (s.m.)\\nfibra di acetato (s.f.) \n",
|
||
"2 acrilico (s.m.)\\nfibra acrilica (s.f.)\\nacrili... \n",
|
||
"3 alfa (s.f.)\\nfibra d’alfa (s.f.) \n",
|
||
"4 alginica (s.f.)\\nfibra alginica (s.f.) \n",
|
||
"\n",
|
||
" DEF \\\n",
|
||
"0 Fibra ottenuta dalle foglie della Musa textilis. \n",
|
||
"1 Fibra prodotta a partire dall’acetato di cellu... \n",
|
||
"2 Fibra costituita da macromolecole lineari cont... \n",
|
||
"3 Fibra ricavata dalle foglie della Stipa tenaci... \n",
|
||
"4 Fibra prodotta a partire dai sali metallici de... \n",
|
||
"\n",
|
||
" ca \\\n",
|
||
"0 abacà (n.m.)\\nfibra d’abacà (n.f.)\\ncànem de M... \n",
|
||
"1 raió (n.m.)\\nfibra d’acetat (n.f.) \n",
|
||
"2 acrílic, -a (adj.)\\nfibra acrílica (n.f.) \n",
|
||
"3 espart (n.m.)\\nfibra d’espart (n.f.) \n",
|
||
"4 fibra d’alginat (n.f.) \n",
|
||
"\n",
|
||
" es \\\n",
|
||
"0 abacá (s.m.)\\nfibra de abacá (s.f.)\\ncáñamo de... \n",
|
||
"1 acetato (s.m.) \\nrayón acetato (s.m.) \n",
|
||
"2 acrílica (s.f.)\\nfibra acrílica (s.f.) \n",
|
||
"3 esparto (s.m.) \\nfibra de esparto (s.f.) \n",
|
||
"4 fibra algínica (s.f.)\\nfibra de alginato (s.f.) \n",
|
||
"\n",
|
||
" es-arg es [ARG/MEX] \\\n",
|
||
"0 \n",
|
||
"1 rayón (s.m.)\\nviscosa (s.f.) fibra de acetato (s.f.) \n",
|
||
"2 acrílico (s.m.) \n",
|
||
"3 fibra alfa (s.f.) \n",
|
||
"4 \n",
|
||
"\n",
|
||
" es-mex \\\n",
|
||
"0 abacá de Manila (s.m.) \n",
|
||
"1 \n",
|
||
"2 fibra de acrílico (s.f.) \n",
|
||
"3 alfa (s.m.) \n",
|
||
"4 alginato (s.m.) \n",
|
||
"\n",
|
||
" fr \\\n",
|
||
"0 abaca (n.m.)\\nchanvre de Manille (n.m.)\\ntagal... \n",
|
||
"1 acétate (n.m.) \\nfibre d’acétate (n.f.) \n",
|
||
"2 acrylique (n.m.)\\nfibre acrylique (n.f.) \n",
|
||
"3 alfa (n.m.) \n",
|
||
"4 fibre d’alginate (n.f.)\\nalginate (n.m.) \n",
|
||
"\n",
|
||
" fr-ca \\\n",
|
||
"0 fibre d’abaca (n.f.)\\nmanille (n.f.) \n",
|
||
"1 \n",
|
||
"2 \n",
|
||
"3 sparte (n.m.)\\nspart (n.m.) \n",
|
||
"4 \n",
|
||
"\n",
|
||
" gl \\\n",
|
||
"0 abacá (s.m.)\\ncánabo de Manila (s.m.) \n",
|
||
"1 acetato (s.m.)\\nfibra de acetato (s.f.) \n",
|
||
"2 acrílico (s.m.)\\nfibra acrílica (s.f.) \n",
|
||
"3 alfa (s.f.)\\nesparto (s.m.) \n",
|
||
"4 alxinato (s.m.)\\nfibra de alxinato (s.f.) \n",
|
||
"\n",
|
||
" pt \\\n",
|
||
"0 abacá (s.m.)\\nmanila (s.f.)\\ncânhamo-de-manila... \n",
|
||
"1 acetato (s.m.)\\nfibra de acetato (s.f.) \\nraio... \n",
|
||
"2 acrílico (s.m.)\\nfibra acrílica (s.f.) \n",
|
||
"3 alfa (s.f.)\\nfibra de alfa (s.f.) \n",
|
||
"4 alginato (s.m.)\\nfibra algínica (s.f.)\\nfibra ... \n",
|
||
"\n",
|
||
" ro \\\n",
|
||
"0 abaca (s.f.) \n",
|
||
"1 acetat (s.m.) \n",
|
||
"2 acrilic (s.m.) \n",
|
||
"3 alfa (s.m.)\\nfibră alfa (s.f.) \n",
|
||
"4 alginat (s.n.)\\nfibră alginică (s.f.) \n",
|
||
"\n",
|
||
" en \n",
|
||
"0 abaca\\nabaca fibre\\nManila hemp \n",
|
||
"1 acetate\\nacetate fibre \n",
|
||
"2 acrylic\\nacrylic fibre \n",
|
||
"3 alfa\\nalfa fibre \n",
|
||
"4 alginate\\nalginic fibre\\nalginate fibre "
|
||
]
|
||
},
|
||
"execution_count": 25,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
    "# Replace missing values with empty strings so that the string methods used in the mapping work on every cell\n",
    "df_data.fillna('', inplace=True)\n",
    "df_data.head()"
]
|
||
},
|
||
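  {
   "cell_type": "markdown",
   "id": "assigned-beijing",
   "metadata": {},
   "source": [
    "The following cell implements the mapping rules for creating SKOS resources. Each row becomes a *skos:Concept* whose URI is derived from the first Italian term. In every language column, the first line becomes the *skos:prefLabel* and the remaining lines become *skos:altLabel* values. The **DEF** column becomes an Italian *skos:definition*."
   ]
  },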
  {
   "cell_type": "code",
   "execution_count": 26,
   "id": "typical-prompt",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1668\n"
     ]
    }
   ],
   "source": [
    "# Spreadsheet column -> language tag of the generated labels.\n",
    "# The 'es [ARG/MEX]' column is not mapped.\n",
    "lang_columns = {'fr': 'fr', 'it': 'it', 'ca': 'ca', 'es': 'es', 'gl': 'gl',\n",
    "                'pt': 'pt', 'ro': 'ro', 'en': 'en',\n",
    "                'es-arg': 'es-ar', 'es-mex': 'es-mx', 'fr-ca': 'fr-ca'}\n",
    "\n",
    "for index, row in df_data.iterrows():\n",
    "    # Concept URI: first Italian term without its grammatical note, spaces replaced by underscores\n",
    "    strlabel = row['it'].split('\\n')[0].split(' (')[0].strip()\n",
    "    concept = pltextile[strlabel.replace(\" \", \"_\")]\n",
    "\n",
    "    c1rdf.add((pltextile[''], SKOS.hasTopConcept, concept))\n",
    "    c1rdf.add((concept, RDF.type, SKOS.Concept))\n",
    "    c1rdf.add((concept, SKOS.inScheme, pltextile['']))\n",
    "    c1rdf.add((concept, SKOS.topConceptOf, pltextile['']))\n",
    "\n",
    "    for column, lang in lang_columns.items():\n",
    "        terms = row[column].split('\\n')\n",
    "        # First line of the cell: preferred label (skipped when the cell is empty)\n",
    "        preflabel = Literal(terms[0].strip(), lang=lang)\n",
    "        if preflabel:\n",
    "            c1rdf.add((concept, SKOS.prefLabel, preflabel))\n",
    "        # Remaining lines: alternative labels\n",
    "        for alab in terms[1:]:\n",
    "            c1rdf.add((concept, SKOS.altLabel, Literal(alab, lang=lang)))\n",
    "\n",
    "    # Italian definition\n",
    "    itdef = Literal(row[\"DEF\"].strip(), lang='it')\n",
    "    if itdef:\n",
    "        c1rdf.add((concept, SKOS.definition, itdef))\n",
    "\n",
    "print(len(c1rdf))"
   ]
  },
|
||
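  {
   "cell_type": "markdown",
   "id": "example-split-terms-sketch",
   "metadata": {},
   "source": [
    "In the mapping above, the grammatical notes in parentheses, such as *(s.m.)* or *(n.f.)*, remain part of the label literals; only the concept URI is built without them. As a hedged sketch, the hypothetical helper `split_terms` below (not part of the original parser) shows how one spreadsheet cell could be split into a preferred term and its alternative terms with those notes removed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "example-split-terms-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import re\n",
    "\n",
    "def split_terms(cell):\n",
    "    # Hypothetical helper: return (preferred, alternatives) for one spreadsheet cell\n",
    "    # One term per line; drop a trailing note in parentheses such as '(s.m.)'\n",
    "    terms = [re.sub(r'\\s*\\([^)]*\\)\\s*$', '', t).strip() for t in cell.split('\\n')]\n",
    "    # Ignore empty lines\n",
    "    terms = [t for t in terms if t]\n",
    "    if not terms:\n",
    "        return None, []\n",
    "    return terms[0], terms[1:]\n",
    "\n",
    "split_terms('abaca (s.m.)\\nfibra di abaca (s.f.)')"
   ]
  },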
  {
   "cell_type": "code",
   "execution_count": 27,
   "id": "answering-latino",
   "metadata": {},
   "outputs": [],
   "source": [
    "# for s, p, o in c1rdf.triples((None, None, None)):\n",
    "#     print(\"{} {}\".format(s, o.n3()))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "quality-scratch",
   "metadata": {},
   "source": [
    "Create a *Turtle* file and an *RDF/XML* file in the **data** directory with the SKOS resources for the **Pan-Latin Textile Fibres Vocabulary**."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "id": "equal-voice",
   "metadata": {},
   "outputs": [],
   "source": [
    "c1rdf.serialize(destination='data/lexpanlatskos_11.ttl', format=\"turtle\")\n",
    "c1rdf.serialize(destination='data/lexpanlatskos_11.rdf', format=\"pretty-xml\");"
   ]
  },
|
||
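  {
   "cell_type": "markdown",
   "id": "example-reload-sketch",
   "metadata": {},
   "source": [
    "As an optional sanity check, the serialized file can be parsed back and its concepts counted. This is only a sketch: it assumes that the serialization cell has been run and that *data/lexpanlatskos_11.ttl* exists, and the variable name `check` is introduced here for illustration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "example-reload-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import rdflib\n",
    "from rdflib.namespace import RDF, SKOS\n",
    "\n",
    "# Parse the file written above into a fresh graph\n",
    "check = rdflib.Graph()\n",
    "check.parse('data/lexpanlatskos_11.ttl', format='turtle')\n",
    "# Number of triples and number of skos:Concept resources read back from the file\n",
    "print(len(check), len(set(check.subjects(RDF.type, SKOS.Concept))))"
   ]
  },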
  {
   "cell_type": "markdown",
   "id": "selected-enemy",
   "metadata": {},
   "source": [
    "### Pan-Latin Sleeves Vocabulary (*Lessico panlatino delle maniche*)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "id": "current-material",
   "metadata": {},
   "outputs": [],
   "source": [
    "urlma = conf['Source']['LESSICOMANICHESOURCE']\n",
    "df_data_maniche = pd.read_csv(urlma)\n",
    "df_data_maniche.rename(columns = {'es [ARG]': 'es-arg', 'es [MEX]': 'es-mex', 'pt [BR]': 'pt-br'}, inplace = True)\n",
    "df_data_maniche.fillna('', inplace=True)\n",
    "#df_data_maniche.info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "id": "incorporated-creature",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<Graph identifier=Nc80da8e5fa8e4ef5a36a57aeaed9673d (<class 'rdflib.graph.Graph'>)>"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "cl_manicherdf = rdflib.Graph()\n",
    "cl_manicherdf.bind(\"pltextile\", pltextile)\n",
    "cl_manicherdf.bind(\"dc11\", dc11)\n",
    "cl_manicherdf.bind(\"dct\", dct)\n",
    "cl_manicherdf.bind(\"iso369-3\", iso369)\n",
    "cl_manicherdf.bind(\"skos\", SKOS)\n",
    "cl_manicherdf.bind(\"dc\", DC)\n",
    "cl_manicherdf.bind(\"rdf\", RDF)\n",
    "cl_manicherdf.bind(\"owl\", OWL)\n",
    "cl_manicherdf.bind(\"xsd\", XSD)\n",
    "now = datetime.datetime.today()\n",
    "today_date = now.date()\n",
    "title = Literal(conf['Texts']['LESSICOMANICHETITLE'], lang=conf['Texts']['LANG'])\n",
    "description = Literal(conf['Texts']['LESSICOMANICHEDESCRIPTION'], lang=conf['Texts']['LANG'])\n",
    "description_it = Literal(conf['Texts']['LESSICOMANICHEDESCRIPTION_IT'], lang='it')\n",
    "identifier = Literal(conf['Texts']['LESSICOMANICHEID'], lang=conf['Texts']['LANG'])\n",
    "#identifier=URIRef(conf['Texts']['VOCABULARYID'])\n",
    "createddate = Literal(conf['Texts']['LESSICOCREATEDATE'], datatype=XSD.date)\n",
    "moddate = Literal(today_date, datatype=XSD.date)\n",
    "version = Literal(conf['Texts']['LESSICOVERSION'], datatype=XSD.string)\n",
    "\n",
    "cl_manicherdf.add((pltextile[''], RDF.type, SKOS.ConceptScheme))\n",
    "cl_manicherdf.add((pltextile[''], DC.title, title))\n",
    "cl_manicherdf.add((pltextile[''], DC.identifier, identifier))\n",
    "cl_manicherdf.add((pltextile[''], DC.description, description))\n",
    "cl_manicherdf.add((pltextile[''], DC.description, description_it))\n",
    "cl_manicherdf.add((pltextile[''], dct.created, createddate))\n",
    "cl_manicherdf.add((pltextile[''], dct.modified, moddate))\n",
    "cl_manicherdf.add((pltextile[''], OWL.versionInfo, version))\n",
    "# Languages of the vocabulary, as three-letter ISO 639-3 codes\n",
    "cl_manicherdf.add((pltextile[''], dct.language, iso369.eng))\n",
    "cl_manicherdf.add((pltextile[''], dct.language, iso369.spa))\n",
    "cl_manicherdf.add((pltextile[''], dct.language, iso369.fra))\n",
    "cl_manicherdf.add((pltextile[''], dct.language, iso369.cat))\n",
    "cl_manicherdf.add((pltextile[''], dct.language, iso369.ita))\n",
    "cl_manicherdf.add((pltextile[''], dct.language, iso369.por))"
   ]
  },
{
|
||
"cell_type": "code",
|
||
"execution_count": 32,
|
||
"id": "modular-realtor",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"597\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"# Mapping\n",
|
||
"for index, row in df_data_maniche.iterrows():\n",
|
||
" \n",
|
||
" strlabel=row.it.split('\\n')[0].split('(')[0].strip()\n",
|
||
" label=strlabel.replace(\" \", \"_\").replace(\"’\",\"\").replace(\"'\",\"\").strip()\n",
|
||
" #label=URIRef(row.it.split('\\n')[0].split(' (')[0].strip())\n",
|
||
" cl_manicherdf.add((pltextile[''], SKOS.hasTopConcept, pltextile[label])) \n",
|
||
" frlabel=Literal(row[\"fr\"].split('\\n')[0].strip(), lang='fr')\n",
|
||
" fraltlabels=row[\"fr\"].split('\\n')[1:]\n",
|
||
" itlabel=Literal(row['it'].split('\\n')[0].strip(), lang='it')\n",
|
||
" italtlabels=row[\"it\"].split('\\n')[1:] \n",
|
||
" calabel=Literal(row['ca'].split('\\n')[0].strip(), lang='ca')\n",
|
||
" caaltlabels=row[\"ca\"].split('\\n')[1:]\n",
|
||
" eslabel=Literal(row['es'].split('\\n')[0].strip(), lang='es')\n",
|
||
" esaltlabels=row[\"es\"].split('\\n')[1:]\n",
|
||
" #gllabel=Literal(row['gl'].split('\\n')[0].strip(), lang='gl')\n",
|
||
" #glaltlabels=row[\"gl\"].split('\\n')[1:]\n",
|
||
" ptlabel=Literal(row['pt'].split('\\n')[0].strip(), lang='pt')\n",
|
||
" ptaltlabels=row[\"pt\"].split('\\n')[1:]\n",
|
||
"# rolabel=Literal(row['ro'].split('\\n')[0].strip(), lang='ro')\n",
|
||
"# roaltlabels=row[\"ro\"].split('\\n')[1:]\n",
|
||
" enlabel=Literal(row['en'].split('\\n')[0].strip(), lang='en')\n",
|
||
" enaltlabels=row[\"en\"].split('\\n')[1:]\n",
|
||
" \n",
|
||
" esarglabel=Literal(row['es-arg'].split('\\n')[0].strip(), lang='es-ar')\n",
|
||
" esargaltlabels=row[\"es-arg\"].split('\\n')[1:]\n",
|
||
" \n",
|
||
"\n",
|
||
" esmexlabel=Literal(row['es-mex'].split('\\n')[0].strip(), lang='es-mx')\n",
|
||
" esmexaltlabels=row[\"es-mex\"].split('\\n')[1:]\n",
|
||
" ptbrlabel=Literal(row['pt-br'].split('\\n')[0].strip(), lang='pt-br')\n",
|
||
" ptbraltlabels=row[\"pt-br\"].split('\\n')[1:]\n",
|
||
" \n",
|
||
" #definition\n",
|
||
" itdef=Literal(row[\"DEF\"].strip(), lang='it')\n",
|
||
" \n",
|
||
" cl_manicherdf.add((pltextile[label], RDF.type, SKOS.Concept))\n",
|
||
" cl_manicherdf.add((pltextile[label], SKOS.inScheme, pltextile['']))\n",
|
||
" cl_manicherdf.add((pltextile[label], SKOS.topConceptOf, pltextile['']))\n",
|
||
" \n",
|
||
" for alab in esargaltlabels:\n",
|
||
" cl_manicherdf.add((pltextile[label], SKOS.altLabel, Literal(alab, lang='es-ar')))\n",
|
||
" \n",
|
||
" \n",
|
||
" for alab in esmexaltlabels:\n",
|
||
" cl_manicherdf.add((pltextile[label], SKOS.altLabel, Literal(alab, lang='es-mx')))\n",
|
||
" \n",
|
||
" for alab in ptbraltlabels:\n",
|
||
" cl_manicherdf.add((pltextile[label], SKOS.altLabel, Literal(alab, lang='pt-br')))\n",
|
||
" \n",
|
||
" for alab in esaltlabels:\n",
|
||
" cl_manicherdf.add((pltextile[label], SKOS.altLabel, Literal(alab, lang='es')))\n",
|
||
" \n",
|
||
"# for alab in glaltlabels:\n",
|
||
"# cl_collirdf.add((pltextile[label], SKOS.altLabel, Literal(alab, lang='gl')))\n",
|
||
" \n",
|
||
" for alab in ptaltlabels:\n",
|
||
" cl_manicherdf.add((pltextile[label], SKOS.altLabel, Literal(alab, lang='pt')))\n",
|
||
" \n",
|
||
"# for alab in roaltlabels:\n",
|
||
"# cl_collirdf.add((pltextile[label], SKOS.altLabel, Literal(alab, lang='ro')))\n",
|
||
" \n",
|
||
" for alab in enaltlabels:\n",
|
||
" cl_manicherdf.add((pltextile[label], SKOS.altLabel, Literal(alab, lang='en')))\n",
|
||
" \n",
|
||
" for alab in caaltlabels:\n",
|
||
" cl_manicherdf.add((pltextile[label], SKOS.altLabel, Literal(alab, lang='ca')))\n",
|
||
" \n",
|
||
" for alab in fraltlabels:\n",
|
||
" #print (\"tt \"+alab)\n",
|
||
" cl_manicherdf.add((pltextile[label], SKOS.altLabel, Literal(alab, lang='fr')))\n",
|
||
" for alab in italtlabels:\n",
|
||
" cl_manicherdf.add((pltextile[label], SKOS.altLabel, Literal(alab, lang='it')))\n",
|
||
" \n",
|
||
" \n",
|
||
" if(frlabel):\n",
|
||
" cl_manicherdf.add((pltextile[label], SKOS.prefLabel, frlabel))\n",
|
||
" if(itlabel):\n",
|
||
" cl_manicherdf.add((pltextile[label], SKOS.prefLabel, itlabel))\n",
|
||
"# if(gllabel):\n",
|
||
"# cl_manicherdf.add((pltextile[label], SKOS.prefLabel, gllabel))\n",
|
||
" \n",
|
||
" if(ptlabel):\n",
|
||
" cl_manicherdf.add((pltextile[label], SKOS.prefLabel, ptlabel))\n",
|
||
"# if(rolabel):\n",
|
||
"# cl_manicherdf.add((pltextile[label], SKOS.prefLabel, rolabel))\n",
|
||
" if(enlabel):\n",
|
||
" cl_manicherdf.add((pltextile[label], SKOS.prefLabel, enlabel))\n",
|
||
" \n",
|
||
" if(calabel): \n",
|
||
" cl_manicherdf.add((pltextile[label], SKOS.prefLabel, calabel))\n",
|
||
" if(eslabel): \n",
|
||
" cl_manicherdf.add((pltextile[label], SKOS.prefLabel, eslabel))\n",
|
||
" if(esarglabel):\n",
|
||
" cl_manicherdf.add((pltextile[label], SKOS.prefLabel, esarglabel))\n",
|
||
" \n",
|
||
"\n",
|
||
" if(esmexlabel):\n",
|
||
" cl_manicherdf.add((pltextile[label], SKOS.prefLabel, esmexlabel))\n",
|
||
" if(ptbrlabel):\n",
|
||
" cl_manicherdf.add((pltextile[label], SKOS.prefLabel, ptbrlabel))\n",
|
||
" \n",
|
||
" if (itdef):\n",
|
||
" cl_manicherdf.add((pltextile[label], SKOS.definition, itdef))\n",
|
||
"\n",
|
||
"print(len(cl_manicherdf))"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 33,
|
||
"id": "matched-mustang",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"cl_manicherdf.serialize(destination='data/lexpanlatmanicheskos_11.ttl', format=\"n3\");#format=\"pretty-xml\")\n",
|
||
"cl_manicherdf.serialize(destination='data/lexpanlatmanicheskos_11.rdf', format=\"pretty-xml\");#format=\"pretty-xml\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "talented-making",
|
||
"metadata": {},
|
||
"source": [
|
||
"### Lessico panlatino dei Colli"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 34,
|
||
"id": "centered-advantage",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"urlco=conf['Source']['LESSICOCOLLISOURCE']\n",
|
||
"df_data_colli=pd.read_csv(urlco)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 35,
|
||
"id": "desperate-uruguay",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"df_data_colli.rename(columns = {'es [ARG]': 'es-arg', 'es [MEX]': 'es-mex', 'pt [BR]': 'pt-br'}, inplace = True)\n",
|
||
"df_data_colli.fillna('', inplace=True)\n",
|
||
"#df_data_colli.head()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 36,
|
||
"id": "magnetic-stake",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"cl_collirdf = rdflib.Graph()\n",
|
||
"cl_collirdf.bind(\"pltextile\", pltextile)\n",
|
||
"cl_collirdf.bind(\"dc11\", dc11)\n",
|
||
"cl_collirdf.bind(\"dct\", dct)\n",
|
||
"cl_collirdf.bind(\"iso369-3\", iso369)\n",
|
||
"cl_collirdf.bind(\"skos\", SKOS)\n",
|
||
"cl_collirdf.bind(\"dc\", DC)\n",
|
||
"cl_collirdf.bind(\"rdf\", RDF)\n",
|
||
"cl_collirdf.bind(\"owl\", OWL)\n",
|
||
"cl_collirdf.bind(\"xsd\", XSD)\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "hidden-purple",
|
||
"metadata": {},
|
||
"source": [
|
||
"SKOS concept scheme"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "christian-paste",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"now = datetime.datetime.today()\n",
|
||
"today_date=now.date()\n",
|
||
"title=Literal(conf['Texts']['LESSICOCOLLITITLE'], lang=conf['Texts']['LANG'])\n",
|
||
"description=Literal(conf['Texts']['LESSICOCOLLIDESCRIPTION'], lang=conf['Texts']['LANG'])\n",
|
||
"description_it=Literal(conf['Texts']['LESSICOCOLLIDESCRIPTION_IT'], lang='it')\n",
|
||
"identifier=Literal(conf['Texts']['LESSICOCOLLIID'], lang=conf['Texts']['LANG'])\n",
|
||
"#identifier=URIRef(conf['Texts']['VOCABULARYID'])\n",
|
||
"createddate= Literal(conf['Texts']['LESSICOCREATEDATE'],datatype=XSD.date)\n",
|
||
"moddate= Literal(today_date,datatype=XSD.date)\n",
|
||
"version= Literal(conf['Texts']['LESSICOVERSION'],datatype=XSD.string)\n",
|
||
"\n",
|
||
"cl_collirdf.add((pltextile[''], RDF.type, SKOS.ConceptScheme))\n",
|
||
"cl_collirdf.add((pltextile[''], DC.title, title))\n",
|
||
"cl_collirdf.add((pltextile[''], DC.identifier, identifier))\n",
|
||
"cl_collirdf.add((pltextile[''], DC.description, description))\n",
|
||
"\n",
|
||
"cl_collirdf.add((pltextile[''], dct.created, createddate))\n",
|
||
"cl_collirdf.add((pltextile[''], dct.modified, moddate))\n",
|
||
"cl_collirdf.add((pltextile[''], OWL.versionInfo, version))\n",
|
||
"cl_collirdf.add((pltextile[''], dct.language, iso369.eng))\n",
|
||
"cl_collirdf.add((pltextile[''], dct.language, iso369.es))\n",
|
||
"cl_collirdf.add((pltextile[''], dct.language, iso369.fra))\n",
|
||
"cl_collirdf.add((pltextile[''], dct.language, iso369.ita))\n",
|
||
"cl_collirdf.add((pltextile[''], dct.language, iso369.pt))\n",
|
||
"cl_collirdf.add((pltextile[''], dct.language, iso369.ca))"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "incorporate-difference",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# Mapping\n",
|
||
"for index, row in df_data_colli.iterrows():\n",
|
||
" \n",
|
||
" strlabel=row.it.split('\\n')[0].split(' (')[0].strip()\n",
|
||
" label=strlabel.replace(\" \", \"_\").replace(\"’\",\"\")\n",
|
||
" #label=URIRef(row.it.split('\\n')[0].split(' (')[0].strip())\n",
|
||
" cl_collirdf.add((pltextile[''], SKOS.hasTopConcept, pltextile[label])) \n",
|
||
" frlabel=Literal(row[\"fr\"].split('\\n')[0].strip(), lang='fr')\n",
|
||
" fraltlabels=row[\"fr\"].split('\\n')[1:]\n",
|
||
" itlabel=Literal(row['it'].split('\\n')[0].strip(), lang='it')\n",
|
||
" italtlabels=row[\"it\"].split('\\n')[1:] \n",
|
||
" calabel=Literal(row['ca'].split('\\n')[0].strip(), lang='ca')\n",
|
||
" caaltlabels=row[\"ca\"].split('\\n')[1:]\n",
|
||
" eslabel=Literal(row['es'].split('\\n')[0].strip(), lang='es')\n",
|
||
" esaltlabels=row[\"es\"].split('\\n')[1:]\n",
|
||
" #gllabel=Literal(row['gl'].split('\\n')[0].strip(), lang='gl')\n",
|
||
" #glaltlabels=row[\"gl\"].split('\\n')[1:]\n",
|
||
" ptlabel=Literal(row['pt'].split('\\n')[0].strip(), lang='pt')\n",
|
||
" ptaltlabels=row[\"pt\"].split('\\n')[1:]\n",
|
||
"# rolabel=Literal(row['ro'].split('\\n')[0].strip(), lang='ro')\n",
|
||
"# roaltlabels=row[\"ro\"].split('\\n')[1:]\n",
|
||
" enlabel=Literal(row['en'].split('\\n')[0].strip(), lang='en')\n",
|
||
" enaltlabels=row[\"en\"].split('\\n')[1:]\n",
|
||
" \n",
|
||
" esarglabel=Literal(row['es-arg'].split('\\n')[0].strip(), lang='es-ar')\n",
|
||
" esargaltlabels=row[\"es-arg\"].split('\\n')[1:]\n",
|
||
" \n",
|
||
"\n",
|
||
" esmexlabel=Literal(row['es-mex'].split('\\n')[0].strip(), lang='es-mx')\n",
|
||
" esmexaltlabels=row[\"es-mex\"].split('\\n')[1:]\n",
|
||
" ptbrlabel=Literal(row['pt-br'].split('\\n')[0].strip(), lang='pt-br')\n",
|
||
" ptbraltlabels=row[\"pt-br\"].split('\\n')[1:]\n",
|
||
" \n",
|
||
" #definition\n",
|
||
" itdef=Literal(row[\"DEF\"].strip(), lang='it')\n",
|
||
" \n",
|
||
" cl_collirdf.add((pltextile[label], RDF.type, SKOS.Concept))\n",
|
||
" cl_collirdf.add((pltextile[label], SKOS.inScheme, pltextile['']))\n",
|
||
" cl_collirdf.add((pltextile[label], SKOS.topConceptOf, pltextile['']))\n",
|
||
" \n",
|
||
" for alab in esargaltlabels:\n",
|
||
" cl_collirdf.add((pltextile[label], SKOS.altLabel, Literal(alab, lang='es-ar')))\n",
|
||
" \n",
|
||
" \n",
|
||
" for alab in esmexaltlabels:\n",
|
||
" cl_collirdf.add((pltextile[label], SKOS.altLabel, Literal(alab, lang='es-mx')))\n",
|
||
" \n",
|
||
" for alab in ptbraltlabels:\n",
|
||
" cl_collirdf.add((pltextile[label], SKOS.altLabel, Literal(alab, lang='pt-br')))\n",
|
||
" \n",
|
||
" for alab in esaltlabels:\n",
|
||
" cl_collirdf.add((pltextile[label], SKOS.altLabel, Literal(alab, lang='es')))\n",
|
||
" \n",
|
||
"# for alab in glaltlabels:\n",
|
||
"# cl_collirdf.add((pltextile[label], SKOS.altLabel, Literal(alab, lang='gl')))\n",
|
||
" \n",
|
||
" for alab in ptaltlabels:\n",
|
||
" cl_collirdf.add((pltextile[label], SKOS.altLabel, Literal(alab, lang='pt')))\n",
|
||
" \n",
|
||
"# for alab in roaltlabels:\n",
|
||
"# cl_collirdf.add((pltextile[label], SKOS.altLabel, Literal(alab, lang='ro')))\n",
|
||
" \n",
|
||
" for alab in enaltlabels:\n",
|
||
" cl_collirdf.add((pltextile[label], SKOS.altLabel, Literal(alab, lang='en')))\n",
|
||
" \n",
|
||
" for alab in caaltlabels:\n",
|
||
" cl_collirdf.add((pltextile[label], SKOS.altLabel, Literal(alab, lang='ca')))\n",
|
||
" \n",
|
||
" for alab in fraltlabels:\n",
|
||
" #print (\"tt \"+alab)\n",
|
||
" cl_collirdf.add((pltextile[label], SKOS.altLabel, Literal(alab, lang='fr')))\n",
|
||
" for alab in italtlabels:\n",
|
||
" cl_collirdf.add((pltextile[label], SKOS.altLabel, Literal(alab, lang='it')))\n",
|
||
" \n",
|
||
" \n",
|
||
" if(frlabel):\n",
|
||
" cl_collirdf.add((pltextile[label], SKOS.prefLabel, frlabel))\n",
|
||
" if(itlabel):\n",
|
||
" cl_collirdf.add((pltextile[label], SKOS.prefLabel, itlabel))\n",
|
||
"# if(gllabel):\n",
|
||
"# cl_collirdf.add((pltextile[label], SKOS.prefLabel, gllabel))\n",
|
||
" \n",
|
||
" if(ptlabel):\n",
|
||
" cl_collirdf.add((pltextile[label], SKOS.prefLabel, ptlabel))\n",
|
||
"# if(rolabel):\n",
|
||
"# cl_collirdf.add((pltextile[label], SKOS.prefLabel, rolabel))\n",
|
||
" if(enlabel):\n",
|
||
" cl_collirdf.add((pltextile[label], SKOS.prefLabel, enlabel))\n",
|
||
" \n",
|
||
" if(calabel): \n",
|
||
" cl_collirdf.add((pltextile[label], SKOS.prefLabel, calabel))\n",
|
||
" if(eslabel): \n",
|
||
" cl_collirdf.add((pltextile[label], SKOS.prefLabel, eslabel))\n",
|
||
" if(esarglabel):\n",
|
||
" cl_collirdf.add((pltextile[label], SKOS.prefLabel, esarglabel))\n",
|
||
" \n",
|
||
"\n",
|
||
" if(esmexlabel):\n",
|
||
" cl_collirdf.add((pltextile[label], SKOS.prefLabel, esmexlabel))\n",
|
||
" if(ptbrlabel):\n",
|
||
" cl_collirdf.add((pltextile[label], SKOS.prefLabel, ptbrlabel))\n",
|
||
" \n",
|
||
" if (itdef):\n",
|
||
" cl_collirdf.add((pltextile[label], SKOS.definition, itdef))\n",
|
||
"\n",
|
||
"print(len(cl_collirdf))"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "applicable-commissioner",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"cl_collirdf.serialize(destination='data/lexpanlatcolliskos_11.ttl', format=\"n3\");#format=\"pretty-xml\")\n",
|
||
"cl_collirdf.serialize(destination='data/lexpanlatcolliskos_11.rdf', format=\"pretty-xml\");#format=\"pretty-xml\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "limiting-duration",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": []
|
||
}
|
||
],
|
||
"metadata": {
|
||
"kernelspec": {
|
||
"display_name": "Python 3 (ipykernel)",
|
||
"language": "python",
|
||
"name": "python3"
|
||
},
|
||
"language_info": {
|
||
"codemirror_mode": {
|
||
"name": "ipython",
|
||
"version": 3
|
||
},
|
||
"file_extension": ".py",
|
||
"mimetype": "text/x-python",
|
||
"name": "python",
|
||
"nbconvert_exporter": "python",
|
||
"pygments_lexer": "ipython3",
|
||
"version": "3.11.1"
|
||
}
|
||
},
|
||
"nbformat": 4,
|
||
"nbformat_minor": 5
|
||
}
|