Data Processing with Python
Python and JupyterLab/Google Colab
Metadata validation and standardization – Python scripts + pandas library
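import pandas as pd
import re

df = pd.read_csv("articles.csv")  # must contain "DOI" and "ORCID" columns

def validate_doi(doi):
    # "10.", a 4-9 digit registrant code, "/", then a suffix (case-insensitive)
    return bool(re.match(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", doi, re.I))

def validate_orcid(orcid):
    # ORCID iD in URL form; the final character is a check digit that may be "X"
    return bool(re.match(r"^https?://orcid\.org/\d{4}-\d{4}-\d{4}-\d{3}[\dX]$", orcid))

# Missing values become the string "nan" and are therefore flagged as "Invalid"
df["DOI_status"] = df["DOI"].apply(lambda x: "OK" if validate_doi(str(x)) else "Invalid")
df["ORCID_status"] = df["ORCID"].apply(lambda x: "OK" if validate_orcid(str(x)) else "Invalid")
# Writing .xlsx files requires an Excel engine such as openpyxl
df.to_excel("validated_articles.xlsx", index=False)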
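The script above only flags values as OK or Invalid. The standardization half of this step could look like the following minimal sketch. The helper names normalize_doi, normalize_orcid and orcid_checksum_ok are hypothetical and not part of the original script. Lowercasing relies on DOIs being case-insensitive, and the checksum assumes the ISO 7064 MOD 11-2 scheme that ORCID documents for its identifiers.

```python
import re


def normalize_doi(value):
    """Strip a resolver or "doi:" prefix and lowercase the DOI."""
    doi = str(value).strip()
    doi = re.sub(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", "", doi, flags=re.I)
    return doi.lower()


def normalize_orcid(value):
    """Return the iD as an https://orcid.org/ URL, or None if no iD is found."""
    match = re.search(r"(\d{4})-?(\d{4})-?(\d{4})-?(\d{3}[\dXx])$", str(value).strip())
    if match is None:
        return None
    return "https://orcid.org/" + "-".join(match.groups()).upper()


def orcid_checksum_ok(orcid_url):
    """Compare the final character with the ISO 7064 MOD 11-2 check digit."""
    chars = orcid_url.rsplit("/", 1)[-1].replace("-", "")
    total = 0
    for ch in chars[:-1]:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return chars[-1] == expected
```

One way to use these would be to normalize first, for example df["DOI"] = df["DOI"].apply(normalize_doi), and only then run the validation, so that values differing only in prefix or letter case are not reported as distinct or invalid.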
ETL – data extraction, transformation and loading
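As a rough illustration of the three ETL stages, the sketch below uses only the Python standard library so that it runs without extra packages. The sample data, the articles table and the extract, transform and load function names are all assumptions made for this example. In a pandas workflow, pd.read_csv and DataFrame.to_sql would fill the extract and load roles.

```python
import csv
import io
import sqlite3

# Hypothetical sample input standing in for a real CSV export.
SAMPLE_CSV = """title,DOI
Example article,https://doi.org/10.1000/ABC123
Another article,10.1000/xyz456
"""


def extract(text):
    """Extract: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim whitespace and reduce DOIs to a bare lowercase form."""
    prefix = "https://doi.org/"
    cleaned = []
    for row in rows:
        doi = row["DOI"].strip().lower()
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
        cleaned.append((row["title"].strip(), doi))
    return cleaned


def load(records, connection):
    """Load: write the cleaned records into a SQLite table."""
    connection.execute("CREATE TABLE IF NOT EXISTS articles (title TEXT, doi TEXT)")
    connection.executemany("INSERT INTO articles VALUES (?, ?)", records)
    connection.commit()


connection = sqlite3.connect(":memory:")  # in-memory database, for illustration only
load(transform(extract(SAMPLE_CSV)), connection)
stored = connection.execute("SELECT title, doi FROM articles ORDER BY doi").fetchall()
```

Keeping each stage in its own function makes it possible to test the transformation rules without touching files or a database.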