liberTECHS


Cloud Data Migration (2)

Goal

Future-proof a public sector client’s data management processes (collection, sharing, use, storage and publication). Consolidate data assets from multiple source systems so analysts gain better insights and can enrich their analyses.

Role

  • Developed and optimized AWS-based ETL processes: Designed and implemented 5 AWS Lambda functions and multiple AWS Glue ETL jobs, enhancing data ingestion and processing efficiency. This resulted in a 30% reduction in data pipeline execution time by automating data mapping, data asset initialization, and data preparation workflows.
  • Implemented robust data sanitization and compliance measures: Enhanced data quality and regulatory compliance by debugging and extending AWS Glue ETL jobs to perform comprehensive data sanitization, personally identifiable information (PII) identification and redaction. This ensured all data handling adhered to GDPR standards, securing sensitive information across all datasets.
  • Streamlined data migration and documentation: Authored high-level and low-level design documents and testing plans to ensure clear, maintainable code and reliable data migration processes. Developed 25 custom scripts for efficient data migration into AWS PostgreSQL, facilitating the seamless transition of critical data assets with minimal downtime.
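One of the data-preparation workflows above, automated data mapping, can be sketched in plain Python. The field names, target types and mapping config below are illustrative placeholders, not taken from the client system:

```python
from datetime import date

# Mapping config: source field -> (target field, converter).
# In the real pipeline this kind of mapping drives the Glue/Lambda steps.
FIELD_MAP = {
    "cust_nm": ("customer_name", str.strip),
    "dob": ("date_of_birth", date.fromisoformat),
    "ord_val": ("order_value", float),
}

def map_record(source_record: dict) -> dict:
    """Rename and type-cast fields according to FIELD_MAP,
    dropping anything not covered by the mapping."""
    target = {}
    for src_field, (dst_field, convert) in FIELD_MAP.items():
        if src_field in source_record:
            target[dst_field] = convert(source_record[src_field])
    return target
```

Keeping the mapping as data (rather than hard-coded logic) is what makes this kind of step easy to automate across many source systems.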

Capability

Data-driven decision-making and reporting enabled via custom-built:

  1. Data Integration: Combined data from various sources into a unified data warehouse.
  2. Data Transformation: Standardised and sanitised data for consistency and accuracy.
  3. Data Accessibility: Provided a user-friendly interface for accessing and querying data.
  4. Reporting and Analytics: Enabled comprehensive reporting and analytics capabilities.

Stack

  • Python – coding language for data pipelines
  • SQL – DQL, DML, DDL
  • AWS Cloud Services
    • Aurora RDS – to host the data warehouse
    • Glue – for data pipelines
    • Lambda – serverless compute platform
    • Step Function – workflow orchestrator
    • S3 – cloud storage
    • Comprehend – machine-learning-powered, continuously trained Natural Language Processing (NLP) service, used to identify and redact personally identifiable information (PII)
    • CodeCommit – for repository management
    • CloudShell – browser-based CLI environment
  • Apache Airflow – for orchestration of data pipelines using DAGs
  • Docker Engine / Desktop – to create containers for the Airflow environment and PostgreSQL instance
  • DevOps – GitHub Actions, Terraform
  • Git – for repository management
  • PyCharm – IDE
  • Atlassian Toolset – Jira / Confluence
  • SharePoint – content collaboration and workspace
  • Mural – content collaboration / design
  • Microsoft Visio – solution design

Year

2024

Filed Under: Data Engineering & BI

Automated Trading

Goal

Coming soon.

Role

Coming soon.

Capability

Coming soon.

Skills

MQL4 (scripting language for automated FX trading via MetaTrader 4 platform)

Filed Under: Data Engineering & BI

Text Processor

Goal

Automatically recognise, process and evaluate structured text data. The tool builds an integrity score (0–100) for a professional advisor based on the attributes of the reviews they receive.

Role

I developed a Python tool to recognise, process and evaluate structured text data.

Capability

The Python tool reads input text data line by line and calculates an integrity score based on a number of factors:

  • Lots to say: Genuine reviewers tend to say less – knock 0.5 points off for each review that contains more than 100 words.
  • Burst: If a number of reviews come in within the same time frame – knock 40 points off if 2 or more come through in the same minute, 20 points if they come through in the same hour.
  • Same Device: We have a system that forms a readable tag (e.g. LB4-6WR) based on the browser/device/location. If multiple reviews come from the same device, knock 30 points off each time.
  • All-Star: Non-genuine reviews are likely to have a five-star rating – take 2 points off the integrity score for each review that has 5 stars; quadruple the penalty if the average rating is under 3.5 stars.
  • Solicited: If the review was left by someone who was invited by the professional, add 3 points to the integrity score.
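The rules above can be sketched directly in Python. The input shape (a text field, a timestamp, a device tag, a star rating and a solicited flag per review) is an assumption about the tool’s parsed input; the point values follow the rules as stated:

```python
from collections import Counter
from datetime import datetime

def integrity_score(reviews: list[dict]) -> float:
    """Each review: {'text': str, 'when': datetime, 'device': str,
    'stars': int, 'solicited': bool}. Returns a 0-100 score."""
    score = 100.0

    # Lots to say: -0.5 for each review over 100 words.
    score -= 0.5 * sum(1 for r in reviews if len(r["text"].split()) > 100)

    # Burst: -40 if 2+ reviews share a minute, else -20 if 2+ share an hour.
    minutes = Counter(r["when"].replace(second=0, microsecond=0) for r in reviews)
    hours = Counter(r["when"].replace(minute=0, second=0, microsecond=0) for r in reviews)
    if any(n >= 2 for n in minutes.values()):
        score -= 40
    elif any(n >= 2 for n in hours.values()):
        score -= 20

    # Same Device: -30 for each repeat review from an already-seen device tag.
    devices = Counter(r["device"] for r in reviews)
    score -= 30 * sum(n - 1 for n in devices.values())

    # All-Star: -2 per five-star review; quadruple if the average is under 3.5.
    penalty = 2 * sum(1 for r in reviews if r["stars"] == 5)
    if reviews and sum(r["stars"] for r in reviews) / len(reviews) < 3.5:
        penalty *= 4
    score -= penalty

    # Solicited: +3 for each invited review.
    score += 3 * sum(1 for r in reviews if r["solicited"])

    return max(0.0, min(100.0, score))
```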

Skills

Python, API development

Year

2020

Filed Under: Data Engineering & BI

Bulk Downloader

Goal

Power a reporting process that creates strong visibility and impact while delivering time and money savings in production.

Role

I developed a Python tool to automate a tedious data process (retrieval and sharing) for a FTSE 100 company. This boosted their reporting efficiency and unlocked new time for more rewarding work.

Capability

Produces a user-friendly and comprehensive presentation for top management. For 100+ reports, the Python tool:

  • authenticates access to an in-house web-based system
  • opens necessary pages
  • downloads page contents as offline HTML files
  • uploads HTML files onto SharePoint site
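The fetch-and-upload loop can be sketched around two small helpers. The base URL and endpoint shape below are placeholders, and authentication plus the SharePoint upload are omitted since they depend on the client’s in-house system:

```python
import re
from urllib.parse import urljoin

BASE_URL = "https://reports.example.internal/"  # placeholder, not the real system

def report_url(report_id: str) -> str:
    """Build the page URL for a given report (endpoint shape is illustrative)."""
    return urljoin(BASE_URL, f"reports/{report_id}/view")

def safe_filename(report_title: str) -> str:
    """Turn a report title into a filesystem- and SharePoint-safe HTML filename."""
    name = re.sub(r"[^A-Za-z0-9._-]+", "_", report_title).strip("_")
    return f"{name}.html"
```

In the real tool, an authenticated HTTP session iterates over the 100+ report IDs, saves each page with `safe_filename(...)`, and pushes the files to the SharePoint site.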

Skills

Python, Microsoft SharePoint.

Year

2021

Filed Under: Data Engineering & BI

CRUD Operations

Goal

Enable local church community to self-manage rotas for a smooth mass service.

Role

I developed a CRUD app to handle create-read-update-delete operations on a community website.

Capability

A web app featuring:

  • functionality to create, read, update and delete rota info
  • permissions

Skills

PHP, HTML5, CSS3, JavaScript

Year

2020

Filed Under: Apps & Websites, Data Engineering & BI

Web Scraper

Data collection tool for producing accurate forecasts of the total supply in the UK’s natural gas market.

Goal

Client asked me to fix 3 resource drains:

  1. Time drain. Time consuming data collection.
  2. Money drain. Manually collected data was used in the natural gas demand forecasting process. Poor data quality and insufficient data collection frequency resulted in suboptimal forecasting results and significant revenue loss.
  3. Stress drain. Repetitive manual work caused stress and left little time for more rewarding tasks.

Role

I automated tedious data processes with Python (retrieval, harmonization, reformatting, sharing), saving 2 FTE hours per day, improving forecasting accuracy, protecting company revenue, and creating more time for rewarding work.

Capability

I delivered 3 benefits for the client:

  1. Time gain. 2 hours (1 FTE equivalent) of time savings per day.
  2. Money gain. Better data quality, higher collection frequency and a faster collection process significantly improved the demand forecasting deliverables and protected a key revenue stream.
  3. Reduced stress. Time freed up for more rewarding work, lowering stress.
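The harmonization step can be illustrated on a single field: supplier feeds reported dates in mixed formats, which are normalized to ISO 8601 before loading. The format list is an assumption, not the actual feed specification:

```python
from datetime import datetime

# Date formats observed across feeds (illustrative examples).
DATE_FORMATS = ("%d/%m/%Y", "%Y-%m-%d", "%d %b %Y")

def harmonize_date(raw: str) -> str:
    """Parse a date in any known feed format and return it as YYYY-MM-DD."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date format: {raw!r}")
```

Raising on unknown formats (rather than guessing) is what lets the pipeline flag quality problems at collection time instead of corrupting the forecast inputs.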


Skills

Python, API development, Database design, Data pipelines, ETL, Web scraping, Data quality automation.

Year

2019

Filed Under: Data Engineering & BI

Quantifying Compliance in the FTSE 100 Company

Goal

Develop a platform for gauging data BMS compliance across the company’s data landscape (1,000+ data nodes and 5,000+ data flows between them) in real time. Establish, centralise and create organisation-wide transparency by enabling the business to continuously manage the data about data (e.g. information on data ownership, business purpose, data-related obligations, the state of data controls and the state of data quality). Also, identify and profile data-related risks and drive remedial actions.

Role

  • I led the development of key components within the compliance apps ecosystem, which powers the creation, measurement and automation of compliance with data management standards for the UK’s electricity system operator, National Grid ESO.
  • I proposed and implemented various automated workflows (e.g., QuickBase native API, Python API, Power BI UI) to identify gaps, track progress, report findings, and initiate personalized remedial actions across over 60 business teams.
  • I translated the standards into more than 30 custom-coded KPIs, including procedures for strengthening data controls across thousands of the company’s critical and operationally critical data nodes and flows using the QuickBase native API.
  • I simplified and streamlined the user experience (UX) and user interface (UI) of the National Grid ESO’s “Data Management Library” app to ensure it aligns seamlessly with evolving business needs.
  • Despite my team’s headcount halving, my contributions were crucial in maintaining and improving our team’s service levels (annual NPS score up from 53 to 65).

Capability

The main app in the ecosystem – the National Grid ESO “Data Management Library” (DML) – features:

  • 30+ live, custom-coded KPIs measuring compliance against the data BMS standards across
  • 65 business teams owning
  • 1000+ data nodes and
  • 5000+ data flows
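A minimal sketch of one such KPI: the share of a team’s data nodes that have a named owner recorded. The field names are illustrative; the production KPIs were computed via the QuickBase native API:

```python
def ownership_kpi(nodes: list[dict]) -> float:
    """Percent of data nodes with a non-empty 'owner' field, rounded to 1 dp."""
    if not nodes:
        return 0.0
    owned = sum(1 for n in nodes if n.get("owner"))
    return round(100 * owned / len(nodes), 1)
```

Computed per team, a KPI like this is what drives the gap identification and personalized remedial actions described above.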

Skills

QuickBase native API, Python Pandas, Python Selenium, Power BI, HTML5, SQL.

Year

2021

Filed Under: Apps & Websites, Data Engineering & BI

Knowledge Extraction With Spark

Goal

Derive meaningful insights and knowledge from large volumes of text data.

Role

I developed Python- and Spark-based tools to recognise, process and summarise unstructured text data.

Capability

Highly customisable toolkit for rapid and high quality exploration of text patterns:

  • Profile unstructured text data by recognizing and quantifying either all text strings or user-defined ones.
  • Capture context variations around text patterns of interest.
  • Conduct semantic grouping at scale to enhance clarity regarding patterns in text and improve the accuracy of conclusions.
  • Produce analytics-ready summarized data.
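The pattern-profiling step can be sketched on a single machine with plain regex; the toolkit runs the same logic over Spark to scale to large corpora. The sample patterns below are illustrative:

```python
import re
from collections import Counter

def profile_patterns(lines, patterns):
    """Count occurrences of each named regex pattern across text lines."""
    compiled = {name: re.compile(rx) for name, rx in patterns.items()}
    counts = Counter()
    for line in lines:
        for name, rx in compiled.items():
            counts[name] += len(rx.findall(line))
    return dict(counts)
```

In the Spark version, the per-line counting step maps over a distributed collection and the per-pattern totals are produced by a reduce, but the regex logic is unchanged.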

Skills

Regex, Spark, Python, Big Data

FROM:

Polluted + unnecessarily diluted view of text patterns

TO:

Contextual focus + meaningful relative scale

Year

2022

Filed Under: Data Engineering & BI

Cloud Data Migration

Goal

To future-proof and centralise its data assets, the client embarked on a cloud-based data migration and transformation journey.

Role

  • Supported the client’s cloud-based data migration and centralization initiative, resulting in a 25% reduction in data access time and saving thousands of pounds in annual maintenance costs, by:
    • Collaborating on PySpark- and SQL-based Azure Synapse notebooks;
    • Scripting CI/CD integration for data loading, transformation, validation and writing into the ‘Strategic Data Platform’ (SDP) for HM Courts and Tribunals Service, UK Ministry of Justice.
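The validation step in such a pipeline can be sketched in plain Python with an illustrative regex rule; the actual column names and reference formats are not from the project:

```python
import re

# Illustrative reference format: two letters followed by six digits.
CASE_REF = re.compile(r"^[A-Z]{2}\d{6}$")

def validate_rows(rows):
    """Split rows into (valid, rejected) based on the 'case_ref' column."""
    valid, rejected = [], []
    for row in rows:
        if CASE_REF.fullmatch(str(row.get("case_ref", ""))):
            valid.append(row)
        else:
            rejected.append(row)
    return valid, rejected
```

In the Synapse notebooks the same split is expressed over PySpark DataFrames, so rejected rows can be quarantined rather than written into the platform.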

Skills

SQL ; PySpark ; Azure Data Factory (ADF) ; Synapse Analytics; CI/CD ; Regular Expressions (Regex) ; Git ; Microsoft Azure Cloud

Year

2023

Filed Under: Data Engineering & BI

Social Listening Data Pipelines

Goal

Client wanted to continuously quantify how the biggest and most polluting 200+ companies compare to each other (on a 0–100 scale) in terms of reputation, performance in reducing pollution, and proactivity on general sustainability.

Role

  • Leveraged Python OOP code across various tasks, reducing time-to-delivery by 60%. The code executes competitive benchmarking and sentiment analysis, improving the client’s campaign accuracy.
  • Co-developed Python data pipelines for web crawling, ETL, data validation and testing. For our client this serves as an extensive and efficient social listening system: it scans corporate social media and annual reports for the 200+ companies, processes the data and feeds it into the charts and KPIs at the front end.
  • Quantified the reputation, sustainability and pollution-reduction efforts of the 200+ target companies. The outcome was key in the acquisition of 2 new high-profile clients.
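The competitive-benchmarking step can be sketched as a min–max rescaling of each company’s aggregated metric onto the 0–100 scale shown at the front end; the raw metric here is a placeholder for the pipeline’s outputs:

```python
def benchmark(raw_scores: dict) -> dict:
    """Rescale {company: raw_metric} to the 0-100 scale via min-max normalization."""
    lo, hi = min(raw_scores.values()), max(raw_scores.values())
    if hi == lo:
        # All companies tied: place everyone at the midpoint.
        return {c: 50.0 for c in raw_scores}
    return {c: round(100 * (v - lo) / (hi - lo), 1) for c, v in raw_scores.items()}
```

Min–max scaling keeps the scores relative to the peer group, which is what makes the 0–100 figures comparable across the 200+ companies.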

Capability

Multi-source and multi-keyword data pipelines driving sentiment analysis, competitive benchmarking and effective resource allocation in multinational advertising and public relations company.

Skills

Python (OOP) ; PyTest ; data pipelines building (Python + Luigi) ; Regular Expressions (Regex) ; Git ; Data validation (Great Expectations); Data ETL ; Web crawling (pages and apis) ; AWS S3

Year

2022

Filed Under: Data Engineering & BI

CONTACT

To get in touch, please fill out a contact form or give me a call at +44 7854 878 140.

 


© Copyright 2020 · LIBERTECHS · All Rights Reserved
