Logstash

A quick walkthrough of Logstash, the ETL engine of the Elastic Stack. Logstash is an open source, server-side data processing pipeline that ingests data from a multitude of sources simultaneously, transforms it, and then sends it to your favorite stash. Logstash gained its initial popularity with log and metric collection, such as log4j logs, Apache web logs, and syslog. Its use has since broadened to all kinds of data sources, such as large-scale event streams, webhooks, databases, and message queues. Once data is transformed and cleaned up, it is routed to a final destination (i.e. the stash). Elasticsearch is one option, but there are plenty of others (MongoDB, S3, Nagios, IRC, email). ...

December 7, 2018 · 6 min

PostgreSQL

PostgreSQL (postgres or pgsql) is a powerful open source relational database known for reliability, extensibility, and standards compliance. It features: advanced SQL support (window functions, CTEs, JSON, full-text search); ACID compliance and strong transactional integrity; rich indexing (B-tree, GIN, GiST, BRIN, hash, SP-GiST); extensibility through custom types, operators, and functions; MVCC for high concurrency and performance; robust security, authentication, and role management; and an active community with frequent releases and excellent documentation. Ideal for everything from small apps to large-scale, mission-critical systems. ...

July 14, 2018 · 7 min

Apache Spark

Recently I’ve had the opportunity to dig into Apache Spark, thanks to some training from Brian Bloechle of Cloudera. What is Spark? Fast, flexible, and developer friendly, Apache Spark is the leading platform for large-scale SQL, batch processing, stream processing, and machine learning. Java, Scala, Python, and R are first-class citizens when it comes to consuming the various Spark APIs. I’ll cover PySpark in more detail. Spark is agnostic about where it runs: it can target a number of cluster managers, including Spark Standalone, Hadoop’s YARN, Apache Mesos, and Kubernetes. In the context of Spark, some useful parts of the surrounding ecosystem to be aware of: ...

July 2, 2018 · 5 min