Introduction

datenstrom.io is a privacy-friendly data collection solution for websites, applications, products, and devices. With a focus on first-party data collection, datenstrom.io helps you gather your own data and better understand your customers and products. Collected data is enriched and delivered directly to your own data store, so you can focus on your data and insights instead of the heavy lifting of collection and processing.

Features

  • First-party data collection: Use your own domain and pipeline to collect data
  • Privacy-friendly: Full control over cookies and PII collection
  • Data enrichment: Events are enriched with additional information such as device type and location
  • Data processing and delivery: Data is processed and stored in your data store in Parquet format
  • Event-based architecture: Get notified about new Parquet files in your data store and trigger further processing or loading into your warehouse
  • Strong data schema: Data is stored in a structured format with a predefined schema for easy querying and analysis

Architecture

[Figure: datenstrom pipeline]

Event trackers

Event trackers collect data from your website, application, product, or device. They send events to your dedicated collector endpoint on your custom domain. datenstrom.io provides a simple-to-use API for sending events and is compatible with many existing tracking SDKs and libraries. For example, all open-source Snowplow trackers are supported out of the box.
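As a rough sketch of what a tracker sends, the snippet below builds a single page-view event using Snowplow tracker protocol field names and wraps it in a POST request. The collector URL, the request path, the app ID, the tracker version string, and the helper names `build_page_view` and `build_request` are all assumptions for illustration; the actual endpoint depends on your datenstrom.io setup. In practice an existing tracker SDK would normally do this for you.

```python
import json
import time
import uuid
import urllib.request

# Hypothetical collector endpoint on your own domain (an assumption, not a documented URL).
COLLECTOR_URL = "https://collector.example.com/com.snowplowanalytics.snowplow/tp2"

# Schema URI that Snowplow trackers use to wrap a batch of POSTed events.
PAYLOAD_SCHEMA = "iglu:com.snowplowanalytics.snowplow/payload_data/jsonschema/1-0-4"


def build_page_view(app_id: str, page_url: str, page_title: str) -> dict:
    """Build one page-view event using Snowplow tracker protocol field names."""
    return {
        "e": "pv",                            # event type: page view
        "p": "web",                           # platform
        "tv": "example-0.1.0",                # tracker version (made-up value)
        "aid": app_id,                        # application ID
        "eid": str(uuid.uuid4()),             # unique event ID
        "dtm": str(int(time.time() * 1000)),  # device timestamp in milliseconds
        "url": page_url,
        "page": page_title,
    }


def build_request(events: list) -> urllib.request.Request:
    """Wrap a batch of events in a JSON POST request for the collector."""
    body = json.dumps({"schema": PAYLOAD_SCHEMA, "data": events}).encode("utf-8")
    return urllib.request.Request(
        COLLECTOR_URL,
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


request = build_request(
    [build_page_view("my-shop", "https://shop.example.com/", "Home")]
)
# urllib.request.urlopen(request)  # uncomment to actually send the batch
```

The request is built but not sent, so the sketch can be inspected without a live collector.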

Event pipeline

The event pipeline processes incoming events and enriches them with additional information such as device type and location. The pipeline is built to be reliable and scalable, so that no data is lost and high-quality data is delivered. At a scheduled interval, the data is processed, converted into Parquet format using your chosen schema, and delivered to your data store. With connectors and webhooks, you can trigger further processing, such as loading the data into your data warehouse or running custom actions.
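To illustrate the webhook step, here is a minimal sketch of how a receiver might turn a "new file" notification into a location to load. The payload shape (a JSON object with `bucket` and `key` fields) and the function name `parquet_uri_from_notification` are hypothetical; the actual notification format is not specified here, so adapt the field names to what your webhook delivers.

```python
import json
from typing import Optional


def parquet_uri_from_notification(body: bytes) -> Optional[str]:
    """Return the s3:// URI of a newly delivered Parquet file, or None to ignore.

    Assumes a hypothetical payload such as:
    {"bucket": "my-data", "key": "events/2024/part-0.parquet"}
    """
    try:
        message = json.loads(body)
    except ValueError:
        # Covers malformed JSON as well as bytes that are not valid UTF-8.
        return None
    if not isinstance(message, dict):
        return None

    bucket = message.get("bucket")
    key = message.get("key")
    if not isinstance(bucket, str) or not isinstance(key, str):
        return None
    if not key.endswith(".parquet"):
        # Ignore notifications for anything that is not a Parquet file.
        return None
    return f"s3://{bucket}/{key}"
```

A webhook endpoint could call this function on each request body and pass the resulting URI to a warehouse load job, skipping any notification for which it returns `None`.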

Data store

datenstrom.io stores the processed data in your own data store in Parquet format. You can use a store such as S3 for cost-efficient storage and centralized data access. Parquet is a columnar storage format optimized for query performance and storage efficiency, and it is supported by nearly all data processing tools and data warehouses. With this approach, you keep full control over your data and leave every option open for your data team.