Notice: file_put_contents(): Write of 11420 bytes failed with errno=28 No space left on device in /var/www/tgoop/post.php on line 50
Hacker News@hacker_news_feed P.113777
HACKER_NEWS_FEED Telegram 113777
Show HN: I built an open-source data pipeline tool in Go (Score: 150+ in 13 hours)

Link: https://readhacker.news/s/6jG5n
Comments: https://readhacker.news/c/6jG5n

Every data pipeline job I had to tackle required quite a few components to set up:
- One tool to ingest data
- Another one to transform it
- If you wanted to run Python, set up an orchestrator
- If you need to check the data, a data quality tool
Let alone this being hard to set up and taking time, it is also pretty high-maintenance. I had to do a lot of infra work, and while this being billable hours for me I didn’t enjoy the work at all. For some parts of it, there were nice solutions like dbt, but in the end for an end-to-end workflow, it didn’t work. That’s why I decided to build an end-to-end solution that could take care of data ingestion, transformation, and Python stuff. Initially, it was just for our own usage, but in the end, we thought this could be a useful tool for everyone.
In its core, Bruin is a data framework that consists of a CLI application written in Golang, and a VS Code extension that supports it with a local UI.
Bruin supports quite a few stuff:
- Data ingestion using ingestr (https://github.com/bruin-data/ingestr)
- Data transformation in SQL & Python, similar to dbt
- Python env management using uv
- Built-in data quality checks
- Secrets management
- Query validation & SQL parsing
- Built-in templates for common scenarios, e.g. Shopify, Notion, Gorgias, BigQuery, etc
This means that you can write end-to-end pipelines within the same framework and get it running with a single command. You can run it on your own computer, on GitHub Actions, or in an EC2 instance somewhere. Using the templates, you can also have ready-to-go pipelines with modeled data for your data warehouse in seconds.
It includes an open-source VS Code extension as well, which allows working with the data pipelines locally, in a more visual way. The resulting changes are all in code, which means everything is version-controlled regardless, it just adds a nice layer.
Bruin can run SQL, Python, and data ingestion workflows, as well as quality checks. For Python stuff, we use the awesome (and it really is awesome!) uv under the hood, install dependencies in an isolated environment, and install and manage the Python versions locally, all in a cross-platform way. Then in order to manage data uploads to the data warehouse, it uses dlt under the hood to upload the data to the destination. It also uses Arrow’s memory-mapped files to easily access the data between the processes before uploading them to the destination.
We went with Golang because of its speed and strong concurrency primitives, but more importantly, I knew Go better than the other languages available to me and I enjoy writing Go, so there’s also that.
We had a small pool of beta testers for quite some time and I am really excited to launch Bruin CLI to the rest of the world and get feedback from you all. I know it is not often to build data tooling in Go but I believe we found ourselves in a nice spot in terms of features, speed, and stability.
https://github.com/bruin-data/bruin
I’d love to hear your feedback and learn more about how we can make data pipelines easier and better to work with, looking forward to your thoughts!
Best, Burak



tgoop.com/hacker_news_feed/113777
Create:
Last Update:

Show HN: I built an open-source data pipeline tool in Go (Score: 150+ in 13 hours)

Link: https://readhacker.news/s/6jG5n
Comments: https://readhacker.news/c/6jG5n

Every data pipeline job I had to tackle required quite a few components to set up:
- One tool to ingest data
- Another one to transform it
- If you wanted to run Python, set up an orchestrator
- If you need to check the data, a data quality tool
Let alone this being hard to set up and taking time, it is also pretty high-maintenance. I had to do a lot of infra work, and while this being billable hours for me I didn’t enjoy the work at all. For some parts of it, there were nice solutions like dbt, but in the end for an end-to-end workflow, it didn’t work. That’s why I decided to build an end-to-end solution that could take care of data ingestion, transformation, and Python stuff. Initially, it was just for our own usage, but in the end, we thought this could be a useful tool for everyone.
In its core, Bruin is a data framework that consists of a CLI application written in Golang, and a VS Code extension that supports it with a local UI.
Bruin supports quite a few stuff:
- Data ingestion using ingestr (https://github.com/bruin-data/ingestr)
- Data transformation in SQL & Python, similar to dbt
- Python env management using uv
- Built-in data quality checks
- Secrets management
- Query validation & SQL parsing
- Built-in templates for common scenarios, e.g. Shopify, Notion, Gorgias, BigQuery, etc
This means that you can write end-to-end pipelines within the same framework and get it running with a single command. You can run it on your own computer, on GitHub Actions, or in an EC2 instance somewhere. Using the templates, you can also have ready-to-go pipelines with modeled data for your data warehouse in seconds.
It includes an open-source VS Code extension as well, which allows working with the data pipelines locally, in a more visual way. The resulting changes are all in code, which means everything is version-controlled regardless, it just adds a nice layer.
Bruin can run SQL, Python, and data ingestion workflows, as well as quality checks. For Python stuff, we use the awesome (and it really is awesome!) uv under the hood, install dependencies in an isolated environment, and install and manage the Python versions locally, all in a cross-platform way. Then in order to manage data uploads to the data warehouse, it uses dlt under the hood to upload the data to the destination. It also uses Arrow’s memory-mapped files to easily access the data between the processes before uploading them to the destination.
We went with Golang because of its speed and strong concurrency primitives, but more importantly, I knew Go better than the other languages available to me and I enjoy writing Go, so there’s also that.
We had a small pool of beta testers for quite some time and I am really excited to launch Bruin CLI to the rest of the world and get feedback from you all. I know it is not often to build data tooling in Go but I believe we found ourselves in a nice spot in terms of features, speed, and stability.
https://github.com/bruin-data/bruin
I’d love to hear your feedback and learn more about how we can make data pipelines easier and better to work with, looking forward to your thoughts!
Best, Burak

BY Hacker News




Share with your friend now:
tgoop.com/hacker_news_feed/113777

View MORE
Open in Telegram


Telegram News

Date: |

During a meeting with the president of the Supreme Electoral Court (TSE) on June 6, Telegram's Vice President Ilya Perekopsky announced the initiatives. According to the executive, Brazil is the first country in the world where Telegram is introducing the features, which could be expanded to other countries facing threats to democracy through the dissemination of false content. Read now When choosing the right name for your Telegram channel, use the language of your target audience. The name must sum up the essence of your channel in 1-3 words. If you’re planning to expand your Telegram audience, it makes sense to incorporate keywords into your name. Content is editable within two days of publishing bank east asia october 20 kowloon
from us


Telegram Hacker News
FROM American