Question |
Answer |
ETL (Extract, Transform, Load)
|
|
A process where data is extracted from a source, transformed (e.g. cleaned or aggregated), and then loaded into a database or data warehouse.
|
|
|
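The three ETL stages can be sketched roughly as follows. This is only an illustration: the source is an in-memory CSV string, SQLite stands in for the warehouse, and the `payments` table and its columns are hypothetical names.

```python
import csv
import io
import sqlite3

# Hypothetical source data; a real pipeline would read a file or an API.
RAW_CSV = "user_id,amount\n1, 10.5\n2,\n1,4.5\n"

def extract(text):
    """Read raw rows from the CSV source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Drop rows with a missing amount and cast the fields to numbers."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if amount:
            cleaned.append((int(row["user_id"]), float(amount)))
    return cleaned

def load(rows, conn):
    """Write the cleaned rows into the destination table."""
    conn.execute("CREATE TABLE IF NOT EXISTS payments (user_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO payments VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute("SELECT user_id, amount FROM payments ORDER BY amount").fetchall()
```

Only the cleaned rows reach the destination; the row with the empty amount is dropped before loading.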
ELT (Extract, Load, Transform)
|
|
Raw data is first loaded into the destination (like BigQuery), and then transformed using SQL or other tools inside the warehouse.
|
|
|
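For contrast, a minimal ELT sketch loads the raw values first and cleans them afterwards with SQL inside the database. SQLite is used here only as a stand-in for a warehouse such as BigQuery, and the `raw_events` and `user_totals` tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Load: raw values go in untouched, as text, with no cleaning.
conn.execute("CREATE TABLE raw_events (user_id TEXT, amount TEXT)")
conn.executemany(
    "INSERT INTO raw_events VALUES (?, ?)",
    [("1", "10.5"), ("2", ""), ("1", "4.5")],
)

# Transform: cleaning and aggregation happen inside the database with SQL.
conn.execute(
    """
    CREATE TABLE user_totals AS
    SELECT CAST(user_id AS INTEGER) AS user_id,
           SUM(CAST(amount AS REAL)) AS total
    FROM raw_events
    WHERE amount <> ''
    GROUP BY CAST(user_id AS INTEGER)
    """
)
totals = conn.execute("SELECT user_id, total FROM user_totals").fetchall()
raw_count = conn.execute("SELECT COUNT(*) FROM raw_events").fetchone()[0]
```

The raw table keeps all three rows, including the incomplete one, so the transformation can be rerun or changed later without reloading.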
DAG (Directed Acyclic Graph – Airflow)
|
|
A structure used in Airflow to define workflows. It represents a set of tasks and the dependencies between them; because the graph has no cycles, every task can run after the tasks it depends on.
|
|
|
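The "no cycles" requirement can be illustrated without Airflow. The sketch below uses Python's standard `graphlib` module (Python 3.9+), not the Airflow API, and the task names are made up for the example.

```python
from graphlib import CycleError, TopologicalSorter

# Each key is a task; its value is the set of tasks that must finish first.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}

# A valid execution order exists because the graph is acyclic.
order = list(TopologicalSorter(dependencies).static_order())

# A cycle makes the graph invalid: no order can satisfy it.
cyclic = {"a": {"b"}, "b": {"a"}}
try:
    list(TopologicalSorter(cyclic).static_order())
    has_cycle = False
except CycleError:
    has_cycle = True
```

A scheduler needs exactly this property: if the dependencies formed a loop, no task in the loop could ever be the first to start.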
Partitioning
|
|
Dividing a large table into parts (usually by date) to make queries faster and cheaper by scanning only relevant partitions.
|
|
|
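The effect of scanning only the relevant partition can be sketched in plain Python. This is a toy model, not how a warehouse stores data; the rows and the `users_on` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: (event_date, user_id).
rows = [
    ("2024-01-01", 1),
    ("2024-01-01", 2),
    ("2024-01-02", 3),
    ("2024-01-03", 4),
]

# Group rows into one partition per date.
partitions = defaultdict(list)
for event_date, user_id in rows:
    partitions[event_date].append(user_id)

def users_on(date):
    """Return (user_ids, rows_scanned), reading only the matching partition."""
    scanned = partitions.get(date, [])
    return list(scanned), len(scanned)

users, rows_scanned = users_on("2024-01-02")
```

A query for one date touches one row here instead of all four, which is the source of the speed and cost savings.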
JOIN (SQL)
|
|
A way to combine data from two or more tables based on a related column (e.g. user_id).
|
|
|
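A small sketch of combining two tables on `user_id`, run through Python's built-in `sqlite3` module. The `users` and `orders` tables and their contents are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (user_id INTEGER, name TEXT);
    CREATE TABLE orders (order_id INTEGER, user_id INTEGER, amount REAL);
    INSERT INTO users VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO orders VALUES (10, 1, 20.0), (11, 1, 5.0), (12, 3, 7.0);
    """
)

# INNER JOIN keeps only rows whose user_id appears in both tables.
inner = conn.execute(
    "SELECT u.name, o.amount FROM users u "
    "JOIN orders o ON o.user_id = u.user_id ORDER BY o.order_id"
).fetchall()

# LEFT JOIN keeps every user; users without orders get NULL (None) amounts.
left = conn.execute(
    "SELECT u.name, o.amount FROM users u "
    "LEFT JOIN orders o ON o.user_id = u.user_id ORDER BY u.user_id, o.order_id"
).fetchall()
```

The order for the unknown `user_id` 3 appears in neither result, while the user with no orders appears only in the LEFT JOIN.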
WHERE vs. HAVING (SQL)
|
|
WHERE filters individual rows before aggregation; HAVING filters groups after aggregation. Example: HAVING COUNT(*) > 100.
|
|
|
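Both filters can appear in one query, which makes the order of evaluation visible. The sketch below uses SQLite and a hypothetical `events` table, with a lower threshold than the card's example so the data stays small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "ok"), (1, "ok"), (1, "error"), (2, "ok"), (3, "error"), (3, "error")],
)

# WHERE drops the 'error' rows first; HAVING then keeps groups with 2+ rows left.
result = conn.execute(
    """
    SELECT user_id, COUNT(*) AS n
    FROM events
    WHERE status = 'ok'
    GROUP BY user_id
    HAVING COUNT(*) >= 2
    """
).fetchall()
```

User 3 has two rows in the table but none survive the WHERE clause, so that group never reaches HAVING.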
PySpark
|
|
The Python API for Apache Spark, used to process very large datasets in a distributed, parallelized way.
|
|
|
BigQuery
|
|
A serverless cloud data warehouse from Google, designed for running fast SQL queries on large datasets.
|
|
|
Data Lake
|
|
A storage system for raw, unstructured, or semi-structured data, often used for flexible analytics or staging.
|
|
|
Data Warehouse
|
|
A structured database optimized for analysis and reporting, typically holding cleaned and transformed data.
|
|
|
Operator (Airflow)
|
|
A template for a unit of work in an Airflow DAG; it defines what a task does (e.g. PythonOperator, BashOperator).
|
|
|
Topic (Kafka)
|
|
A named data stream in Apache Kafka to which producers publish messages and from which consumers read them.
|
|
|
IAM (Identity and Access Management – GCP)
|
|
A system for managing permissions and access to resources in Google Cloud – defines who can do what.
|
|
|
KPI (Key Performance Indicator)
|
|
A measurable value that shows how effectively a process or business is performing (e.g. conversion rate, average delay).
|
|
|
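As a worked example of one such indicator, conversion rate is simply conversions divided by visitors. The function name and the numbers below are illustrative only.

```python
def conversion_rate(conversions, visitors):
    """Share of visitors who converted; 0.0 when there were no visitors."""
    if visitors == 0:
        return 0.0
    return conversions / visitors

# 30 conversions out of 1200 visitors is a rate of 2.5%.
rate = conversion_rate(conversions=30, visitors=1200)
```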
Lazy evaluation (Spark)
|
|
Transformations are not executed until an action (like .count() or .collect()) is called – this lets the engine optimize the whole chain of transformations before running it.
|
|
|
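Deferred execution can be imitated with a plain Python generator. This is an analogy rather than Spark code: the generator plays the role of a transformation, and `list(...)` plays the role of an action.

```python
log = []

def double(values):
    """A lazy 'transformation': nothing runs until the result is consumed."""
    for v in values:
        log.append(f"double {v}")
        yield v * 2

pipeline = double(range(3))   # builds the pipeline; no work is done yet
ran_before_action = len(log)  # nothing has been logged at this point

result = list(pipeline)       # the 'action' forces the computation
```

Because no work happens while the pipeline is being described, the engine is free to reorder or combine steps before anything runs.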
Retries
|
|
A setting that allows a task to be automatically retried after failure, helpful for unstable operations.
|
|
|
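The behaviour behind such a setting can be sketched as a small loop. `run_with_retries` and `flaky_task` are hypothetical helpers written for this example; in Airflow the equivalent behaviour is configured on the task (for instance through its `retries` and `retry_delay` arguments) rather than hand-written.

```python
import time

def run_with_retries(task, retries=3, delay_seconds=0.0):
    """Call task(); on failure, retry up to `retries` more times."""
    for attempt in range(retries + 1):
        try:
            return task()
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the last error
            time.sleep(delay_seconds)

calls = {"count": 0}

def flaky_task():
    """Hypothetical task that fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("temporary failure")
    return "done"

outcome = run_with_retries(flaky_task, retries=3)
```

A delay between attempts is usually worth adding for network calls, so that a briefly unavailable service has time to recover.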
Data quality checks
|
|
The process of ensuring that data is accurate and consistent – includes checking for missing values, duplicates, or wrong formats.
|
|
|
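The three kinds of checks named above could look roughly like this. The records, the `check_quality` helper, and the deliberately simple email pattern are all assumptions made for the example.

```python
import re

# Hypothetical records with one problem of each kind.
records = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},             # missing value
    {"id": 1, "email": "a@example.com"},  # duplicate id
    {"id": 3, "email": "not-an-email"},   # wrong format
]

# Deliberately simple pattern; real email validation is more involved.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def check_quality(rows):
    """Count missing values, duplicate ids, and badly formatted emails."""
    issues = {"missing": 0, "duplicate": 0, "bad_format": 0}
    seen_ids = set()
    for row in rows:
        if row["id"] in seen_ids:
            issues["duplicate"] += 1
        seen_ids.add(row["id"])
        if row["email"] is None:
            issues["missing"] += 1
        elif not EMAIL_PATTERN.match(row["email"]):
            issues["bad_format"] += 1
    return issues

report = check_quality(records)
```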
Window function (SQL)
|
|
A function that performs calculations across a "window" of rows related to the current row, without collapsing them into a single result (e.g. ROW_NUMBER(), AVG(...) OVER(...)).
|
|
|
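A short sketch of both example functions, run through SQLite (window functions need SQLite 3.25 or newer). The `orders` table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (user_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10.0), (1, 30.0), (2, 5.0)],
)

# Each input row stays in the output; the window adds per-user context to it.
rows = conn.execute(
    """
    SELECT user_id,
           amount,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY amount) AS rn,
           AVG(amount) OVER (PARTITION BY user_id) AS user_avg
    FROM orders
    ORDER BY user_id, amount
    """
).fetchall()
```

Three rows go in and three rows come out, each carrying its rank and its user's average; a GROUP BY on `user_id` would have returned only two rows.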