DE int

18 flashcards    guest3164346
Question (English)    Answer (English)
ETL (Extract, Transform, Load)
A process where data is extracted from a source, transformed (e.g. cleaned or aggregated), and then loaded into a database or data warehouse.
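A minimal sketch of the three ETL steps, using Python's built-in sqlite3 as a stand-in for a real warehouse. The source rows, function names, and the `user_totals` table are made up for illustration.

```python
import sqlite3

# Hypothetical source rows: (user_id, amount as text, possibly dirty)
source_rows = [(1, " 10.5 "), (2, "n/a"), (1, "4.5"), (3, "20")]

def extract():
    # Extract: read raw rows from the source (here, an in-memory list)
    return list(source_rows)

def transform(rows):
    # Transform: drop unparseable amounts, then aggregate per user
    totals = {}
    for user_id, raw_amount in rows:
        try:
            amount = float(raw_amount.strip())
        except ValueError:
            continue  # cleaning step: skip bad values
        totals[user_id] = totals.get(user_id, 0.0) + amount
    return sorted(totals.items())

def load(conn, rows):
    # Load: write the already-transformed rows into the destination table
    conn.execute("CREATE TABLE user_totals (user_id INTEGER, total REAL)")
    conn.executemany("INSERT INTO user_totals VALUES (?, ?)", rows)

conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
etl_result = conn.execute(
    "SELECT user_id, total FROM user_totals ORDER BY user_id"
).fetchall()
print(etl_result)
```

Only cleaned, aggregated rows ever reach the destination; the raw values stay in the source.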
ELT (Extract, Load, Transform)
Raw data is first loaded into the destination (like BigQuery), and then transformed using SQL or other tools inside the warehouse.
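A minimal sketch of the ELT order of operations, with sqlite3 standing in for a warehouse such as BigQuery. The `raw_orders` and `user_totals` tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Load: raw rows go into the destination untouched (amounts are still text)
conn.execute("CREATE TABLE raw_orders (user_id INTEGER, amount TEXT)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?)",
    [(1, "10.5"), (2, "n/a"), (1, "4.5"), (3, "20")],
)

# Transform: done afterwards, with SQL, inside the destination
conn.execute(
    """
    CREATE TABLE user_totals AS
    SELECT user_id, SUM(CAST(amount AS REAL)) AS total
    FROM raw_orders
    WHERE amount GLOB '[0-9]*'   -- keep rows whose amount starts with a digit
    GROUP BY user_id
    """
)
elt_result = conn.execute(
    "SELECT user_id, total FROM user_totals ORDER BY user_id"
).fetchall()
raw_count = conn.execute("SELECT COUNT(*) FROM raw_orders").fetchone()[0]
print(elt_result, raw_count)
```

Unlike ETL, the raw table (including the bad row) remains available in the destination for later re-transformation.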
DAG (Directed Acyclic Graph – Airflow)
A structure used in Airflow to define workflows. It represents a set of tasks and the dependencies between them; because the graph has no cycles, the tasks can always be run in a valid order.
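A small sketch of the idea using the standard-library `graphlib` module (Python 3.9+), not Airflow itself. The task names are hypothetical.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical task names; each task maps to the set of tasks it depends on
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}

# A valid execution order exists because the graph has no cycles
run_order = list(TopologicalSorter(dependencies).static_order())
print(run_order)

# Adding a circular dependency makes the graph invalid
cyclic = dict(dependencies, extract={"report"})
try:
    list(TopologicalSorter(cyclic).static_order())
    has_cycle = False
except CycleError:
    has_cycle = True
print(has_cycle)
```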
Partitioning (BigQuery)
Dividing a large table into parts (usually by date) to make queries faster and cheaper by scanning only relevant partitions.
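A toy model of partition pruning in plain Python. This is not the BigQuery API; the event data and the `count_users` helper are invented to show why a date filter reads fewer rows.

```python
from collections import defaultdict

# Hypothetical events: (event_date, user_id)
events = [
    ("2024-01-01", 1), ("2024-01-01", 2),
    ("2024-01-02", 3),
    ("2024-01-03", 1), ("2024-01-03", 4), ("2024-01-03", 5),
]

# "Partitioned table": one bucket of rows per date
partitions = defaultdict(list)
for event_date, user_id in events:
    partitions[event_date].append(user_id)

def count_users(date, partitioned):
    """Return (matching rows, rows scanned) for a single-date query."""
    if partitioned:
        scanned = partitions.get(date, [])          # only one partition is read
        return len(scanned), len(scanned)
    matches = [u for d, u in events if d == date]   # full table scan
    return len(matches), len(events)

pruned = count_users("2024-01-02", partitioned=True)
full = count_users("2024-01-02", partitioned=False)
print(pruned, full)
```

Both queries return the same answer, but the partitioned one scans 1 row instead of 6.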
JOIN (SQL)
A way to combine data from two or more tables based on a related column (e.g. user_id).
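A short example run through sqlite3, with hypothetical `users` and `orders` tables related by `user_id`. It contrasts an inner join with a left join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users  (user_id INTEGER, name TEXT);
    CREATE TABLE orders (order_id INTEGER, user_id INTEGER, amount REAL);
    INSERT INTO users  VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO orders VALUES (10, 1, 9.5), (11, 1, 3.0), (12, 2, 7.0);
    """
)

# Inner join: only users that have at least one matching order
inner = conn.execute(
    "SELECT u.name, o.order_id FROM users u "
    "JOIN orders o ON o.user_id = u.user_id ORDER BY o.order_id"
).fetchall()

# Left join: every user, with a count of 0 when no order matches
left = conn.execute(
    "SELECT u.name, COUNT(o.order_id) FROM users u "
    "LEFT JOIN orders o ON o.user_id = u.user_id "
    "GROUP BY u.user_id, u.name ORDER BY u.user_id"
).fetchall()
print(inner, left)
```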
HAVING vs WHERE (SQL)
WHERE filters rows before aggregation; HAVING filters after. Example: HAVING COUNT(*) > 100.
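A small sqlite3 example that uses both clauses in one query. The `orders` table and the thresholds are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (user_id INTEGER, amount REAL);
    INSERT INTO orders VALUES (1, 5), (1, 50), (1, 60), (2, 70), (3, 80), (3, 90);
    """
)

rows = conn.execute(
    """
    SELECT user_id, COUNT(*) AS big_orders
    FROM orders
    WHERE amount > 10        -- row filter, applied before grouping
    GROUP BY user_id
    HAVING COUNT(*) >= 2     -- group filter, applied after aggregation
    ORDER BY user_id
    """
).fetchall()
print(rows)
```

User 1's small order is removed by WHERE before counting; user 2 is removed by HAVING because only one order remains in the group.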
PySpark
Python API for Apache Spark. It’s used to process very large datasets in a distributed, parallelized way.
BigQuery
A serverless cloud data warehouse from Google, designed for running fast SQL queries on large datasets.
Data Lake
A storage system that holds raw data in its native format, whether structured, semi-structured, or unstructured. Often used for flexible analytics or staging.
Data Warehouse
A structured database optimized for analysis and reporting, typically holding cleaned and transformed data.
Airflow Operator
A template for a unit of work in an Airflow DAG. Instantiating an operator creates a task and defines what that task does (e.g. PythonOperator, BashOperator).
Kafka Topic
A named data stream in Apache Kafka where producers send and consumers receive messages.
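A toy in-memory model of the producer/topic/consumer relationship. It is not the Kafka client API; `ToyBroker` and its methods are invented for illustration, and real Kafka adds partitions, consumer groups, and persistence.

```python
from collections import defaultdict

class ToyBroker:
    """In-memory stand-in: each topic is an append-only list of messages."""

    def __init__(self):
        self.topics = defaultdict(list)
        self.offsets = defaultdict(int)   # next position per (consumer, topic)

    def produce(self, topic, message):
        self.topics[topic].append(message)

    def consume(self, consumer, topic):
        """Return the messages this consumer has not read yet."""
        start = self.offsets[(consumer, topic)]
        messages = self.topics[topic][start:]
        self.offsets[(consumer, topic)] = start + len(messages)
        return messages

broker = ToyBroker()
broker.produce("clicks", "c1")
broker.produce("clicks", "c2")
first_read = broker.consume("analytics", "clicks")
broker.produce("clicks", "c3")
second_read = broker.consume("analytics", "clicks")
other_reader = broker.consume("billing", "clicks")
print(first_read, second_read, other_reader)
```

Each consumer keeps its own position in the topic, so a second reader still sees every message from the start.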
IAM (Identity and Access Management – GCP)
A system for managing permissions and access to resources in Google Cloud – defines who can do what.
KPI (Key Performance Indicator)
A measurable value that shows how effectively a process or business is performing (e.g. conversion rate, average delay).
Lazy Evaluation (Spark)
Transformations are not executed until an action (like .count() or .collect()) is called. This lets Spark optimize the whole chain of transformations before running it.
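An analogy in plain Python rather than Spark itself: `map` and `filter` build a lazy pipeline, and nothing runs until the result is consumed. The `double` function and the `calls` log are hypothetical helpers used to show when work happens.

```python
calls = []

def double(x):
    calls.append(x)      # record when work actually happens
    return x * 2

data = range(5)

# "Transformations": building the pipeline runs nothing yet
doubled = map(double, data)
large = filter(lambda v: v > 4, doubled)
calls_before_action = len(calls)

# "Action": consuming the pipeline triggers the computation
result = list(large)
calls_after_action = len(calls)
print(calls_before_action, result, calls_after_action)
```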
Retry (Airflow)
A setting that allows a task to be automatically retried after failure, helpful for unstable operations.
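A plain-Python sketch of the retry behavior, not Airflow code. In Airflow this is configured through task arguments such as `retries` and `retry_delay`; the `run_with_retries` helper and `flaky_task` below are hypothetical.

```python
import time

def run_with_retries(task, retries=3, retry_delay=0.0):
    """Run task(); on failure, retry up to `retries` more times."""
    attempt = 0
    while True:
        try:
            return task()
        except Exception:
            if attempt >= retries:
                raise            # out of retries: surface the failure
            attempt += 1
            time.sleep(retry_delay)

attempts = []

def flaky_task():
    # Hypothetical unstable operation: fails twice, then succeeds
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("temporary failure")
    return "ok"

outcome = run_with_retries(flaky_task, retries=3)
print(outcome, len(attempts))
```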
Data Validation
The process of ensuring that data is accurate and consistent – includes checking for missing values, duplicates, or wrong formats.
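A minimal sketch of the three checks named above (missing values, duplicates, wrong formats). The record layout, the `validate` helper, and the expected date format are assumptions made for the example.

```python
import re

# Hypothetical records with an id, an email, and a signup date
records = [
    {"id": 1, "email": "a@example.com", "signup": "2024-01-05"},
    {"id": 2, "email": None,            "signup": "2024-01-06"},   # missing value
    {"id": 1, "email": "a@example.com", "signup": "2024-01-05"},   # duplicate id
    {"id": 3, "email": "c@example.com", "signup": "05/01/2024"},   # wrong format
]

DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")   # assumed format: YYYY-MM-DD

def validate(rows):
    """Return a list of (row index, problem) pairs."""
    errors, seen_ids = [], set()
    for i, row in enumerate(rows):
        if any(v is None for v in row.values()):
            errors.append((i, "missing value"))
        if row["id"] in seen_ids:
            errors.append((i, "duplicate id"))
        seen_ids.add(row["id"])
        if not DATE_RE.fullmatch(row["signup"] or ""):
            errors.append((i, "bad date format"))
    return errors

errors = validate(records)
print(errors)
```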
Window Function (SQL)
A function that performs calculations across a "window" of rows related to the current row, without collapsing them into a single result (e.g. ROW_NUMBER(), AVG(...) OVER(...)).
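A short sqlite3 example, assuming the bundled SQLite is version 3.25 or newer (the first release with window functions). The `orders` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (user_id INTEGER, amount REAL);
    INSERT INTO orders VALUES (1, 10), (1, 30), (2, 20);
    """
)

rows = conn.execute(
    """
    SELECT user_id,
           amount,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY amount) AS rn,
           AVG(amount)  OVER (PARTITION BY user_id)                 AS user_avg
    FROM orders
    ORDER BY user_id, amount
    """
).fetchall()
print(rows)
```

All three input rows survive in the output; each simply gains a per-user row number and a per-user average, which a GROUP BY would have collapsed.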