RRD Global Outsourcing

Product Lead

Job Description

1. Data architecture: Designing and implementing data architecture that meets the needs of the organization, taking into consideration factors such as scalability, security, and performance.

a. Skills : data architecture skills and technologies encompass a broad range of knowledge and tools, including database management systems, data modeling, ETL tools, cloud infrastructure, data governance, big data technologies, data visualization tools, data warehousing, programming languages, and data migration techniques.

b. Details :

i. Database management systems (DBMS): Understanding of various types of DBMS, such as relational, NoSQL, graph, and columnar, and their applications in different scenarios.

ii. Data governance: Understanding of data governance principles and practices, including data quality, data security, and data privacy.

iii. Familiarity with big data technologies such as Hadoop, Spark, and Kafka for storing, processing, and analyzing large amounts of data.

iv. Experience with data visualization tools such as Tableau, Power BI, or Qlik for creating visual representations of data.

v. Understanding of data warehousing principles and practices, including designing and implementing data warehouses using tools such as Snowflake, Redshift, or Google BigQuery.

vi. Proficiency in programming languages such as Python, Java, or Scala for data processing, transformation, and analysis, or any other programming language that helps in ETL.

vii. Understanding of the process of migrating data from one system to another and the tools and techniques used for it.

2. Data modeling: Creating data models that accurately represent the data and enable efficient querying and analysis.

a. Skills : data modeling skills and technologies encompass a variety of skills and technologies, including conceptual, logical, and physical modeling, data modeling tools, database management systems, NoSQL databases, data warehousing, OLAP cubes, UML modeling, and data modeling methodologies.

b. Details :

i. Proficiency in using data modeling tools such as ERwin, Visio, or Enterprise Architect for creating, editing, and maintaining data models. [More than the technology, the concept is very important]

ii. Familiarity with database management systems such as SQL Server,

3. Data integration: Integrating data from various sources and ensuring data quality.

a. Skills : data integration skills and technologies encompass a variety of skills and technologies, including ETL tools, APIs, data warehousing, data formats and protocols, data migration, data integration middleware, MDM, cloud integration, data virtualization, and data governance.

b. Details :

i. Proficiency in using ETL tools such as Apache NiFi, Talend, or Informatica to extract data from various sources, transform it into the desired format, and load it into a target system. API. [Any 1 or 2 is OK, but should be strong]

ii. Data warehousing: Familiarity with data warehousing concepts and architectures, including star and snowflake schemas.

iii. Familiarity with cloud integration platforms such as AWS Glue, Azure Data Factory, or Google Cloud Dataflow, which enable data integration across cloud and on-premises systems.

4. Data pipeline development: Developing and maintaining ETL (extract, transform, load) pipelines to move data from various sources to the data warehouse.

a. Skills : data pipeline development skills and technologies encompass a variety of skills and technologies, including programming languages, dataflow frameworks, workflow management tools, cloud computing, data streaming, containerization, data quality and governance, data transformation, data storage, and monitoring and alerting.

b. Details :

i. Knowledge of dataflow frameworks such as Apache Beam or Apache Spark that allow for parallel processing of data pipelines. [Any other dataflow technology is also OK]

ii. Familiarity with workflow management tools such as Apache Airflow or Azkaban that enable the scheduling and monitoring of data pipeline jobs. [Any other scheduling tool is also OK]

iii. Understanding of cloud computing platforms such as AWS, Azure, or Google Cloud, which provide scalable and cost-effective infrastructure for running data pipelines.

iv. Knowledge of data streaming technologies such as Apache Kafka or AWS Kinesis that enable real-time data processing and analysis. [Nice to have]

v. Familiarity with data quality and governance practices that ensure the accuracy, completeness, and consistency of data in the pipeline.

vi. Proficiency in data transformation techniques such as data cleansing, aggregation, or normalization.

vii. Understanding of data storage technologies such as Hadoop Distributed File System (HDFS), Amazon S3, or Azure Blob Storage that are commonly used as target destinations for data pipelines.

5. Data transformation: Transforming data into the required format for analysis and reporting.

a. Skills : data transformation skills and technologies encompass a variety of skills and technologies, including data wrangling, SQL, data visualization, statistical analysis, machine learning, data mining, data quality assessment, data integration, data transformation tools, and big data technologies.

b. Details :

i. Proficiency in data wrangling techniques such as data cleaning, data shaping, and data formatting, to prepare data for analysis.

ii. SQL: Proficiency in SQL (Structured Query Language), the language used to query, manage, and manipulate relational databases.

iii. Understanding of data visualization tools and techniques that enable the presentation of data in a visual and intuitive manner.

iv. Basic knowledge of statistical analysis techniques such as regression analysis, hypothesis testing, and data clustering.

v. Familiarity with machine learning algorithms and techniques that enable automated data transformation and analysis.

vi. Understanding of data mining techniques such as association rule mining and anomaly detection, to extract useful patterns and insights from data.

6. Data storage: Managing and maintaining databases and data warehouses, including monitoring, tuning, and troubleshooting.

a. Skills : data storage skills and technologies encompass a variety of skills and technologies, including relational databases, NoSQL databases, data warehousing, cloud storage, distributed file systems, object storage, data archival, data backup and recovery, data replication, and data storage monitoring.

b. Details :

i. Relational Databases: Proficiency in relational database management systems such as MSSQL, MySQL, Oracle, or PostgreSQL, which enable the creation, manipulation, and querying of structured data.

ii. NoSQL Databases: Understanding of NoSQL (Not only SQL) databases such as MongoDB, Cassandra, or DynamoDB, which enable the storage and retrieval of unstructured or semi-structured data.

iii. Knowledge of cloud storage platforms such as Amazon S3, Google Cloud Storage, or Azure Blob Storage, which provide scalable and cost-effective storage solutions for data.

iv. Understanding of distributed file systems such as Hadoop Distributed File System (HDFS) or GlusterFS, which enable the storage and processing of large amounts of data across multiple servers.

v. Proficiency in data archival techniques such as data compression, data deduplication, and data encryption, to optimize storage capacity and ensure data security.

vi. Knowledge of data backup and recovery techniques such as point-in-time recovery, continuous data protection, and disaster recovery planning, to ensure data availability and prevent data loss.

7. Data security: Ensuring the security and privacy of data throughout the data lifecycle.

a. Details : data security skills and technologies encompass a variety of skills and technologies, including encryption, access control, authentication, authorization, and data governance.

8. Collaboration: Working closely with data scientists, analysts, and other stakeholders to understand their data needs and provide solutions that meet those needs.

9. Documentation: Documenting data engineering processes and procedures to ensure that they are repeatable and scalable.

10. Continuous improvement: Staying up-to-date with the latest data engineering technologies and methodologies, and continually improving processes and infrastructure to optimize performance and efficiency.

More Info

Industry: Other

Function: Data Engineering

Job Type: Permanent Job

Date Posted: 23/10/2024

Job ID: 97590835

Last Updated: 23-10-2024 02:30:36 PM