PlusOne

Telling the ASF's Stories

Tencent Data Lake Solution based on Open Source Projects Junping Du Tencent Jerry(Saisai) Shao Tencent

September 13, 2019
timothyarthur

A Data Lake is a system in which data is stored, processed, and managed in its natural, raw format. Users can store data as-is, without first having to structure it, and run different types of analytics on it. With the increasing adoption of big data, from dashboards and visualizations to big data processing, real-time analytics, and machine learning that guides better decisions, the data warehouse alone cannot keep up with changing workload patterns. The Data Lake, as a generalization of the data warehouse, has become a buzzword in recent years.

Public cloud providers have announced their own Data Lake services one after another, trying to close the gaps in traditional data warehouse solutions. Tencent, one of the top three ISPs in China, has the same requirement. Unlike companies that built in-house solutions, Tencent fully embraces the open source community and uses open source software to build a production-ready Data Lake solution.

In this talk, we will introduce what a Data Lake is and how to build a Data Lake solution. In particular, we will present the Tencent solution, which uses open source technologies to build a large-scale, production-ready Data Lake. Last but not least, we will show our own contributions back to the community. Through this talk, the audience will gain a full understanding of Data Lakes and of Tencent's way of building this solution.

Data Movement & Integration at PayPal & LinkedIn using Apache Gobblin Jay Sen Sudarshan Vasudevan

September 13, 2019
timothyarthur

Data replication at PayPal drives a variety of business use cases, from fraud detection, user behavioral analysis, and credit checks to many other offline business decisions. In this talk, presented in partnership with LinkedIn, we will show how Apache Gobblin empowers data movement and integration at PayPal, and we will showcase recent features as well as the planned roadmap for the platform. Apache Gobblin is a distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, organization and lifecycle management for both streaming and batch data ecosystems. In the second half of this presentation, we will present recent additions to Gobblin, including: 1. a new declarative approach for defining data pipelines using Gobblin-as-a-Service, and 2. real-world experiences running hybrid batch and streaming pipelines using Gobblin.

Serverless Integration with Apache Camel and Knative Michael Costello

September 13, 2019
timothyarthur

Abstract: When we say cloud native, what do we really mean? And how does it apply to our enterprise as we transition away from our now arcane ESB and SOA patterns and move further along our integration journey toward what everyone is calling cloud native? In this talk we’ll discuss the promises of cloud native architecture, why it makes sense to distribute our integration logic in a serverless way using Apache Camel, why Apache Camel is uniquely suited for this, and how to leverage an underlying Knative event bus to add state back into our serverless integration deployments that use Apache Camel. This talk will examine both Apache Camel and why integration is so important in our new cloud native microservice world, and will add a host of cloud native capabilities using Knative to bring us to our next iteration of distributed integration: cloud native integration.

Configuring Apache Camel for the Cloud Bob Paulin

September 13, 2019
timothyarthur

Cloud … check. Container … check. Orchestration … check. Now that you’ve got the basics, it’s time to start thinking about how all the occupants of your new cloud infrastructure are going to communicate. And no matter what cloud, container or orchestration tool you choose, Apache Camel has what you need to get your system configured and connected. This talk will cover Camel components for cloud-friendly configuration, communication, packaging, and deployment. The approaches presented will take a “use what is already there” philosophy. So whatever you’re working with, from Do It Yourself Virtual Machines with containers like Apache Karaf to a fully managed Kubernetes cluster with fat jars, integrating vertically or horizontally, Apache Camel has the pieces to make it all work. This talk will cover how to do it as well as the advantages and drawbacks of each approach.

Agile Integration – Cloud Native Application Development Christina Lin

September 13, 2019
timothyarthur

Cloud native application development is more than just container orchestration. When it comes to designing a proper agile software architecture, there are many aspects that need to be taken into account: a simple microservices runtime, orchestration of core business, interaction with legacy systems, and connections to external SaaS applications, through to a more reactive system with an event-driven backbone, as well as how to avoid data silos and how to deal with routing, versioning, and deployment strategy. Putting everything into one big picture, this talk guides you through what the next generation of cloud native architecture should look like and how everything works together.

Conquering Network Distributed Applications Using the Ballerina Programming Language Anjana Fernando

September 13, 2019
timothyarthur

Ballerina is a next-generation programming language that redefines what it means to be ‘general purpose’. Historically, programming languages concentrated on single-machine execution in a controlled environment, and for good reason: in the bygone days, any external interaction was out of the scope of a programming language. But now the communication network is always there, and software often doesn’t work alone. Rather, programs increasingly work by communicating with each other to do something meaningful. So hiding this network from our code is no longer an option, and we try to use various frameworks and libraries to get the functionality we need. Ballerina is a general purpose programming language that has built its core concepts and functionality to support the creation of networked applications. These features include built-in support for services/resources, transactions, and resilient communication, combined with a type system that further enhances these operations. In this session, we will go through Ballerina to understand the motivation behind the language, the features it introduces, and why it is a critical and timely addition to the tools and technologies we need now.

Apache Kafka vs. Integration Middleware (MQ, ETL, ESB) – Friends, Enemies or Frenemies? Kai Waehner

September 13, 2019
timothyarthur

MQ, ETL and ESB middleware are often used as an integration backbone between legacy applications, modern microservices and cloud services. This introduces several challenges and complexities, such as point-to-point integration or non-scalable architectures. This session discusses how to build a completely event-driven streaming platform using Apache Kafka’s open source messaging, integration and streaming components to take advantage of distributed processing, fault tolerance, rolling upgrades and the ability to reprocess events. Learn the differences between an event-driven streaming platform based on Apache Kafka and middleware like MQ, ETL and ESBs, including best practices and anti-patterns, but also how these concepts and tools complement each other in an enterprise architecture.

Apache Camel K: connect your Knative serverless applications with everything else Andrea Tarocchi Nicola Ferraro

September 13, 2019
timothyarthur

When you start developing serverless applications in the real world, sooner rather than later you will need to talk to external (legacy) systems. In this talk you will discover how to leverage Apache Camel K to connect your Knative serverless applications with everything Camel can connect. Apache Camel K allows running Camel routes as serverless applications directly on top of any Kubernetes cluster, leveraging Knative serverless capabilities such as auto-scaling, scaling to zero, and event-based communication in order to connect serverless functions and microservices with external systems.

Customer segmentation and personalization in websites/PWAs using Apache Unomi Serge Huber

September 13, 2019
timothyarthur

In this session, you will learn all that’s new with Apache Unomi, the open source Customer Data Platform (which graduated this year) based on the Apache Karaf runtime, and all that’s happened since the last ApacheCon. You will discover how to easily integrate it with an existing website or SPA/PWA using its built-in web tracker, how to build customer segments and how to use the API to personalize the experience for your users. You’ll also learn how you can extend it to do almost anything, using either the built-in rules engine or your own plugins. You will also discover the new Docker compatibility and the upcoming GraphQL API. Finally, you’ll learn what’s next and how you can help the project.

Serverless: Multi-tenant Rule Engine Service Powered by Apache Karaf Dmitry Vasilyev Saeid Mirzaei George Ye

September 13, 2019
timothyarthur

The Netflix media pipeline processes thousands of new shows and movies every day so that you can watch them on any device anywhere. We use a forward chaining rule engine to coordinate all of this work in multiple workflows. Hosting these workflows in a reliable, scalable and cost-effective manner is a huge challenge at our scale. In this talk, we will introduce the design of Netflix’s next generation rule engine framework. The goal is to boost modularity, increase developer productivity and decrease operational overhead. The new system is a platform as a service that lets workflow developers focus on the workflow data model, execution conditions, and remote function invocations without worrying about how to deploy, scale, and monitor it. The system uses the OSGi framework to build separation among workflows and leverages Apache Karaf as the runtime container. Other interesting topics, such as workflow bundle management and a novel rule domain-specific language, will be covered in this talk. Keywords: Rule engine, OSGi, Apache Karaf, Serverless
