Why REST? "The World Wide Web is arguably the world's largest distributed application. Understanding the key architectural principles underlying the Web can help explain its technical success and may lead to improvements in other distributed applications, particularly those that are amenable to the same or similar methods of interaction. REST contributes both the rationale behind the modern Web's software architecture and a significant lesson in how software engineering principles can be systematically applied in the design and evaluation of a real software system." — Fielding's REST dissertation
Read more »

A great free learning website with quizzes; a great online editor & debugger; a collection of Python learning materials, articles, and guides. Zen of Python: by typing import this, you can see the Zen of Python. Some lines need more explanation. Explicit is better than implicit: it is best to spell out exactly what your code is doing; this is why adding a numeric string to an integer requires explicit conversion rather than having it happen behind the scenes, as it does in some other languages (see the small sketch after this excerpt). Flat is better than nested: heavily nested structures (lists of lists of lists, and on and on…) should be avoided. Errors should never pass silently: in general,
Read more »
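A minimal sketch of the "explicit is better than implicit" point above (not from the original post): adding a numeric string to an integer fails until you convert explicitly.

```python
# Python refuses to guess how to add an int and a numeric string.
count = 3
text = "4"

# count + text             # would raise TypeError: unsupported operand type(s)
total = count + int(text)  # the explicit conversion makes the intent visible
print(total)               # 7
```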

k8s is a platform for managing containerized workloads and services. Concepts: Kubernetes Objects. 1. Kubernetes abstracts the desired state of the cluster as objects. 2. An object configuration includes: 1. spec: describes the desired state, given in a manifest as 1. apiVersion: the Kubernetes API version, 2. metadata: the name & namespace, 3. spec: the desired-state definition; 2. status: describes the actual state of the object (see the manifest sketch after this excerpt). 3. Cluster state (understanding Kubernetes objects): 1. what containerized applications are running and where they are running, 2. how many reso
Read more »
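As a hedged illustration of the apiVersion/metadata/spec fields listed above, here is a minimal object configuration written as a Python dict (the nginx Deployment is a made-up example, not taken from the original post):

```python
# A minimal Kubernetes object configuration, expressed as a Python dict
# (the same structure you would put in a YAML manifest).
deployment = {
    "apiVersion": "apps/v1",                               # the Kubernetes API version
    "kind": "Deployment",
    "metadata": {"name": "web", "namespace": "default"},   # name & namespace
    "spec": {                                              # the desired-state definition
        "replicas": 2,
        "selector": {"matchLabels": {"app": "web"}},
        "template": {
            "metadata": {"labels": {"app": "web"}},
            "spec": {"containers": [{"name": "web", "image": "nginx:1.25"}]},
        },
    },
}
# "status" is not written by the user; the cluster fills it in to report
# the actual state of the object.
```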

Huawei OBS is an object storage service on the cloud. Concepts: Object. 1. The actual complete file or byte stream to be saved. 2. The object name is the unique ID within a bucket: it is used as part of the URL path, so its naming restrictions follow URL-path naming restrictions (see the sketch after this excerpt). 3. Access (based on a specific version, in fact): 1. Object ACL: general control over an object (read object, read/write object ACL), limited to users in the same account; 2. Object policy: fine-grained control over an object (fine-grained actions such as put, delete…) for all users. 4. Multi-versioning: 1.
Read more »
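A small sketch of the point above that the object name becomes part of the URL path; the endpoint and object key below are made-up examples, not Huawei OBS documentation values:

```python
from urllib.parse import quote

# Hypothetical bucket endpoint and object key, only to show how an object
# name is embedded in the URL path.
endpoint = "my-bucket.obs.example-region.myhuaweicloud.com"
object_key = "reports/2023/summary v1.txt"

# Percent-encode the key so it forms a valid URL path ("/" is kept as a separator).
url = f"https://{endpoint}/{quote(object_key)}"
print(url)
# https://my-bucket.obs.example-region.myhuaweicloud.com/reports/2023/summary%20v1.txt
```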

What's Beam? Beam is an open-source, unified model for defining both batch and streaming data-parallel processing pipelines. * Open source (Apache v2 license). * Used to define data-parallel processing pipelines. * A unified model for defining pipelines; the actual processing is run by the underlying runner (e.g. Spark, Apache Apex, etc.; see all available runners). * Can process both batch (bounded) and streaming (unbounded) datasets. Use it: see the WordCount examples and the WordCount source. Now we define a simple pipeline and run it (a small sketch follows this excerpt). Transform and Count are built-in atomic operations used to define t
Read more »
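A minimal word-count pipeline sketch using the Beam Python SDK and the default direct runner; the input lines are invented here, and the original post's example may use a different SDK or runner:

```python
import re
import apache_beam as beam

lines = ["to be or not to be", "that is the question"]

# Split lines into words, then count occurrences of each word.
with beam.Pipeline() as p:
    (
        p
        | "Create" >> beam.Create(lines)
        | "Split" >> beam.FlatMap(lambda line: re.findall(r"[\w']+", line))
        | "Count" >> beam.combiners.Count.PerElement()
        | "Print" >> beam.Map(print)   # e.g. ('to', 2), ('be', 2), ...
    )
```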

chmod, chown: understanding Linux file permissions. File permissions are defined by permission group and permission type. 1. Permission group: * owner (u) * group (g) * all other users (o). 2. Permission type: * read (r, value 4) * write (w, value 2) * execute (x, value 1) (see the octal sketch after this excerpt). Permission presentation: in the command line a permission is displayed as _rwxrwxrwx 1 owner:group. * The first character (the underscore _ here) is the special permission flag, which can vary. * The following three groups of rwx represent the permissions of the owner, the group, and all other users respectively. If the ow
Read more »
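A small sketch of how the r=4 / w=2 / x=1 values combine into an octal mode; the file name is hypothetical and created only for the demo:

```python
import os
import stat
from pathlib import Path

# rwx for owner (4+2+1), r-x for group (4+1), r-- for others (4)  ->  0o754
mode = (
    stat.S_IRUSR | stat.S_IWUSR | stat.S_IXUSR   # owner: rwx
    | stat.S_IRGRP | stat.S_IXGRP                # group: r-x
    | stat.S_IROTH                               # others: r--
)
assert mode == 0o754

Path("example.sh").touch()      # hypothetical file, created just for the demo
os.chmod("example.sh", mode)    # equivalent to: chmod 754 example.sh
print(oct(mode))                # 0o754
```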

Lombok is a library that helps you write cleaner Java more efficiently. It plugs into the editor and the build tool and works at compile time. Essentially, it modifies the generated bytecode by operating on the AST (abstract syntax tree) at compile time, which javac allows; this is, in fact, a way of extending Java's grammar. Usage. To use it: 1. install the Lombok plugin in IntelliJ; 2. add the package dependency to the project (to use its annotations): <dependency> <groupId>org.projectlombok</groupId> <artifactId>lombok</artifactId> <version>1.16.18</version> <scope>provided</scope> </dependency>
Read more »

Concept: Spark is a fast, general-purpose cluster computing system, comparable to Hadoop MapReduce, and it runs on clusters. Spark Ecosystem: the components of the Apache Spark ecosystem are * Spark Core: the cluster computing engine, providing the API for writing computing functions. * Spark SQL: SQL for data processing, similar to Hive. * MLlib: machine learning. * GraphX: graph processing. * Spark Streaming. Core concepts: * RDDs (Resilient Distributed Datasets): the fundamental data structure in Spark; they are immutable and can be split into multiple partitions that can be processed in parallel (see the sketch after this excerpt).
Read more »
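A minimal RDD sketch in PySpark (local mode, made-up data; the original post does not include this code) to illustrate immutability and partitioning:

```python
from pyspark import SparkContext

sc = SparkContext("local[*]", "rdd-sketch")   # local mode, for illustration only

# An RDD split into 4 partitions that can be processed in parallel.
rdd = sc.parallelize(range(1, 101), numSlices=4)
print(rdd.getNumPartitions())                 # 4

# Transformations build new RDDs; the original RDD is never mutated.
squares = rdd.map(lambda x: x * x)
print(squares.reduce(lambda a, b: a + b))     # sum of squares 1..100 = 338350

sc.stop()
```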

YARN architecture. YARN is used to manage/allocate cluster resources and to schedule/monitor jobs; these responsibilities (resource management) were split out of the Hadoop framework. YARN has two main components: * Scheduler: manages resources (CPU, memory, network, disk, etc.) and allocates them to applications. * The node manager reports node resource info (node status) to the Scheduler. * The application master asks the Scheduler for resources. * When partitioning resources among various queues and applications, the Scheduler supports pluggable policies; for example: * CapacityScheduler allocates resources by tenant req
Read more »