To create a schema, first determine the data model you need and apply it to a station. You also need to select a schema format, which in turn determines the data format (and other characteristics) of the ingested messages. Each format supported by Memphis has its own benefits and drawbacks.
Google Protobuf
Protobuf, or Protocol Buffers, is a free, open-source, cross-platform data format used for serializing structured data. It is useful for developing programs that communicate with one another over a network or that store data. Protobuf involves an interface description language, which describes the data structure, and a program that generates source code from that description for parsing or generating the bytes that represent the structured data.
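To make "the bytes that represent the structured data" more concrete, the following is a minimal hand-written sketch of how a single integer field is laid out in the Protobuf wire format. It is not generated code and does not use the official Protobuf library; the message `Test` and its field `a` are hypothetical, and the sketch handles only non-negative varint fields.

```python
# Hand-rolled sketch of the Protobuf wire format for one varint field.
# Hypothetical schema (assumption, not taken from Memphis):
#   message Test { int32 a = 1; }

def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)         # high bit clear: last byte
            return bytes(out)

def decode_varint(data, pos=0):
    """Decode a varint starting at pos; return (value, next_position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

def encode_varint_field(field_number, value):
    """Encode a field as its tag (field number and wire type 0) plus the value."""
    tag = (field_number << 3) | 0  # wire type 0 means varint
    return encode_varint(tag) + encode_varint(value)

encoded = encode_varint_field(1, 150)  # Test { a: 150 }
print(encoded.hex())  # 089601
```

In practice the generated source code performs this encoding and decoding, so application code works with typed message objects rather than raw bytes.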
Memphis supports proto2 and proto3 with features like versioning, embedded serialization, and producer live evolution. Support for import types and import packages will be introduced soon.
JSON
JSON Schema is a vocabulary for annotating and validating JSON documents. It provides machine- and human-readable documentation as well as data validation, which helps with automated testing and with ensuring the quality of data submitted by clients.
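As a rough illustration of what validating a document against a schema means, here is a minimal sketch that checks only three JSON Schema keywords (`type`, `required`, and `properties`). It is a simplified stand-in, not a complete JSON Schema implementation and not how Memphis validates messages; the schema and its field names are hypothetical.

```python
import json

# Hypothetical schema for an ingested message; field names are illustrative only.
SCHEMA = {
    "type": "object",
    "required": ["id", "email"],
    "properties": {
        "id": {"type": "integer"},
        "email": {"type": "string"},
    },
}

# Maps a subset of JSON Schema type names to Python types.
_TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
}

def validate(instance, schema, path="$"):
    """Return a list of error strings; an empty list means the instance passed."""
    errors = []
    expected = schema.get("type")
    if expected is not None:
        # bool is a subclass of int in Python, so reject it for numeric types.
        bool_as_number = isinstance(instance, bool) and expected in ("integer", "number")
        if bool_as_number or not isinstance(instance, _TYPES[expected]):
            errors.append(f"{path}: expected {expected}")
            return errors
    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}: missing required property '{key}'")
        for key, subschema in schema.get("properties", {}).items():
            if key in instance:
                errors.extend(validate(instance[key], subschema, f"{path}.{key}"))
    return errors

good = json.loads('{"id": 1, "email": "a@example.com"}')
bad = json.loads('{"id": "1"}')
print(validate(good, SCHEMA))  # []
print(validate(bad, SCHEMA))   # two errors: missing 'email', wrong type for 'id'
```

Because the schema is itself a JSON document, the same file can serve as documentation for people and as the rule set for automated checks.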
Apache Avro
Avro is a row-oriented remote procedure call and data serialization framework that uses JSON to define protocols and types and serializes data in a compact binary format. It is primarily used in Apache Hadoop, where it provides a serialization format for persistent data as well as a wire format for communication between Hadoop nodes and from client programs to Hadoop services. Avro uses a schema to structure the encoded data and has two schema languages: Avro IDL, intended for human editing, and a machine-readable language based on JSON.
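The relationship between a JSON-defined schema and the binary output can be sketched as follows. This is a minimal hand-written encoder that supports only `string` and `long` fields of a flat record, assuming Avro's binary encoding rules (zigzag variable-length integers, length-prefixed UTF-8 strings, fields written in declaration order). It does not use an Avro library, and the `User` schema is hypothetical.

```python
import json

# Hypothetical record schema, written in Avro's JSON schema language.
SCHEMA = json.loads("""
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "age", "type": "long"}
  ]
}
""")

def encode_long(n):
    """Avro long: zigzag-encode, then write as a variable-length integer."""
    z = (n << 1) ^ (n >> 63)  # zigzag maps small negatives to small positives
    out = bytearray()
    while True:
        low7 = z & 0x7F
        z >>= 7
        if z:
            out.append(low7 | 0x80)  # more bytes follow
        else:
            out.append(low7)
            return bytes(out)

def encode_string(s):
    """Avro string: byte length as a long, followed by the UTF-8 bytes."""
    raw = s.encode("utf-8")
    return encode_long(len(raw)) + raw

# Only the two primitive types used by the sketch are supported.
_ENCODERS = {"long": encode_long, "string": encode_string}

def encode_record(schema, record):
    """Concatenate field encodings in the order the schema declares them."""
    return b"".join(
        _ENCODERS[field["type"]](record[field["name"]])
        for field in schema["fields"]
    )

payload = encode_record(SCHEMA, {"name": "a", "age": 1})
print(payload)  # b'\x02a\x02'
```

Note that the output carries no field names or type tags, which is why a reader needs the schema to interpret the bytes.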
Support for Apache Avro will be introduced in Memphis soon.
GraphQL
GraphQL is an open-source query and data manipulation language for APIs, together with a runtime that fulfills those queries using existing data. Support for GraphQL will be introduced in Memphis soon.