
Data Serialization Formats

Data serialization is the process of converting data structures or object states into a format that can be stored (e.g., in a file or memory buffer) or transmitted (e.g., across a network connection) and reconstructed later.^[600-developer__big-data__java-serializable.md]
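
As a minimal sketch of this round trip, the Python example below serializes a small record to bytes with the standard-library `json` module and then reconstructs it. The record and the helper names `serialize` and `deserialize` are hypothetical, chosen only for illustration.

```python
import json

# A hypothetical object state, represented as a plain dict.
user = {"id": 42, "name": "Ada", "roles": ["admin", "dev"]}


def serialize(obj: dict) -> bytes:
    """Convert the structure into bytes that can be stored or transmitted."""
    return json.dumps(obj, separators=(",", ":")).encode("utf-8")


def deserialize(data: bytes) -> dict:
    """Reconstruct the structure from stored or received bytes."""
    return json.loads(data.decode("utf-8"))


payload = serialize(user)        # e.g. written to a file or a socket
restored = deserialize(payload)  # later, possibly in another process
```

Here `payload` is `b'{"id":42,"name":"Ada","roles":["admin","dev"]}'`, and `restored` compares equal to the original `user` dict.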

In the context of network communication and distributed systems, serialization formats are often categorized by their compatibility across different programming languages and environments.^[600-developer__big-data__java-serializable.md]

Java Serialization

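Java serialization is a language-specific mechanism used within the Java ecosystem, most notably by Remote Method Invocation (RMI) to pass objects between JVMs.^[600-developer__big-data__java-serializable.md] Because its byte format is tied to Java classes, it is poorly suited to exchanging data with programs written in other languages.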
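
The Java-specific nature of this format shows up on the wire: every stream written by `java.io.ObjectOutputStream` begins with the magic number `0xACED` and stream version `5`, followed by Java-specific type codes and class descriptors. The Python sketch below only checks for that header, which is often enough to detect Java-serialized payloads in logs or captures. The function name `looks_like_java_serialization` is hypothetical, and the sample bytes are what `ObjectOutputStream.writeObject("hi")` is expected to produce.

```python
import struct

# Header constants from java.io.ObjectStreamConstants.
STREAM_MAGIC = 0xACED
STREAM_VERSION = 5


def looks_like_java_serialization(data: bytes) -> bool:
    """Return True if data begins with the Java serialization stream header."""
    if len(data) < 4:
        return False
    magic, version = struct.unpack(">HH", data[:4])
    return magic == STREAM_MAGIC and version == STREAM_VERSION


# Expected output of ObjectOutputStream.writeObject("hi"): the stream header,
# TC_STRING (0x74), a 2-byte length, then the modified UTF-8 bytes of "hi".
java_bytes = bytes.fromhex("aced00057400026869")
json_bytes = b'{"s":"hi"}'
```

`looks_like_java_serialization(java_bytes)` returns `True`, while the JSON payload returns `False`; decoding anything past the header would require reimplementing Java's stream grammar.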

RPC and Cross-Language Formats

For Remote Procedure Calls (RPC) and systems requiring interoperability between different languages, several standard formats and frameworks are used.^[600-developer__big-data__java-serializable.md]

  • XML: Commonly used in traditional web services.^[600-developer__big-data__java-serializable.md]
  • JSON: The dominant format for modern RESTful APIs.^[600-developer__big-data__java-serializable.md]
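  • Thrift: A cross-language RPC and serialization framework originally developed by Facebook and now maintained by the Apache Software Foundation.^[600-developer__big-data__java-serializable.md]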
  • Protocol Buffers (Protobuf): A binary serialization format developed by Google.^[600-developer__big-data__java-serializable.md]
  • Avro: A serialization system developed within the Hadoop ecosystem.^[600-developer__big-data__java-serializable.md]
  • gRPC: A high-performance RPC framework developed by Google; it uses Protocol Buffers as its default serialization format.^[600-developer__big-data__java-serializable.md]
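
To illustrate why text formats such as XML and JSON travel well between languages, the sketch below encodes the same hypothetical record both ways using only the Python standard library. It also shows one practical difference: plain XML (without a schema) has no native number type, so the `id` field comes back as a string that the application must convert itself.

```python
import json
import xml.etree.ElementTree as ET

# A hypothetical record to exchange with another service.
record = {"id": 7, "name": "sensor-a"}

# JSON: maps directly onto dicts, lists, numbers and strings.
json_text = json.dumps(record)

# XML: the same record as a root element with one child element per field.
root = ET.Element("record")
for key, value in record.items():
    ET.SubElement(root, key).text = str(value)
xml_text = ET.tostring(root, encoding="unicode")

# Decoding: JSON restores the integer; XML yields only strings.
from_json = json.loads(json_text)
from_xml = {child.tag: child.text for child in ET.fromstring(xml_text)}
```

The JSON text is `{"id": 7, "name": "sensor-a"}` and the XML text is `<record><id>7</id><name>sensor-a</name></record>`; binary formats such as Protobuf, Thrift, and Avro trade this readability for smaller, schema-driven encodings.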

Thrift Protocols, Transports, and Server Models

Thrift provides a stack of modular components for defining data types and service interfaces^[600-developer__big-data__java-serializable.md]. It supports various protocols, transport layers, and server models:

Protocols

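Protocols determine how data is encoded for transmission^[600-developer__big-data__java-serializable.md]:

  • TBinaryProtocol: A straightforward binary encoding; numeric values are written as binary rather than converted to text.
  • TCompactProtocol: A denser binary encoding that usually produces smaller messages than TBinaryProtocol.
  • TJSONProtocol: Uses JSON encoding.
  • TSimpleJSONProtocol: A write-only JSON format that produces output without Thrift metadata, so Thrift cannot parse it back; it is intended for consumption by scripting languages.
  • TDebugProtocol: Uses a human-readable text format for debugging.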
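
Part of the size difference between TBinaryProtocol and TCompactProtocol comes from how integers are written. TBinaryProtocol writes an `i32` as four big-endian bytes, while TCompactProtocol applies zigzag encoding (so small negative numbers stay small) followed by a variable-length integer (varint) encoding that stores 7 bits per byte. The Python sketch below reproduces only this integer encoding; it does not cover field headers or the rest of either protocol, and the function names are hypothetical.

```python
import struct


def binary_i32(n: int) -> bytes:
    """TBinaryProtocol-style i32: fixed 4 bytes, big-endian two's complement."""
    return struct.pack(">i", n)


def zigzag32(n: int) -> int:
    """Map signed to unsigned so small magnitudes stay small: 0->0, -1->1, 1->2, -2->3."""
    return ((n << 1) ^ (n >> 31)) & 0xFFFFFFFF


def varint(u: int) -> bytes:
    """Unsigned varint: 7 bits per byte, high bit set on every byte except the last."""
    out = bytearray()
    while True:
        byte = u & 0x7F
        u >>= 7
        if u:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def compact_i32(n: int) -> bytes:
    """TCompactProtocol-style i32: zigzag, then varint."""
    return varint(zigzag32(n))
```

Small values such as `1` or `-1` shrink to a single byte, and `300` takes two bytes (`b"\xd8\x04"`), whereas values near the 32-bit limit take five bytes, one more than the fixed-width encoding.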

Transports

Transports determine how data is moved between nodes^[600-developer__big-data__java-serializable.md]:

  • TSocket: Uses standard blocking socket I/O.
  • TFramedTransport: Sends data in frames, each preceded by its length; this transport is required by non-blocking servers.
  • TFileTransport: Writes to a file.
  • TMemoryTransport: Uses memory for I/O (typically backed by a simple byte array).
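
Framing works by prefixing each frame with its length as a 4-byte big-endian integer, so the receiver knows how many bytes to read before handing the message to the protocol layer. The sketch below approximates that framing in Python, with `io.BytesIO` standing in for an in-memory transport; `write_frame` and `read_frame` are hypothetical helpers, not Thrift's API.

```python
import io
import struct
from typing import BinaryIO


def write_frame(transport: BinaryIO, payload: bytes) -> None:
    """Write one frame: a 4-byte big-endian length followed by the payload."""
    transport.write(struct.pack(">i", len(payload)))
    transport.write(payload)


def read_frame(transport: BinaryIO) -> bytes:
    """Read one frame; raise EOFError if the header or body is incomplete."""
    header = transport.read(4)
    if len(header) < 4:
        raise EOFError("incomplete frame header")
    (size,) = struct.unpack(">i", header)
    payload = transport.read(size)
    if len(payload) < size:
        raise EOFError("incomplete frame body")
    return payload


# io.BytesIO stands in for an in-memory transport: write two frames, rewind, read back.
buf = io.BytesIO()
write_frame(buf, b"hello")
write_frame(buf, b"thrift")
buf.seek(0)
frames = [read_frame(buf), read_frame(buf)]
```

After this runs, `frames` is `[b"hello", b"thrift"]`; a third `read_frame(buf)` raises `EOFError`, which is how the sketch signals that no complete frame is available.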

Server Models

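Thrift supports different server models to handle requests^[600-developer__big-data__java-serializable.md]:

  • TSimpleServer: A single-threaded server using standard blocking I/O.
  • TThreadPoolServer: A multi-threaded server using standard blocking I/O, with requests handled by a pool of worker threads.
  • TNonblockingServer: A server using non-blocking I/O; clients must use TFramedTransport.
  • THsHaServer: A half-sync/half-async server that handles network I/O without blocking and processes requests on a pool of worker threads.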
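
The thread-pool model can be sketched with Python's standard `socket` and `concurrent.futures` modules: a blocking accept loop hands each connection to a worker from a fixed-size pool, which is roughly how a thread-pool server behaves. This illustrates the model only and is not Thrift's implementation; the length-prefixed framing, the uppercasing "service", and all function names are hypothetical.

```python
import socket
import struct
import threading
from concurrent.futures import ThreadPoolExecutor


def recv_exact(conn: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; return b"" if the peer closes before that."""
    data = b""
    while len(data) < n:
        chunk = conn.recv(n - len(data))
        if not chunk:
            return b""
        data += chunk
    return data


def handle_connection(conn: socket.socket) -> None:
    """Worker task: reply to each length-prefixed request with its uppercase form."""
    with conn:
        while True:
            header = recv_exact(conn, 4)
            if not header:
                return  # client closed the connection
            (size,) = struct.unpack(">i", header)
            request = recv_exact(conn, size)
            if len(request) < size:
                return  # truncated request
            reply = request.upper()
            conn.sendall(struct.pack(">i", len(reply)) + reply)


def serve(listener: socket.socket, pool: ThreadPoolExecutor, max_connections: int) -> None:
    """Blocking accept loop that hands each accepted connection to a pooled worker."""
    for _ in range(max_connections):
        conn, _addr = listener.accept()
        pool.submit(handle_connection, conn)


def demo() -> bytes:
    """Run the server on an ephemeral localhost port, send one request, return the reply."""
    listener = socket.create_server(("127.0.0.1", 0))
    port = listener.getsockname()[1]
    with listener, ThreadPoolExecutor(max_workers=4) as pool:
        acceptor = threading.Thread(target=serve, args=(listener, pool, 1), daemon=True)
        acceptor.start()
        with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
            client.sendall(struct.pack(">i", 2) + b"hi")
            (size,) = struct.unpack(">i", recv_exact(client, 4))
            reply = recv_exact(client, size)
        acceptor.join()
    return reply
```

Because each worker stays blocked on its connection for that connection's lifetime, the pool size caps how many clients can be served at once; the non-blocking and half-sync/half-async models exist to relax that limit.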

  • [[API]]
  • [[Big Data]]
  • [[JSON]]
  • [[XML]]

Sources

  • 600-developer__big-data__java-serializable.md