Distributed Systems
From-scratch Java RMI framework · Docker multi-service deployment · PySpark large-scale data analysis
A collection of distributed systems projects completed for CS328 at SUSTech, spanning three interconnected components: a custom Java RMI-style framework built without using java.rmi, a Docker-based multi-service deployment, and a PySpark pipeline analyzing ~970,000 real-world parking records.
Highlights
- Implemented a Java RMI-style framework from scratch, without using java.rmi: service registry, dynamic proxy stubs (java.lang.reflect.Proxy), skeleton threading, and serialized InvocationMsg/ReturnMsg over sockets
- Demonstrated the framework with a concrete MatrixCalculator remote service, with server and client communicating through the custom registry and stub layer
- Containerized the full system with Docker multi-stage builds and Docker Compose; three isolated services (registry, server, client) discover each other via environment variables
- Processed ~970,000 real-world parking records with PySpark, producing five analytical outputs including time-windowed utilization rates and Spark DAG visualizations
- Applied Checkstyle across the Java codebase (~700 lines) to enforce code quality standards
MyRMI — Remote Method Invocation from Scratch
The centerpiece of the work is a Java RMI-style remote invocation framework implemented without the built-in java.rmi library. Every layer of the RMI stack was constructed by hand:
- Registry: RegistryImpl handles object bind and lookup; LocateRegistry provides the client-facing factory; registry calls are themselves proxied over the network via RegistryStubInvocationHandler
- Stubs: StubInvocationHandler implements java.lang.reflect.InvocationHandler, intercepting any method call on a remote interface and serializing it into an InvocationMsg for transmission
- Skeletons: SkeletonReqHandler threads receive incoming messages, deserialize them, dispatch the call via reflection, and return the result as a ReturnMsg
- Messages: InvocationMsg and ReturnMsg carry method name, argument types, argument values, and return value over a socket connection
- Exceptions: RemoteException, AlreadyBoundException, and NotBoundException mirror the standard RMI exception hierarchy
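The stub, message, and skeleton layers listed above can be sketched in one file. This is a minimal illustration, not the project's code: RmiSketch, Adder, Invocation, and Reply are hypothetical names standing in for the real classes, one socket is opened per call, and the loopback address and the ephemeral port are assumptions made so the demo runs on a single machine.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.net.ServerSocket;
import java.net.Socket;

class RmiSketch {
    /** Hypothetical remote interface, used only for this illustration. */
    interface Adder { int add(int a, int b); }

    /** Simplified stand-in for InvocationMsg: method name, parameter types, arguments. */
    static class Invocation implements Serializable {
        final String method; final Class<?>[] paramTypes; final Object[] args;
        Invocation(String method, Class<?>[] paramTypes, Object[] args) {
            this.method = method; this.paramTypes = paramTypes; this.args = args;
        }
    }

    /** Simplified stand-in for ReturnMsg: either a value or the exception the call threw. */
    static class Reply implements Serializable {
        final Object value; final Throwable error;
        Reply(Object value, Throwable error) { this.value = value; this.error = error; }
    }

    /** Skeleton side: accept connections and hand each one to its own handler thread. */
    static <T> ServerSocket serve(Class<T> iface, T target) throws IOException {
        ServerSocket server = new ServerSocket(0); // port 0 = any free port
        Thread acceptor = new Thread(() -> {
            while (true) {
                try {
                    Socket s = server.accept();
                    new Thread(() -> handle(s, iface, target)).start();
                } catch (IOException e) {
                    return; // server socket closed: stop accepting
                }
            }
        });
        acceptor.setDaemon(true);
        acceptor.start();
        return server;
    }

    /** Deserialize one invocation, dispatch it via reflection, send back the reply. */
    static void handle(Socket s, Class<?> iface, Object target) {
        try (Socket sock = s) {
            ObjectInputStream in = new ObjectInputStream(sock.getInputStream());
            Invocation inv = (Invocation) in.readObject();
            Reply reply;
            try {
                Method m = iface.getMethod(inv.method, inv.paramTypes);
                reply = new Reply(m.invoke(target, inv.args), null);
            } catch (InvocationTargetException e) {
                reply = new Reply(null, e.getCause()); // the remote method itself threw
            }
            ObjectOutputStream out = new ObjectOutputStream(sock.getOutputStream());
            out.writeObject(reply);
            out.flush();
        } catch (IOException | ReflectiveOperationException e) {
            e.printStackTrace();
        }
    }

    /** Stub side: a dynamic proxy that turns every interface call into a socket round trip. */
    static <T> T stub(Class<T> iface, String host, int port) {
        InvocationHandler handler = (proxy, method, args) -> {
            try (Socket sock = new Socket(host, port)) {
                ObjectOutputStream out = new ObjectOutputStream(sock.getOutputStream());
                out.writeObject(new Invocation(method.getName(), method.getParameterTypes(), args));
                out.flush(); // flush before reading so the skeleton is never left waiting
                ObjectInputStream in = new ObjectInputStream(sock.getInputStream());
                Reply reply = (Reply) in.readObject();
                if (reply.error != null) throw reply.error;
                return reply.value;
            }
        };
        return iface.cast(Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, handler));
    }

    static int demo() {
        Adder impl = (a, b) -> a + b; // the "remote" object living on the server side
        try (ServerSocket server = serve(Adder.class, impl)) {
            Adder remote = stub(Adder.class, "127.0.0.1", server.getLocalPort());
            return remote.add(2, 3); // travels over the socket and back
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 5
    }
}
```

Looking the method up on the interface rather than on the implementation class keeps the reflective call independent of how the target object was created. A fuller version would also wrap transport failures in a RemoteException-style type instead of letting them surface as raw I/O errors.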
The test service exposes a MatrixCalculator remote interface with two implementations, allowing multiple services to be registered and resolved by name through the same registry instance.
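Binding several implementations under different names and resolving them through one registry could look roughly like the sketch below. It is in-memory only and rests on assumptions: SimpleRegistry, the multiply method, and both calculator classes are hypothetical, the exception types are local stand-ins for the project's own, and a real registry would hand out stubs or remote references rather than live objects.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class RegistrySketch {
    // Local stand-ins for the project's exception types (assumed here to be checked).
    static class AlreadyBoundException extends Exception {
        AlreadyBoundException(String name) { super(name); }
    }
    static class NotBoundException extends Exception {
        NotBoundException(String name) { super(name); }
    }

    /** Hypothetical method; the real MatrixCalculator interface is not shown in this README. */
    interface MatrixCalculator { long[][] multiply(long[][] a, long[][] b); }

    /** In-memory name-to-service table with RMI-like bind/lookup semantics. */
    static class SimpleRegistry {
        private final Map<String, Object> bindings = new ConcurrentHashMap<>();

        void bind(String name, Object service) throws AlreadyBoundException {
            // putIfAbsent is atomic, so two concurrent binds of one name cannot both succeed.
            if (bindings.putIfAbsent(name, service) != null) throw new AlreadyBoundException(name);
        }

        Object lookup(String name) throws NotBoundException {
            Object service = bindings.get(name);
            if (service == null) throw new NotBoundException(name);
            return service;
        }
    }

    /** Textbook i-j-k triple loop. */
    static class NaiveCalculator implements MatrixCalculator {
        public long[][] multiply(long[][] a, long[][] b) {
            long[][] c = new long[a.length][b[0].length];
            for (int i = 0; i < a.length; i++)
                for (int j = 0; j < b[0].length; j++)
                    for (int k = 0; k < b.length; k++)
                        c[i][j] += a[i][k] * b[k][j];
            return c;
        }
    }

    /** Same arithmetic with the loops in i-k-j order, which walks b row by row. */
    static class LoopSwappedCalculator implements MatrixCalculator {
        public long[][] multiply(long[][] a, long[][] b) {
            long[][] c = new long[a.length][b[0].length];
            for (int i = 0; i < a.length; i++)
                for (int k = 0; k < b.length; k++)
                    for (int j = 0; j < b[0].length; j++)
                        c[i][j] += a[i][k] * b[k][j];
            return c;
        }
    }

    /** Binds both implementations, then resolves one of them by name and uses it. */
    static long[][] demo(String name) {
        SimpleRegistry registry = new SimpleRegistry();
        try {
            registry.bind("naive", new NaiveCalculator());
            registry.bind("swapped", new LoopSwappedCalculator());
            MatrixCalculator calc = (MatrixCalculator) registry.lookup(name);
            return calc.multiply(new long[][] {{1, 2}, {3, 4}}, new long[][] {{5, 6}, {7, 8}});
        } catch (AlreadyBoundException | NotBoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.deepToString(demo("naive"))); // prints [[19, 22], [43, 50]]
    }
}
```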
MyRMI_Docker — Containerized Deployment
The framework was extended into a realistic multi-service deployment using Docker and Docker Compose. Registry, server, and client each run as an isolated container:
- Multi-stage Dockerfile per service: Maven compiles in the build stage; a minimal OpenJDK image runs at runtime
- docker-compose.yaml defines startup order (registry → server → client) and inter-service dependencies
- Service discovery uses environment variables (REGISTRY_HOST, SERVER_PORT), with no hardcoded addresses
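On the Java side, environment-based discovery can be as small as the sketch below. Only the variable names REGISTRY_HOST and SERVER_PORT come from the project; the DiscoveryConfig class, its fallback values, and the reading of SERVER_PORT as the port the server listens on are assumptions for illustration.

```java
import java.util.Map;

class DiscoveryConfig {
    final String registryHost;
    final int serverPort;

    DiscoveryConfig(String registryHost, int serverPort) {
        this.registryHost = registryHost;
        this.serverPort = serverPort;
    }

    /** Reads settings from the given environment map; taking a map keeps this testable. */
    static DiscoveryConfig fromEnv(Map<String, String> env) {
        // Fallback values are assumptions for running outside Docker, not taken from the project.
        String host = env.getOrDefault("REGISTRY_HOST", "localhost");
        String port = env.getOrDefault("SERVER_PORT", "9000");
        try {
            return new DiscoveryConfig(host, Integer.parseInt(port.trim()));
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("SERVER_PORT must be an integer, got: " + port, e);
        }
    }

    public static void main(String[] args) {
        DiscoveryConfig cfg = fromEnv(System.getenv());
        System.out.println("registry=" + cfg.registryHost + " serverPort=" + cfg.serverPort);
    }
}
```

Under Docker Compose, a value such as REGISTRY_HOST would typically be set to the registry's service name, which resolves as a hostname on the Compose network.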
SparkProcessing — Large-Scale Parking Data Analysis
A PySpark pipeline analyzing Shenzhen parking records (~970,000 rows, 1,930 berths, 43 street sections).
Five analytical tasks:
- Berth count per street section
- Unique berth-to-section mappings
- Average parking duration per berth
- Hourly utilization rate with percentage breakdown
- Peak-hour identification across sections
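The pipeline itself is written in PySpark. Purely to illustrate the aggregation behind the third task (average parking duration per berth), here is a plain Java streams sketch of the same group-and-average step. The ParkingRecord fields, the sample timestamps, and the choice of seconds as the unit are all assumptions, since the dataset's schema is not shown here.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class ParkingStatsSketch {
    /** Hypothetical record shape: one row per parking event. */
    static class ParkingRecord {
        final String berth;
        final LocalDateTime in;
        final LocalDateTime out;

        ParkingRecord(String berth, LocalDateTime in, LocalDateTime out) {
            this.berth = berth;
            this.in = in;
            this.out = out;
        }
    }

    /** Groups records by berth and averages the stay length in seconds. */
    static Map<String, Double> averageDurationSeconds(List<ParkingRecord> records) {
        return records.stream().collect(Collectors.groupingBy(
                r -> r.berth,
                TreeMap::new, // sorted by berth id for stable output
                Collectors.averagingLong(r -> Duration.between(r.in, r.out).getSeconds())));
    }

    public static void main(String[] args) {
        LocalDateTime t0 = LocalDateTime.of(2024, 1, 1, 8, 0); // made-up sample data
        List<ParkingRecord> rows = Arrays.asList(
                new ParkingRecord("A", t0, t0.plusMinutes(10)),
                new ParkingRecord("A", t0.plusHours(1), t0.plusHours(1).plusMinutes(30)),
                new ParkingRecord("B", t0, t0.plusMinutes(5)));
        System.out.println(averageDurationSeconds(rows)); // prints {A=1200.0, B=300.0}
    }
}
```

In Spark the same shape is a group-by on the berth column followed by an average over a computed duration column, with the grouping distributed across partitions.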
Technical Summary
| Languages | Java 8, Python 3 |
| Infrastructure | Docker, Docker Compose |
| Frameworks | PySpark, Maven |
| Key concepts | RPC, dynamic proxy, serialization, service registry, container orchestration, DAG execution |
| Dataset | ~970,000 rows of real parking data |
| Code quality | Checkstyle enforced across Java codebase |