title: "8. Making It Your Own"
order: 8
Through Chapter 7, we've built a translation desktop app and studied how production-quality code differs. In this chapter, let's go over the key points for turning this app into something entirely your own.
The translation app was just a vehicle. Replace llama.cpp with your own library, and the same architecture works for any application.
First, replace the llama.cpp-related FetchContent entries in CMakeLists.txt with your own library.
```cmake
# Remove: llama.cpp and cpp-llamalib FetchContent
# Add: your own library
FetchContent_Declare(my_lib
  GIT_REPOSITORY https://github.com/yourname/my-lib
  GIT_TAG main
)
FetchContent_MakeAvailable(my_lib)

target_link_libraries(my-app PRIVATE
  httplib::httplib
  nlohmann_json::nlohmann_json
  my_lib  # Your library instead of cpp-llamalib
  # ...
)
```
If your library doesn't support CMake, you can place the header and source files directly in src/ and add them to add_executable. Keep cpp-httplib, nlohmann/json, and webview as they are.
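For the no-CMake case, a minimal sketch (with hypothetical file names, assuming you copied the library's files into `src/`) could look like this:

```cmake
# Build the library's sources directly into the app instead of via FetchContent
add_executable(my-app
  src/main.cpp
  src/my_lib.cpp   # your library's implementation file, copied into src/
)
# Make the copied header (src/my_lib.h) visible to #include
target_include_directories(my-app PRIVATE src)
```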
Change the translation API's endpoints and parameters to match your task.
| Translation app | Your app (e.g., image processing) |
|---|---|
| `POST /translate` | `POST /process` |
| `{"text": "...", "target_lang": "ja"}` | `{"image": "base64...", "filter": "blur"}` |
| `POST /translate/stream` | `POST /process/stream` |
| `GET /models` | `GET /filters` or `GET /presets` |
Then update each handler's implementation. For example, just replace the llm.chat() calls with your own library's API.
```cpp
// Before: LLM translation
auto translation = llm.chat(prompt);
res.set_content(json{{"translation", translation}}.dump(), "application/json");

// After: e.g., an image processing library
auto result = my_lib::process(input_image, options);
res.set_content(json{{"result", result}}.dump(), "application/json");
```
The same goes for SSE streaming. If your library has a function that reports progress via a callback, you can use the exact same pattern from Chapter 3 to send incremental responses. SSE isn't limited to LLMs — it's useful for any time-consuming task: image processing progress, data conversion steps, long-running computations.
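The callback-to-SSE adaptation can be sketched in isolation. This is a standalone example with hypothetical names (`process_with_progress`, `run_as_sse`), and it writes the events into a string; in the real handler the same lines would go to httplib's chunked content sink:

```cpp
#include <functional>
#include <sstream>
#include <string>

// Hypothetical long-running task that reports progress through a callback
void process_with_progress(const std::function<void(int)>& on_progress) {
    for (int pct = 25; pct <= 100; pct += 25) {
        // ...do a chunk of real work here...
        on_progress(pct);
    }
}

// Adapt the callback into SSE "data:" events; each event is a "data:" line
// followed by a blank line, the framing used in Chapter 3
std::string run_as_sse() {
    std::ostringstream out;
    process_with_progress([&](int pct) {
        out << "data: {\"progress\": " << pct << "}\n\n";
    });
    out << "data: [DONE]\n\n";
    return out.str();
}
```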
In this book, we load the LLM model at the top of main() and keep it in a variable. This is intentional. Loading the model on every request would take several seconds, so we load it once at startup and reuse it. If your library has expensive initialization (loading large data files, acquiring GPU resources, etc.), the same approach works well.
cpp-httplib processes requests concurrently using a thread pool. In Chapter 4 we protected the llm object with a std::mutex to prevent crashes during model switching. The same pattern applies when integrating your own library. If your library isn't thread-safe or you need to swap objects at runtime, protect access with a std::mutex.
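The pattern can be sketched on its own, with a hypothetical `MyEngine` standing in for your library's object. Note the swap builds the new object outside the lock, so requests are only blocked for the brief moment of the exchange:

```cpp
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for your library's expensive-to-create object
struct MyEngine {
    std::string name;
    std::string run(const std::string& input) { return name + ":" + input; }
};

std::mutex engine_mutex;            // guards every access to `engine`
std::unique_ptr<MyEngine> engine;   // created once at startup, reused per request

// Handler side: hold the lock for the duration of the call
std::string handle_request(const std::string& input) {
    std::lock_guard<std::mutex> lock(engine_mutex);
    return engine->run(input);
}

// Runtime swap (like Chapter 4's model switching): construct the replacement
// outside the lock, then exchange the pointers under the lock
void swap_engine(std::string new_name) {
    auto fresh = std::make_unique<MyEngine>();
    fresh->name = std::move(new_name);
    std::lock_guard<std::mutex> lock(engine_mutex);
    engine = std::move(fresh);
}
```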
Edit the three files in public/.
- `index.html` — Change the input form layout: swap the `<textarea>` for an `<input type="file">`, add parameter fields, etc.
- `style.css` — Adjust the layout and colors. Keep the two-column design or switch to a single column.
- `script.js` — Update the `fetch()` target URLs, request bodies, and how responses are displayed.

Even without changing any server code, just swapping the HTML makes the app look completely different. Since these are static files, you can iterate quickly: just reload the browser without restarting the server.
This book used plain HTML, CSS, and JavaScript, but combining them with a frontend framework like Vue or React, or a CSS framework, would let you build an even more polished app.
Check the licenses of the libraries you're using. cpp-httplib (MIT), nlohmann/json (MIT), and webview (MIT) all allow commercial use. Don't forget to check the license of your own library and its dependencies too.
The download mechanism we built in Chapter 4 isn't limited to LLM models. If your app needs large data files, the same pattern lets you auto-download them on first launch, keeping the binary small while sparing users the manual setup.
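The gist of download-on-first-launch fits in a few lines. In this sketch, `fetch_from_server` is a hypothetical stand-in; in the real app it would be an HTTP GET with `httplib::Client`, as in Chapter 4's model download:

```cpp
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Hypothetical stand-in for the real download (an httplib::Client GET in the app)
std::string fetch_from_server(const std::string& url) {
    return "payload-for:" + url;
}

// Download-on-first-launch: fetch only if the file isn't cached yet
fs::path ensure_data_file(const fs::path& dir, const std::string& url) {
    fs::create_directories(dir);
    const fs::path target = dir / "data.bin";
    if (!fs::exists(target)) {
        std::ofstream(target) << fetch_from_server(url);
    }
    return target;
}
```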
If the data is small, you can embed it directly into the binary with cpp-embedlib.
webview supports macOS, Linux, and Windows. On macOS and Windows it uses the web view component that ships with the OS, while building on Linux requires the WebKitGTK development package (`libwebkit2gtk-4.1-dev`). Consider setting up cross-platform builds in CI (e.g., GitHub Actions) too.
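As a rough sketch, a cross-platform matrix build on GitHub Actions might look like the following (a hypothetical workflow; adjust the steps to your repository's layout):

```yaml
name: build
on: [push]
jobs:
  build:
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - name: Install Linux WebKitGTK dependency
        if: runner.os == 'Linux'
        run: sudo apt-get update && sudo apt-get install -y libwebkit2gtk-4.1-dev
      - name: Configure and build
        run: cmake -B build && cmake --build build
```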
Thank you so much for reading to the end. 🙏
This book started with /health returning {"status":"ok"} in Chapter 1. From there we built a REST API, added SSE streaming, downloaded models from Hugging Face, created a browser-based Web UI, and packaged it all into a single-binary desktop app. In Chapter 7 we read through llama-server's code and learned how production-quality servers differ in their design. It's been quite a journey, and I'm truly grateful you stuck with it all the way through.
Looking back, we used several key cpp-httplib features hands-on:
- `set_chunked_content_provider` (SSE streaming)
- static file serving with `set_mount_point`
- `bind_to_any_port` + `listen_after_bind` for background threading

cpp-httplib offers many more features beyond what we covered here, including multipart file uploads, authentication, timeout control, compression, and range requests. See *A Tour of cpp-httplib* for details.
These patterns aren't limited to a translation app. If you want to add a web API to your C++ library, give it a browser UI, or ship it as an easy-to-distribute desktop app — I hope this book serves as a useful reference.
Take your own library, build your own app, and have fun with it. Happy hacking! 🚀