gRPC-Go Engineering Practices (grpc.io)
140 points by tgma on Jan 25, 2018 | 82 comments


It's a good time to ask: is gRPC any good? I'd love to standardize on a stable middleware layer that handles multiple versions of clients and servers well. REST with JSON really seems to work great for most things already.

What is the advantage of gRPC? Is it just more efficient?


Versioning your API is one huge benefit, and it's better done using proto/RPC. If you change your JSON schema, have fun propagating that change to all clients without fear. Or hope you've built special infra to do that.

Also "just more efficient" is a funny way to characterize the performance difference between just data bytes vs data + structure bytes (read: the gap is large). You gain in transmission and you gain during deserialization / parsing.

Here is an example.

{"My key":"my value"} has n=21 characters. When you parse you must scan the whole 21 characters O(n), every time, just to read the thing.

If you instead store this in fixed size bytes, where you have some fixed # of bytes that tell you "my value starts at address 0x43", then you can skip to just the values you care about. You don't need brackets or quotes. And you can use other nifty tricks to compress the binary representation further for savings on the wire.
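A rough Go sketch of that idea, under loud assumptions: this is a hand-rolled length-prefixed layout for illustration, not the actual protobuf encoding, and `encodeJSON`, `encodeValue`, and `decodeValue` are hypothetical names. It assumes both sides already know the schema, so the key never goes on the wire.

```go
package main

import (
	"encoding/binary"
	"encoding/json"
	"fmt"
)

// encodeJSON marshals a single key/value pair as a JSON object.
func encodeJSON(key, value string) []byte {
	b, err := json.Marshal(map[string]string{key: value})
	if err != nil {
		panic(err) // not expected for a map[string]string
	}
	return b
}

// encodeValue writes only the value, prefixed by its length as a varint.
// The key is not sent; both sides are assumed to share the schema.
func encodeValue(value string) []byte {
	buf := make([]byte, binary.MaxVarintLen64)
	n := binary.PutUvarint(buf, uint64(len(value)))
	return append(buf[:n], value...)
}

// decodeValue reads the length prefix and slices the value out directly,
// without scanning for quotes or brackets.
func decodeValue(b []byte) string {
	n, size := binary.Uvarint(b)
	if size <= 0 || uint64(len(b)-size) < n {
		panic("truncated value")
	}
	return string(b[size : size+int(n)])
}

func main() {
	j := encodeJSON("My key", "my value")
	v := encodeValue("my value")
	fmt.Println(len(j), string(j))      // 21 {"My key":"my value"}
	fmt.Println(len(v), decodeValue(v)) // 9 my value
}
```

For a single short string the saving is mostly the key and the punctuation; the skip-ahead benefit only shows up with many fields.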


> performance difference [..] (read: the gap is large)

Hmm...can you quantify that? I found that while the performance differences you mention exist, they are not actually that large and are completely dwarfed by actions later in the chain, particularly if you have generic intermediate representations.


I fully agree, but want to point out that JSON can be compressed.
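For anyone who wants to measure this on their own payloads, here is a minimal Go sketch using only the standard library. The `person` type, `samplePeople`, and `gzipJSON` are hypothetical names, and the sample data is made up; real ratios depend heavily on how repetitive the payload is.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/json"
	"fmt"
)

// person is a made-up record type for this sketch.
type person struct {
	Name    string `json:"name"`
	Address string `json:"address"`
}

// gzipJSON marshals v to JSON and returns both the raw and the gzip-compressed bytes.
func gzipJSON(v interface{}) (raw, compressed []byte, err error) {
	raw, err = json.Marshal(v)
	if err != nil {
		return nil, nil, err
	}
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err = zw.Write(raw); err != nil {
		return nil, nil, err
	}
	// Close flushes any remaining compressed data into buf.
	if err = zw.Close(); err != nil {
		return nil, nil, err
	}
	return raw, buf.Bytes(), nil
}

// samplePeople builds n records with repeated field names, like a typical list response.
func samplePeople(n int) []person {
	people := make([]person, n)
	for i := range people {
		people[i] = person{
			Name:    fmt.Sprintf("user-%d", i),
			Address: fmt.Sprintf("%d Main Street", i),
		}
	}
	return people
}

func main() {
	raw, compressed, err := gzipJSON(samplePeople(1000))
	if err != nil {
		panic(err)
	}
	fmt.Printf("raw=%d bytes, gzip=%d bytes\n", len(raw), len(compressed))
}
```

Note that this only addresses bytes on the wire; the receiver still has to decompress and parse the JSON text.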


You're absolutely right. Auth0 shows that when compressed, protobufs don't provide a great benefit vs JSON.

Something I've read from others is that gRPC is not the easiest thing to use directly in an SPA. If you have services exposed to both a web front end and mobile, exposing a REST-ish API alongside gRPC might not be worth it. This is a problem I'm currently struggling with.

https://auth0.com/blog/beating-json-performance-with-protobu...


Did you not finish the rest of the post? Yes, it's true that compressed json is not materially different in size vs protobufs. But when it comes to the speed of serializing and deserializing protobufs there is a huge gain across languages. So if you're using gRPC in backend environments you'll gain huge speed benefits.


That was already acknowledged earlier in the thread.


I spoke to some of the gRPC folks at KubeCon/CNCon and the word is that browser support is in the works. There are some existing, slightly hackish ways to do it already but I'm confident that they will get it done properly.

It will be huge when that's a reality and we start building all of our APIs around protobuf.


Even when compressed, sizes can still matter as the payload size increases.

REST is the language the rest of the web speaks, so if you're providing a consumer service that isn't latency sensitive, you should feel at liberty to offer a RESTful API.

If you're building internal services, consumed internally, gRPC is a decent way to go.

If, say, you're providing a logging service to a number of large customers, you might want to provide a gRPC service for the performance characteristics.

gRPC also provides you the option to write all your services internally using gRPC and then you can expose a JSON service without having to write it yourself.


Afraid I'm going to be contrary and old-fashioned and say I prefer JSON.

It's never been problematic adding or extending JSON endpoints, and it's never been a problem using basic gzip compression on the fly either.

And JSON endpoints are a damn sight easier to debug and wireshark and all the rest.

I've spent a lot of time writing fast JSON serialization for various languages, including Java; it's staggering how inefficient most libraries are. But that's not really JSON's fault.


There are definitely cases for both. JSON is definitely easier to consume as a human (so for debugging).

gRPC was born out of Google's Stubby RPC system[0], which is used heavily for communicating between different jobs. If you are going to stand up a lot of different services that need to talk to each other, it provides a lot of advantages that you don't get with JSON. For large companies that use multiple programming languages, this is really nice, as proto3 and gRPC have code generators for a slew of different languages.

There are a lot of other niceties that are useful in gRPC that you don't get with JSON (like streaming data).

[0] https://grpc.io/blog/principles


Also a protobuf may be stored directly in some of the databases google uses internally, then you have lots of tools around them (diffing with map-reduces done on them, etc.).


Yeah, you can certainly use JSON until you start caring about cost / performance then switch over. I don't use gRPC on hobby projects because writing a flask or express server that receives json payloads is much faster.

But if I were writing the next Uber or Facebook or whatever I'd probably get on gRPC or thrift or another RPC system as soon as we hit a non-trivial number of users.


>Afraid I'm going to be contrary and old-fashioned and say I prefer JSON.

History is coming full circle. Before JSON (HTTP REST) there was RPC. JSON was the new and shiny thing while RPC was for the old-fashioned. JSON took over precisely because of all the great advantages over RPC that you listed (and a bunch of others), and despite all the advantages of RPC that people in this thread tout today as gRPC advantages. I suspect that in 10-15 years some young guys at Google will come up with gJSON, and their arguments for it will look like your comment today.


Uh, no. If anything, they are switching more workloads to Flatbuffers. Google does a lot of fleet-wide profiling, and the encoding and decoding of protocol buffers, which were meant to be efficient, is NOT lost in the noise. Groups like the toolchain teams have a good idea of how many orders of magnitude in cores are spent doing just that every second. Something like JSON would probably require tens or hundreds of millions of dollars in additional machines for no gain (slightly educated, but still wild guess).


Thanks, actually Flatbuffers looks very useful.


It's a performance point of view. You will never beat a binary format for transferring data over the wire. Sure, it's not as debuggable, but if you want speed you don't have much choice. HTTP/2 did the same thing and ditched the textual format of HTTP/1.


Even with a binary format, you have a lot of options. gRPC has plans for binary logging, which Stubby has had for a decade or more. Google's Stubby clients and servers also have built-in HTTP handlers that let you do a lot of interactive debugging, like the /debug/requests interface in golang's net/trace package. Unexpectedly, I found that a much better troubleshooting experience than poring over logs (also because logs don't capture everything).


Not really. If you need to support multiple platforms and you need something secure (no DER or BER or...) then protobuf is pretty much unbeatable. Well-maintained libraries in every language and good generated code based on a simple format.


How did you get access to sniff the network? What IP are you going to sniff for? How are you getting keys so you can MITM the traffic?

Basically none of the "advantages" you describe exist when you're in an environment like Google's, and that kind of environment is the sort of environment one uses gRPC (or Finagle, or others) in. IOW, if you can use Wireshark successfully, you have an argument for using JSON. Just keep in mind many people are not in such an environment.


For the record, gRPC can use JSON. There are examples in the repo on how to do this:

https://github.com/grpc/grpc-java/blob/master/examples/src/m...


> And JSON endpoints are a damn sight easier to debug and wireshark and all the rest.

https://github.com/google/protobuf/issues/3303


There are environment variables to get all the debugging information you want.


I found gRPC to be a bit too heavyweight and complex. I'm pretty excited about twirp[1] right now.

[1]: https://github.com/twitchtv/twirp


What did you find too complex about protobufs? Unfortunately, twirp is Go-only.


It's go-only for now. Protobufs includes a code gen step, which is a burden on project tooling and workflow and it's not very fun using the types it generates or writing boilerplate to convert to the types you'd prefer. Also, I'm not sure about twirp, but protobufs doesn't have a good way to model interface or sum types last I checked.


Proto 3 has the "oneof" type. For instance:

  message MyThing {
    oneof sum_type {
      TypeOne type_one = 1;
      TypeTwo type_two = 2;
    }
  }

The lack of inheritance seems awkward at first, but with oneof it isn't much of a blocker. The APIs for this aren't always great -- Go's in particular feel kind of awkward (IMO). Java's are nice -- it's a separate enum you can switch over.

An example from a Go project of mine:

  // Pick the filter based on which member of the StartAt oneof is set.
  switch req.StartAt.(type) {
  case *pb.GetLogsRequest_Timestamp:
  	t, err := types.TimestampFromProto(req.GetTimestamp())
  	if err != nil {
  		return nil, errors.Errorf("Bad timestamp: %v", req.GetTimestamp())
  	}
  	filter.Timestamp = t
  case *pb.GetLogsRequest_Offset:
  	filter.StartOffset = req.GetOffset()
  case *pb.GetLogsRequest_Position_:
  	switch req.GetPosition() {
  	case pb.GetLogsRequest_LATEST:
  		filter.Position = LATEST
  	case pb.GetLogsRequest_EARLIEST:
  		filter.Position = EARLIEST
  	}
  case nil:
  	// No StartAt set; leave the filter unchanged.
  default:
  	return nil, errors.Errorf("Unknown GetLogsRequest.StartAt type.")
  }


Can you elaborate? Which language were you using it in, and for what usecase?


If you're doing microservices, writing all your client libraries gets kind of boring. gRPC generates your client code, which saves time.


Also streaming that you can't do with regular REST, so think about push notification and the like.

One of the biggest advantages, IMO, is the contract between the client and the server: both are always in sync about what to send and receive.

I've seen things break many times because x, y, or z added a field or changed a type that the server or client couldn't understand.


> Also streaming that you can't do with regular REST

Nothing about the REST architectural style prohibits a resource (or, rather, a particular resource representation) from being a stream.


What protocol / library do you use for that? AFAIK there is nothing, since this only works properly on HTTP/2, which REST doesn't really use.


HTTP/1.x will allow you to stream requests from the client. It'll also let you stream chunked responses from the server. But it won't let you do both simultaneously without going more exotic.

Speaking from my own experience in Java, especially testing my gRPC/HTTP bridge[1], streaming is never overly easy with HTTP clients. You're usually just handed a byte stream to handle yourself. Streaming to the server from the client has been another really interesting exercise to get right.

That's part of what's nice about gRPC (aside from the RPC model and protobuf serialization). If you want unary semantics, done. If you want streaming, it's just a keyword away, and it works across all languages.

[1] https://github.com/Xorlev/grpc-jersey


Can't speak for Java but it was super easy in Go. I wrote some software for my home weather station and it streams [1] over gRPC to another little Go app [2] that runs on my laptop, wherever it may be.

My use is not at all complex but I can't imagine that more involved streaming would be any tougher.

[1] https://github.com/chrissnell/gopherwx/blob/master/storage_g...

[2] https://github.com/chrissnell/grpc-weather-bar

Edit: here's a simple example protobuf definition for streaming to a client: https://github.com/chrissnell/gopherwx/blob/master/protobuf/...


Academic discussion: wouldn't the "stateless" part of REST preclude using a stream that necessarily needs to keep track of state?

Practical discussion: the biggest advantage of using REST is that you have a client readily available in pretty much every language. Most of those clients haven't been updated to deal with the HTTP/2 machinery necessary to drive streams the way gRPC does.


> Most of those clients haven't been updated to deal with the HTTP/2 machinery necessary to drive streams the way gRPC does.

HTTP has supported streaming data for years, e.g. Server-Sent Events[0]. The short version is that you GET a URL, and the body keeps on arriving forever. I'm pretty sure that we had this sort of HTTP streaming back in the late 90s …

I'll grant that many JSON-oriented pseudo-REST clients probably don't support it, though.

[0] https://en.wikipedia.org/wiki/Server-sent_events
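A minimal Go sketch of the "GET a URL and the body keeps arriving" pattern, using only the standard library. The handler here sends three events and stops so the example terminates; a real feed would loop until the client disconnects. `sseHandler`, `readEvents`, and `fetchDemoEvents` are hypothetical names, and the server is a local `httptest` server, not a real deployment.

```go
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// sseHandler streams three events and then ends the response.
func sseHandler(w http.ResponseWriter, r *http.Request) {
	flusher, ok := w.(http.Flusher)
	if !ok {
		http.Error(w, "streaming unsupported", http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "text/event-stream")
	w.Header().Set("Cache-Control", "no-cache")
	for i := 1; i <= 3; i++ {
		// Each event is a "data:" line followed by a blank line.
		fmt.Fprintf(w, "data: reading %d\n\n", i)
		flusher.Flush() // push the event to the client right away
	}
}

// readEvents GETs url and collects the payload of every "data:" line.
func readEvents(url string) ([]string, error) {
	resp, err := http.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	var events []string
	sc := bufio.NewScanner(resp.Body)
	for sc.Scan() {
		if line := sc.Text(); strings.HasPrefix(line, "data: ") {
			events = append(events, strings.TrimPrefix(line, "data: "))
		}
	}
	return events, sc.Err()
}

// fetchDemoEvents runs the handler on a local test server and reads its stream.
func fetchDemoEvents() ([]string, error) {
	srv := httptest.NewServer(http.HandlerFunc(sseHandler))
	defer srv.Close()
	return readEvents(srv.URL)
}

func main() {
	events, err := fetchDemoEvents()
	if err != nil {
		panic(err)
	}
	fmt.Println(events) // [reading 1 reading 2 reading 3]
}
```

This is one-directional (server to client), which is the limitation relative to gRPC's bidirectional streams.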


Right, but server-sent events work only one way: from server to client. gRPC allows you to create bi-directional streams, where both the client and server are pumping data.

Chunked encoding allows you to do something like what gRPC does, but it's my understanding (I might be wrong) that the data needs to be base64 encoded, whereas HTTP/2 supports sending raw byte arrays.


HTTP/1 responses carry raw bytes, and chunked encoding doesn't affect that (just some packet length prefixing).
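To make that concrete, here is a small Go sketch that prints what chunked transfer encoding puts on the wire for an arbitrary binary payload. It uses `httputil.NewChunkedWriter` from the standard library; `chunkedWire` is a hypothetical helper name. Normally `net/http` does this framing for you, so this is only for inspection.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http/httputil"
)

// chunkedWire returns the bytes that HTTP/1.1 chunked transfer encoding
// produces when each element of chunks is written as one chunk.
func chunkedWire(chunks ...[]byte) []byte {
	var wire bytes.Buffer
	cw := httputil.NewChunkedWriter(&wire)
	for _, c := range chunks {
		if _, err := cw.Write(c); err != nil {
			panic(err) // writes to a bytes.Buffer are not expected to fail
		}
	}
	// Close emits the terminating zero-length chunk ("0\r\n").
	if err := cw.Close(); err != nil {
		panic(err)
	}
	return wire.Bytes()
}

func main() {
	// Binary payload including a NUL and a 0xff byte; no Base64 involved.
	fmt.Printf("%q\n", chunkedWire([]byte{0x00, 0xff, 0x10}))
}
```

Each chunk is a hexadecimal length, CRLF, the raw bytes, and another CRLF, so the payload bytes pass through unmodified.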

If you mean binary data would need to be Base64-encoded with server-sent events, that's true. SSE uses text/event-stream as the content type, which is expected to be text. It would be interesting if someone defined a content type for streaming binary events.


Server-sent events are not a stream; they're a poor pseudo-implementation on top of HTTP/1.1.


How is SSE not a stream? HTTP 1.1 does a fine job of sending streaming content, and SSE is a stream-parsable content type.


re: efficiency, I think this blog post does a decent job explaining: https://auth0.com/blog/beating-json-performance-with-protobu.... Basically, it doesn't make a huge difference if you're communicating with a Javascript endpoint, but Java to Java (or probably between other non-js backends) you can save a lot of time on serialization/deserialization. It's worth noting that the post uses fairly large blobs (50k people and addresses), and you probably won't see as big of a difference on smaller requests.


Biggest benefits I've seen so far: codegen, streaming connections, API versioning.


JSON is a serialization format. gRPC is both a serialization format and a DDL (data definition language).

That means that you are storing your schema, which also happens to contain interoperability features.

...and the serialization format is more efficient.


Disclaimer: I work for Google, but not on gRPC.

It's worth mentioning that gRPC is actually format-agnostic. While I can't say it does the best job of this (gRPC+protobuf works best), there's precedent for using any transport in gRPC. gRPC-Java includes examples of JSON serialization and Thrift serialization.

gRPC is just the transport protocol (itself built on HTTP/2). That said, I'd highly recommend protobufs; they're great. Define your schema in an agnostic way such that it can be easily browsed and compiled to multiple languages.


As a note, a disclaimer like this given without explanation could seem to indicate gRPC is a Google product even though apparently it no longer is. I actually had a conversation with the gRPC mailing list a few weeks back about how weirdly this has been communicated on their website as well. The linked blog post continues to refer to 'we' as 'Google'.

(For those curious, IIRC it has now been handed off to a subgroup of the Linux Foundation, called the Cloud Native Computing Foundation. The change does not appear to be well advertised on their website, which contains no reference to it.)


Thanks for the feedback; we updated the footer: https://github.com/grpc/grpc.github.io/pull/622/files

Note that this is an open-source project, and the website itself is on GitHub Pages, so feel free to send us pull requests. Sometimes the reason behind things not changing is not a huge conspiracy, but no one actually having spare time to do it. :)


Perfect. It'll be nice when Google isn't the only 'author' ;) but now I can clearly end up finding the CNCF this way. It just seemed apparent from the parent comment that it was possible even Googlers commenting were unaware gRPC was now under the CNCF! :)

I generally wouldn't file a PR on someone's copyright line or similar legalese, I don't know what the impact would be for any given organization or how they need to format it.


CONTRIBUTING.md in the repository[1] explicitly suggests that authors add their name to the AUTHORS file on their first substantial commit, should they like to. A reviewer would work with them on the order and formatting. One thing is for sure: there is a diverse contributor base; you can see the full list in the git history.

[1] https://github.com/grpc/grpc/blob/master/CONTRIBUTING.md


The schema is not stored. Think of it this way: once a .proto file has been compiled into an interface for a specific language, that generated code is what understands the schema.

On the wire, you simply send field numbers (i.e. the numbers you assign manually, usually in increasing order, and must never reuse), along with either a simple value (int, float) or a reference to another message.

But you never send the actual schema... Some specific storage formats might keep a duplicate copy of that schema, but that is just part of their design.

So the nice thing is that if you follow some simple rules (never reuse a previous number, be careful when changing the types of existing fields, since only a few such changes are safe), you can upgrade your services, and the data being pushed, independently.

For example, if you've added a new field (with a new number), your existing code (still compiled against the older schema) might just ignore it: even though the field is there, there is no accessor (e.g. a method call to read it) through which it can be reached.
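A hand-rolled Go sketch of that behaviour, using the protobuf key layout (field number shifted left by three bits, ORed with the wire type) and varints from `encoding/binary`. It is not the real protobuf library: `encodeV2` and `decodeV1` are hypothetical names, only the varint and length-delimited wire types are handled, and the message with an `id` field 1 and a newer `note` field 2 is made up for illustration.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const (
	wireVarint = 0 // integers, bools, enums
	wireBytes  = 2 // strings, bytes, embedded messages
)

func appendUvarint(b []byte, v uint64) []byte {
	var tmp [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(tmp[:], v)
	return append(b, tmp[:n]...)
}

// appendTag writes the field key: (field number << 3) | wire type.
func appendTag(b []byte, field, wireType int) []byte {
	return appendUvarint(b, uint64(field)<<3|uint64(wireType))
}

// encodeV2 is the "new" writer: id is field 1, note is the newly added field 2.
func encodeV2(id uint64, note string) []byte {
	b := appendTag(nil, 1, wireVarint)
	b = appendUvarint(b, id)
	b = appendTag(b, 2, wireBytes)
	b = appendUvarint(b, uint64(len(note)))
	return append(b, note...)
}

// decodeV1 is the "old" reader: it only knows field 1 and skips everything else.
func decodeV1(b []byte) (id uint64, skipped int, err error) {
	for len(b) > 0 {
		key, n := binary.Uvarint(b)
		if n <= 0 {
			return 0, 0, errors.New("bad field key")
		}
		b = b[n:]
		field, wireType := key>>3, key&7
		switch wireType {
		case wireVarint:
			v, n := binary.Uvarint(b)
			if n <= 0 {
				return 0, 0, errors.New("bad varint")
			}
			b = b[n:]
			if field == 1 {
				id = v
			} else {
				skipped++
			}
		case wireBytes:
			l, n := binary.Uvarint(b)
			if n <= 0 || uint64(len(b)-n) < l {
				return 0, 0, errors.New("bad length")
			}
			b = b[n+int(l):] // jump over the payload without interpreting it
			skipped++
		default:
			return 0, 0, fmt.Errorf("unsupported wire type %d", wireType)
		}
	}
	return id, skipped, nil
}

func main() {
	msg := encodeV2(150, "hi")
	fmt.Printf("% x\n", msg) // 08 96 01 12 02 68 69
	id, skipped, err := decodeV1(msg)
	fmt.Println(id, skipped, err) // 150 1 <nil>
}
```

The old reader can skip field 2 because the wire type in the key tells it how long the value is, even though it has no idea what the field means.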

Of course, it's not so simple. After all, there are subtleties: should I simply pass fields that I do not understand through to other services, or should I filter them? I certainly don't know.

I wish protos were used more widely, and that there were ways to store them in various databases (like MySQL, Postgres, SQLite, etc.). Formats could then also store such data more optimally, rearranging it for better compression and faster retrieval.


It's actually protobuf which is both a serialization format and data definition language. gRPC is more of a service definition language in that sense.


gRPC uses protobuf, they are not synonymous.


Most of the comments here focus on Protobuf but the other half of gRPC is HTTP/2, and HTTP/2 is very complex.

Good luck implementing a half-decent HTTP/2 client or server library if your language doesn't support it already. This is easily a six-month job.

The other issue with HTTP/2 is that most client libraries require TLS to work. Development environments become harder to set up. It becomes harder to sniff the traffic during debugging. In client/server scenarios where both are on the same machine, this also creates unnecessary overhead.


My only regret is that when we started our current project, I did not commit 100% to gRPC, so now we have a mix of services. If I had gone all in, it would be easier to integrate upcoming things like conduit [1], and I would not have to generate Swagger files but could just ship the proto files.

[1] https://conduit.io/


I believe conduit is going to support HTTP1.1 by the time it's done.

The http2 and grpc focus was just to get the alpha release out there.


It would be an ideal standard if it supported browsers.


Slightly off-topic, but related: I have read that a common practice for managing proto files (or any schema definitions, really) is to put them in a separate repo/package to share. It seems pretty straightforward in my head and provides several advantages. Still, are there any trade-offs when doing this in practice?


It has challenges similar to a monorepo's.

The first challenge is that you have to keep some kind of reference to which protos you are using within your project. What Google (and some others) do is put all the proto files in a separate repo [0] and then generate code for each language separately (Python [1], ...). This way you can use whichever proto file you need within your project; however, you have to load more libraries than you need. To be honest, it only makes a slight difference when deploying, so it's not too bad.

The second challenge is that you have to regenerate the output files every time you make some kind of change. If you have a lot of proto files, it may take some time to generate them, and there are very few tools available to help you. Google open-sourced Artman [2], although it's more focused on APIs than on managing shared protos.

The massive advantage is that proto files are self-explanatory and, if you put enough information in them, can function as direct documentation of your API's interface. You don't need to fish the requirements out of the project or the documentation; you can just read the proto file itself. But this does depend on the developers keeping it as consistent as possible, which is not always the case [3].

[0] https://github.com/googleapis/googleapis

[1] https://pypi.python.org/pypi/googleapis-common-protos

[2] https://github.com/googleapis/artman

[3] https://news.ycombinator.com/item?id=16166153


Thanks for the response! These are very good references. I like the breakdown of message and/or service definitions into separate files so it is easy to browse through the repository. I am assuming the version bump (v1 vs. v2) occurs when you introduce breaking changes to the API? I am aware that protobuf does a good job of allowing forwards and backwards compatibility at the message level.


Correct on the versioning. We internally treat protos as immutable descriptions of our API interfaces. That means whenever we need to change anything (outside of bugs), be it adding a new field, changing the order, renaming fields, changing types, ... we start a new version.

We also use a lot of inheritance of non-default types (timestamps, errors, ...) so it's important to make sure that we don't break anything for others.


Interesting, so even adding new fields... I suppose there is quite a bit of planning and iterating on the interface before making it public, to reduce churn? I understand doing that for backwards-incompatible changes.


We add new fields (or redesign existing protos) very sporadically. We also don't have a fully automated process to deal with a change in a proto across all projects, so by using new versions we can signal to the developers that they should eventually adopt it. We are actively using 5 languages that have to deal with the changes, so we found it easier overall to do it this way.


The mono-repo of protos has been our approach, and has worked very well for us so far.

Only difference is we publish the resulting code for all languages into a single artifact repo, which other projects use as a dependency.


How many languages do you use? Do you generate separate folders for each language, or what does the structure look like?


Great to hear, I'm pretty bullish on the framework and have been using it happily in Go for a while


There has been a flurry of gRPC posts on HN recently - it must be the new xml soap rest NoSQL fad!

NoSQL is an interesting parallel - Google publish map reduce and bigtable and amazon publish some influential papers and suddenly everyone is using NoSQL in order to be "web scale". Then it turns out that Google themselves were doing sql web scale and spanner and all that.

There's a risk that gRPC is the same? In chat yesterday ex-googlers said that Google was increasingly moving over to flatbuffer...

Personally I have an aversion for tools with generators. Harks back to the damage CORBA did me I guess... I also have a preference for plaintext eg JSON - so much easier to debug.

Oh well. Guess we're in the fashion business ... ;)


Is this gRPC the same thing as golang net/rpc referred to here: https://news.ycombinator.com/item?id=16170116? I don't think so but I've never used either one.

>seniorsassycat: I don't understand why AWS released Go support instead of binary support and I don't understand why they chose to rely on go's net/rpc [...] which encodes objects using Go's special [gobs] binary format


net/rpc is an RPC implementation in the Go standard library, which uses gob for serialization.

gRPC is a protocol and set of libraries for cross-language RPC based on protobufs. It also does a lot of codegen for you, like generating clients.
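For contrast, a minimal sketch of the standard library's net/rpc (not gRPC), with client and server connected over an in-memory pipe so nothing touches the network. `Args`, `Arith`, and `multiplyOverRPC` are hypothetical names for illustration; the default codec is gob, and there is no schema file or code generation step.

```go
package main

import (
	"fmt"
	"net"
	"net/rpc"
)

// Args is the request type; its fields must be exported so gob can encode them.
type Args struct{ A, B int }

// Arith is the service type exposed over RPC.
type Arith struct{}

// Multiply has the shape net/rpc requires: an exported method with two
// arguments (the second a pointer for the reply) and an error return.
func (t *Arith) Multiply(args *Args, reply *int) error {
	*reply = args.A * args.B
	return nil
}

// multiplyOverRPC serves Arith on one end of an in-memory pipe
// and calls it from the other end.
func multiplyOverRPC(a, b int) (int, error) {
	serverConn, clientConn := net.Pipe()
	srv := rpc.NewServer()
	if err := srv.Register(new(Arith)); err != nil {
		return 0, err
	}
	go srv.ServeConn(serverConn) // returns once the client hangs up

	client := rpc.NewClient(clientConn)
	defer client.Close()

	var product int
	err := client.Call("Arith.Multiply", &Args{A: a, B: b}, &product)
	return product, err
}

func main() {
	product, err := multiplyOverRPC(7, 8)
	if err != nil {
		panic(err)
	}
	fmt.Println(product) // 56
}
```

Because gob is Go-specific, both ends effectively have to be Go programs, which is the main practical difference from gRPC's cross-language story.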


No, this is an RPC and streaming framework built on top of Protobuf and HTTP/2. It's pretty much an open source version of libraries that Google uses pretty much everywhere internally.


No I believe gRPC is a different thing. Don't know what net/rpc is.


Has anyone found a good resource on using gRPC directly in a JS client? I've looked at using gRPC. My current challenge is that I want to support a website/webgateway on one side and mobile gateway on the other. If I use Swift or Java on the mobile side, it's easy. If I use Ionic Framework, I'm in the same spot as with the web gateway; probably better off with HTTP + RPC.


It is possible to use gRPC directly from JavaScript using the gRPC-Web [1] project by Improbable. More accurately, it uses TypeScript, which is generated from the proto file in a similar way to other languages. You still need a proxy to transform requests from HTTP/1.1 to HTTP/2.0.

I've actually only used the gRPC JSON gateway mentioned in the other replies so I'm not sure how it compares, but it looks interesting.

[1] https://github.com/improbable-eng/grpc-web


I wrote https://github.com/zang-cloud/grpc-json to solve this problem. It serves Golang GRPC methods as a JSON API, no configuration required.


Does this work? https://github.com/grpc-ecosystem/grpc-gateway

Seems to generate a REST proxy server side.


I haven't tried it, but that goes against the grain of gRPC. I understand it as an integration point between legacy REST and gRPC. I don't know, and from what I've read, wouldn't use it for greenfield development.

The only issue I have with full HTTPS + JSON is CRIME. I'm not sure how that works.


It does work, very well. We have been using it for over a year without issue. It is not ideal, but it is good to have an option for when clients don't support gRPC.


Does anyone have experience with both gRPC and Thrift? I'd be curious to know how they compare.


Thrift used to support more languages (this has changed). gRPC was more performant (take w/ grain of salt, this is word of mouth) for a while -- unclear if it's changed or if the difference was ever that significant except at very high scale.

I think they're pretty similar and you can't lose either way. Facebook's support of Thrift and Google's of gRPC make both decent options.

One thing I will say about gRPC is that it plays nice with Google's build system (Bazel) and some Google APIs now have first-class gRPC support. If you choose thrift in your stack you'll have to call APIs using JSON or support gRPC anyway if you want to use them for those API calls...so gRPC might be an attractive choice. Furthermore gRPC's go interop is also excellent if you happen to be a fan of golang.


When I looked into Thrift's generated code for Objective-C when we started working on gRPC (so 2014), RPCs were all blocking, which made for a non-idiomatic API. I haven't looked if they've added an asynchronous version since.


I've used Thrift on a project in the past and one feature gRPC has that Thrift doesn't is the ability to use streaming semantics. This would have actually been very useful for the project I used Thrift on. If I were implementing it now I'd definitely use gRPC.


Is it possible to add a middleware which does gRPC <-> JSON?



I found grpc-gateway to be overkill for what I wanted, so I wrote https://github.com/zang-cloud/grpc-json to serve Golang GRPC methods as a JSON API, no configuration required.



