JSON Schema Master Race - Generating the schema itself

in #programming, 6 years ago

There exists a pattern of:

  • given a collection of JSON objects
  • produce a JSON schema that is valid for the entire set

Now what's so hard about this?

Producing a schema that is valid for all of them is very easy: just don't validate anything. That's not very useful, though. The problem grows in complexity once you throw in this constraint:
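To make the "don't validate anything" case concrete, here is a minimal sketch. The `is_valid` helper is a hypothetical toy checker that understands only the `required` keyword; it is not a real JSON Schema validator, and the sample objects are made up for illustration. The empty schema `{}` passes everything precisely because it says nothing.

```python
# Hypothetical mini-validator: it understands only the "required" keyword.
# As in JSON Schema, an absent keyword imposes no constraint, and
# "required" is ignored for instances that are not objects.
def is_valid(instance, schema):
    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                return False
    return True


# Made-up sample data for illustration.
samples = [{"id": 1, "name": "a"}, {"id": 2}, {}]

empty_schema = {}                     # valid for every sample, tells us nothing
strict_schema = {"required": ["id"]}  # tighter, but rejects the last sample

print(all(is_valid(s, empty_schema) for s in samples))
print(all(is_valid(s, strict_schema) for s in samples))
```

The first check passes for the whole set and the second does not, which is the tension the rest of this post is about: every constraint you add is a chance to invalidate one of the objects.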

  • The schema should be maximally constrained to the data, where maximally constrained means that no further JSON Schema constraint could be added without invalidating at least one of the objects.

Okay, so now it's more complicated, right? It isn't enough to have a schema that is valid for everything; we also want to be sure it accurately captures all the required fields, the upper and lower bounds of numbers, and many other constraints. Some of them are downright impossible to infer by conventional computer science knowledge.
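As a rough sketch of the easier constraints mentioned above, the following infers `required`, `minimum`/`maximum`, and `additionalProperties` from a set of flat objects. The function name `infer_object_schema` and the sample data are made up for illustration, and it assumes a non-empty list of non-nested objects; it is nowhere near a maximally constrained schema.

```python
def infer_object_schema(objects):
    """Infer a few constraints from a non-empty list of flat dicts."""
    # A key is required only if every single sample contains it.
    required = set(objects[0])
    for obj in objects[1:]:
        required &= set(obj)

    properties = {}
    for obj in objects:
        for key, value in obj.items():
            prop = properties.setdefault(key, {})
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                # Widen the bounds just enough to include this value.
                prop["minimum"] = min(prop.get("minimum", value), value)
                prop["maximum"] = max(prop.get("maximum", value), value)

    return {
        "type": "object",
        "properties": properties,
        "required": sorted(required),
        # No sample had a key outside "properties", so forbidding extras is safe.
        "additionalProperties": False,
    }


# Made-up sample data for illustration.
samples = [
    {"id": 1, "score": 9.5},
    {"id": 7, "score": 3.0, "tag": "x"},
]
schema = infer_object_schema(samples)
print(schema["required"])
print(schema["properties"])
```

Here `tag` ends up optional because only one object has it, and the numeric bounds are exactly the smallest and largest values seen. Bounds and required keys only ever need one pass over the data, which is why they sit on the easy end of the spectrum.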

Some of these parameters are simple to infer; type is one of them. So which JSON Schema parameters are difficult to infer?
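To show why type counts as a simple one, here is a minimal sketch that maps Python values (as produced by `json.loads`) to JSON Schema type names and collapses a set of observed values into a `type` keyword. The helper names `json_type` and `infer_type` are hypothetical, and the sketch assumes a non-empty list of values.

```python
def json_type(value):
    """Map a decoded JSON value to its JSON Schema type name."""
    if value is None:
        return "null"
    # Check bool before int: bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    # Simplification: any float is treated as "number", even 1.0.
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"not a JSON value: {value!r}")


def infer_type(values):
    """Return the tightest "type" keyword covering all observed values."""
    types = {json_type(v) for v in values}
    # Every integer is also a number, so "number" alone covers both.
    if "number" in types:
        types.discard("integer")
    if len(types) == 1:
        return types.pop()
    return sorted(types)


print(infer_type([1, 2, 3]))
print(infer_type([1, 2.5]))
print(infer_type([1, "a", None]))
```

The only subtlety is the integer/number overlap; everything else is a lookup. The hard parameters are the ones where no such lookup exists.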

to be continued