How I use Pydantic unrequired fields without defaults
Pydantic's declarative style is simple and magical. Here's how I use unrequired fields while keeping their defaults from cluttering the JSON Schema.
I use Pydantic as a staple in most of my recent Python projects. It's clean, makes sense, and works very well, especially when my code uses Python's typing support. It even creates JSON Schemas for me automagically.
If that were the sum of it, this article would end here. I've found, however, that as my models get more complex, as I add nested models, and as I reach for automatic OpenAPI schemas with FastAPI, I often need to clean things up for two reasons:
- The JSON Schema (and consequently the OpenAPI schema) automatically includes every default you specify for non-required fields. This makes it confusing and hard to use for clients.
- I want my application code to leverage the power of linting with the typing system. I can't do that easily if every field can be None by default.
This is my journey deeper into Pydantic's flexibility and how I define my models to keep my API structures consistent, their docs cleaner, and my code more type-friendly.
Why is this a problem for me?
When my fields are not required, I like non-null defaults so that I don't have to sift through a mountain of if field is not None: boilerplate. This is my preference, and it has served me well every time.
Yes, there are cases where my fields should be allowed to be null but then that is an explicit decision for me. There is a difference between allowing null to indicate a missing field and allowing null to indicate a valid value. This is not the point.
The point is, when I generate the (OpenAPI or JSON) schema docs, the last thing I want is a mountain of pointless default data structures in the Swagger-like tools. If a field is intended to be not-required, I don't want API clients worrying about what empty defaults I'm using behind the scenes. I also don't want my serialized data cluttered with useless data structures.
More specifically, what if I’m using a nested schema that is not required but might not be able to have a default value? Or it can have a default (nested) value but it would clutter and confuse the documentation?
Sensible but Simple Definitions
When I started using Pydantic for building models and schemas I fell in love with the sensible declarative way of defining field requirements. If I want a field to be required, I don’t give it a default.
from pydantic import BaseModel

class Animal(BaseModel):
    num_legs: int = 4  # not required because it has a default
    name: str  # required because no default
Animal.model_json_schema()
{'properties': {'num_legs': {'default': 4,
'title': 'Num Legs',
'type': 'integer'},
'name': {'title': 'Name', 'type': 'string'}},
'required': ['name'],
'title': 'Animal',
'type': 'object'}
I love it. It's all there: num_legs has a default of 4, and name is required. There's no real point in leaving out the default value of 4 for num_legs because it's quite helpful info for the client. Anyway, it's all there, printed in the JSON Schema.
A Nested Example
I don't often deeply nest my JSON objects, but quite often they have one, maybe two layers. It makes me happy.
So when a nested model can be unrequired, I have to set a default value. Makes sense. But what if I can't? Or what if I don't want to, because I'm worried about a bunch of pointless validators and serializers running on default data?
Let’s look.
Just allow null for nested models with required fields
class Address(BaseModel):
    street: str
    city: str

class Person(BaseModel):
    address: Address | None = None
As mentioned, I don’t like using this option and I’m sure I’m not alone. Why? Because using null to indicate missing data can conflict with allowing null as a valid value. It gets messy for me.
But it does work. Of course it works and the schema tells you so:
...
'address': {
'anyOf': [{'$ref': '#/$defs/Address'}, {'type': 'null'}],
...
So if I don't want someone to have to provide an Address (because they can't), but also don't want to force them to send null, this option works because null is the default; they can simply omit the field.
p = Person() # look Ma, no hands
p.model_dump()
>>> {'address': None}
p.model_dump(exclude_unset=True)
>>> {}
My dumped model shows this, so my application code needs to handle types that could be None so the linters don't get heartburn. Why? Because the client might actually send null, since the JSON Schema said they could.
p = Person.model_validate_json('{"address": null}')
# Or p = Person(address=None)
p.model_dump()
>>> {'address': None}
p.model_dump(exclude_unset=True)
>>> {'address': None}
There it is. My application code is going to have to check for a lot of Nones. Worst thing in the world? No. Do I have an urge to wash my hands? Yes.
Just allow null but don’t tell anyone. Shhhhh.
Hmmm, intriguing. I then wondered if I could hide this "allow null" behind the scenes so that the client just has to omit the field.
from pydantic import BaseModel
from pydantic.json_schema import SkipJsonSchema  # looky here

class Address(BaseModel):
    street: str
    city: str

class Person(BaseModel):
    address: Address | SkipJsonSchema[None] = None
The JSON Schema now doesn't tell the client anything about allowing null. This implies that if the field is not required, they should simply omit it.
...
'address': {
    'allOf': [{'$ref': '#/$defs/Address'}],
    'default': None,
    'title': 'Address'}
...
Erm, but it does say that there’s a default value of null. So the model doesn’t support null but will supply null as a default? This didn’t work for me and it wouldn’t work for clients.
OK, let’s remove it from the schema then.
from pydantic import BaseModel, Field
from pydantic.json_schema import SkipJsonSchema

def pop_default_from_schema(s):
    s.pop('default', None)

class Address(BaseModel):
    street: str
    city: str

class Person(BaseModel):
    address: Address | SkipJsonSchema[None] = Field(
        default=None, json_schema_extra=pop_default_from_schema
    )
json_schema_extra lets us supply a callable, which here simply pops any 'default' entry from the schema dict for that field. Nice. The schema now looks clean.
'address': {'allOf': [{'$ref': '#/$defs/Address'}], 'title': 'Address'}
Turns out, I can still use null; it's just that the schema tells you not to.
p = Person.model_validate_json('{"address": null}')
p.model_dump()
>>> {'address': None}
p.model_dump(exclude_unset=True)
>>> {'address': None}
Yip. Still allows null in the API even though the Schema says don’t. My application code is going to need that boilerplate if check.
Don’t allow null but make it hard and ugly
To satisfy my thirst for eliminating nulls in my API for missing data, I like to make it explicit. Problem is, then I need a default but I can’t because my nested model doesn’t really have one.
class Address(BaseModel):
    street: str
    city: str

class Person(BaseModel):
    address: Address = Address(street="", city="")
Now, before we all scream about the wasted CPU cycles creating a default instance that isn't going to be used… let's just look at the mechanism first.
...
'address': {
'allOf': [{'$ref': '#/$defs/Address'}],
'default': {'street': '', 'city': ''}
}
...
This is a bit like before, but the JSON Schema is horrible! That default, yuk. I really don't want that cluttering my OpenAPI schemas.
It's possible that Address could have fields with their own defaults, which would simplify creating a default Address instance, but really, it's a zero-sum game.
I'm looking at use cases like JSON:API where, if a nested model is included, it explicitly needs data supplied. That, and cases where the nested model simply can't have defaults.
Don’t allow null but make it pretty and neat
So we reach for our old friends:
class Address(BaseModel):
    street: str
    city: str

class Person(BaseModel):
    address: Address = Field(
        default=Address(street="", city=""),
        json_schema_extra=pop_default_from_schema,
    )
The schema looks nice. No mention of null or defaults or even required. Basically, the schema says “give me an Address structure or don’t even try”.
...
'address': {'allOf': [{'$ref': '#/$defs/Address'}]}
...
The dumped model isn’t too bad either, sort of.
p = Person.model_validate_json('{}')
p.model_dump()
>>> {'address': {'street': '', 'city': ''}}
p.model_dump(exclude_unset=True)
>>> {}
I would need to use exclude_unset to make sure I don't have a bunch of useless data in my serialized dict. This isn't too bad, because that's what I designed my schema to do: don't give me data and I won't have any data.
But then, instead of boilerplate looking for if field is None:, I'm going to have boilerplate that asks if data.get('address') etc. Zero-sum game? Maybe, but at least the client isn't bamboozled by the schema.
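Here's what that trade looks like in practice; the ship_to consumer function is hypothetical, just to show where the data.get check lands:

```python
from pydantic import BaseModel, Field


class Address(BaseModel):
    street: str
    city: str


class Person(BaseModel):
    address: Address = Field(default=Address(street="", city=""))


def ship_to(data: dict) -> str:
    # The `is None` check is gone, but a presence check takes its place
    addr = data.get("address")
    if not addr:
        return "no address on file"
    return f"{addr['street']}, {addr['city']}"


full = Person.model_validate_json(
    '{"address": {"street": "1 Main St", "city": "Springfield"}}'
)
print(ship_to(full.model_dump(exclude_unset=True)))   # "1 Main St, Springfield"

empty = Person.model_validate_json('{}')
print(ship_to(empty.model_dump(exclude_unset=True)))  # "no address on file"
```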
For me, it boils down to a question of which is worse:
- wasting CPU cycles on creating defaults that will help me down the line in my application code
- saving CPU cycles but spending them later on boilerplate if field is None checks
When I say it out loud I’m not always 100% sure but at least now I have options!
Erm, what about the validators on the default?
There's also the question of validators that hang off the nested model. This is a good one. Pydantic, by default, won't run validators on default values, but when we create an Address instance as the default, do its validators get run?
Let’s test:
class Address(BaseModel):
    street: str = Field(min_length=3)

class Person(BaseModel):
    address: Address = Address(street="")  # ValidationError raised right here

>>> Beep. Boop. Crash. Burn. Computer say no.

Yeah, about those validators: they run the moment the default Address(street="") is constructed. The crash actually happens while the Person class itself is being defined; we never even reach Person(). Need to be careful.
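One escape hatch I could use here (a sketch; treat it with care) is Pydantic v2's model_construct, which builds an instance without running any validation, so the default can be created even though it violates its own constraints:

```python
from pydantic import BaseModel, Field


class Address(BaseModel):
    street: str = Field(min_length=3)


class Person(BaseModel):
    # model_construct skips validation entirely, so the
    # too-short default street doesn't trip min_length
    address: Address = Address.model_construct(street="")


p = Person()
print(p.model_dump())                    # {'address': {'street': ''}}
print(p.model_dump(exclude_unset=True))  # {}
```

The obvious caveat: the default instance really is invalid by the model's own rules, so anything that later re-validates it or relies on min_length will be surprised.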
What should you do?
“It depends on your use case”
You knew I was going to say that didn’t you? But it is true, there are valid reasons to use nulls and possibly hide them in the schema. There are also valid reasons to create nested structures as defaults and possibly hide them.
For me, one of the worst things to do is mix the behaviour. That inconsistency makes life very confusing for clients. It's not so bad for your own application code behind the serialized data, because linters and other fancy tools are very good at helping you not nail your tongue to the table.
My journey at least helped me understand what was happening with the model and schema as I defined more fields under weird conditions. When you use things like JSON:API you need to embrace the weird so that’s also OK.
I know I can use nulls or empty fields, even nested ones, and now I know I can hide them in the JSON Schema (and ultimately the OpenAPI schema too). That was the problem I needed to understand.
Hopefully this helps you.
Hopefully you have an even better idea and share it in the comments.