2021-08-24 Repositories, Paginator, Abstraction
Main takeaways:
Get rid of Paginator.
New topics
Alan: wants to learn best practices for ORM performance
Alan: looking to migrate away from Doctrine ORM - more abstracted layer where we hydrate from other stores
Alan: trying Apache Unomi - low priority according to Sikandar
Alan: trying to move away in an iterative approach
Jan: using repositories as subscriber dependencies can slow down the kernel
$doctrine = makeMeADoctrine();
function makeMeADoctrine() {
    // Pseudocode: every $dic->get() call instantiates its service eagerly,
    // so all of these dependencies are built while the kernel boots.
    $eventSubscriber->addListener(new MyListener($dic->get(OtherService1::class)));
    $eventSubscriber->addListener(new MyListener($dic->get(OtherService2::class)));
    $eventSubscriber->addListener(new MyListener($dic->get(OtherService3::class)));
    $eventSubscriber->addListener(new MyListener($dic->get(OtherService4::class)));
    $eventSubscriber->addListener(new MyListener($dic->get(OtherService5::class)));
    $eventSubscriber->addListener(new MyListener($dic->get(OtherService6::class)));
    $eventSubscriber->addListener(new MyListener($dic->get(OtherService8::class)));
    $eventSubscriber->addListener(new MyListener($dic->get(OtherService9::class)));
    $eventSubscriber->addListener(new MyListener($dic->get(OtherService10::class)));
    $eventSubscriber->addListener(new MyListener($dic->get(OtherService11::class)));
}
OtherService* must be lazy (see Lazy Services, Symfony Docs)
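The effect of laziness on boot time can be illustrated with a small sketch. This is not Symfony's implementation (Symfony's lazy services rely on generated proxies); it only shows the idea with a closure, assuming PHP 8. ExpensiveService and LazyListener are hypothetical names standing in for OtherService* and MyListener.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for one of the OtherService* dependencies.
// It counts how many times it has been constructed.
final class ExpensiveService
{
    public static int $constructed = 0;

    public function __construct()
    {
        self::$constructed++;
    }

    public function handle(string $event): string
    {
        return "handled {$event}";
    }
}

// Hypothetical listener that receives a factory instead of the service itself.
final class LazyListener
{
    private ?ExpensiveService $service = null;

    /** @param Closure(): ExpensiveService $factory */
    public function __construct(private Closure $factory)
    {
    }

    public function onEvent(string $event): string
    {
        // The dependency is built on first use, not when the listener is registered.
        $this->service ??= ($this->factory)();

        return $this->service->handle($event);
    }
}

$listener = new LazyListener(fn (): ExpensiveService => new ExpensiveService());
$constructedAtBoot = ExpensiveService::$constructed;     // nothing instantiated yet
$result = $listener->onEvent('user.login');
$constructedAfterEvent = ExpensiveService::$constructed; // built on first use
```

Registering the listener costs nothing until the first event arrives, which is also the trade-off: the construction cost moves from bootstrap into the first call.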
Jan: listener fetched at runtime
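$lazyEntityManager = new class implements EntityManagerInterface
{
    // Pseudocode: $inner is assigned at runtime, just before first use.
    public ?EntityManagerInterface $inner = null;
    public function flush() { $this->inner->flush(); }
    // ...the remaining EntityManagerInterface methods delegate to $this->inner the same way.
};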
Marco: by upgrading Symfony, you get a lazy EntityManager by default, because they need it to reset the service (background workers).
Alan: mautic/app/bundles/UserBundle/Config/config.php at 340f3440c23fbd48f34fc26b35e45170ebdfcc87 · mautic/mautic
Marco: that already breaks laziness, but we can mark the repository lazy. Perhaps make mautic.user.repository.user_token lazy.
Marco: if you put laziness in hot paths, it won't gain you anything.
Sikandar: laziness will move initialization time into the runtime. Bootstrap is not such a big issue, so we need to be selective.
Marco: we need more information about a performance profile.
Sikandar: the problem is not really on the application side (memory/CPU/latency).
Alan: clearly not a major concern. It may help in background processing.
Marco: are the background processes spawned once per task, or kept alive?
Alan: goes back to multi-tenancy.
Marco: maybe we can reboot individual services (EntityManager); that worked fine for some integration test suite in the past.
Marco: Built-in Symfony Service Tags (Symfony Docs)
Marco, Sikandar: only about stateful services
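Resetting a stateful service between tasks in a long-running worker could look roughly like the sketch below. It assumes PHP 8 and does not use Symfony's own mechanism (the `kernel.reset` tag). Resettable, IdentityMapStub and runWorker are hypothetical names invented for the illustration.

```php
<?php
declare(strict_types=1);

// Hypothetical interface for services that hold per-task state.
interface Resettable
{
    public function reset(): void;
}

// Hypothetical stateful service: remembers everything it has loaded.
final class IdentityMapStub implements Resettable
{
    /** @var array<int, string> */
    private array $loaded = [];

    public function remember(int $id, string $value): void
    {
        $this->loaded[$id] = $value;
    }

    public function size(): int
    {
        return count($this->loaded);
    }

    public function reset(): void
    {
        $this->loaded = [];
    }
}

/**
 * Runs each task against the same service instance and resets it afterwards,
 * so state from one task never leaks into the next one.
 *
 * @param list<callable(IdentityMapStub): void> $tasks
 * @return list<int> number of remembered entries observed after each task
 */
function runWorker(array $tasks, IdentityMapStub $service): array
{
    $sizes = [];
    foreach ($tasks as $task) {
        $task($service);
        $sizes[] = $service->size();
        $service->reset();
    }

    return $sizes;
}

$service = new IdentityMapStub();
$sizes = runWorker([
    function (IdentityMapStub $s): void { $s->remember(1, 'a'); $s->remember(2, 'b'); },
    function (IdentityMapStub $s): void { $s->remember(3, 'c'); },
], $service);
```

Without the reset() call the second task would observe three entries instead of one. Stateless services have nothing to reset, which matches the point that this only concerns stateful ones.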
Sikandar: are connections pooled?
Marco: no, and resetting services would probably also reset a connection pool, if we had one
Alan: we don't have connection pooling
Marco: Xdebug profiler output (cachegrind.*.out file)
Marco: problem probably not here
Alan: trouble with the paginator
Marco: paginator - as soon as you have issues, move away from it
Marco: it answers two things: "how many" and "give me a page"
Marco: explaining pagination abstraction - it's high level, work with every page
Marco: move to split methods if you can, write custom SQL/DQL if you have performance problems
Alan: explaining that InnoDB is slow at counting
Marco: pagination works like this
SELECT a, b
FROM MyUsers a
JOIN a.posts b
Broken query: assume 2 users with 1000 posts each.
Applying a plain row limit of 100 to that query will give you 1 user with 100 posts hydrated: wrong result, and wrong in-memory state too.
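The row-limit problem described above can be reproduced without a database. The sketch below assumes a limit of 100 rows and that the joined rows come back ordered by user. It only simulates the join result and the grouping that hydration performs, and it is not Doctrine code.

```php
<?php
declare(strict_types=1);

// Rows as a users-JOIN-posts query would return them: one row per (user, post) pair.
$joinedRows = [];
foreach ([1, 2] as $userId) {
    for ($postId = 1; $postId <= 1000; $postId++) {
        $joinedRows[] = ['user_id' => $userId, 'post_id' => $postId];
    }
}

// A plain LIMIT 100 cuts joined rows, not users.
$limited = array_slice($joinedRows, 0, 100);

// Group the rows back into users, the way hydration would.
$hydrated = [];
foreach ($limited as $row) {
    $hydrated[$row['user_id']][] = $row['post_id'];
}

$userCount = count($hydrated);    // users present in the "page"
$postCount = count($hydrated[1]); // posts attached to the first user
```

The page contains a single user whose post collection looks complete in memory but holds only 100 of the 1000 posts.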
Simpler query does not need paginator:
Jan: problem with large numeric offsets - offset seems to become problematic
Marco: could force it to make a range query by using identifiers (find first identifier after X)
Jan: https://www.eversql.com/faster-pagination-in-mysql-why-order-by-with-limit-and-offset-is-slow/
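The range-query idea (find the first identifier after X instead of using OFFSET) can be sketched as follows, assuming PHP 8. The table and column names in NEXT_PAGE_SQL are hypothetical, and nextPage() is only an in-memory stand-in for that query so the example runs without a database.

```php
<?php
declare(strict_types=1);

// Hypothetical table and columns. The range condition on id replaces OFFSET,
// so the database can seek through the primary key instead of skipping rows.
const NEXT_PAGE_SQL =
    'SELECT id, email FROM users WHERE id > :after_id ORDER BY id ASC LIMIT :page_size';

/**
 * In-memory stand-in for NEXT_PAGE_SQL.
 *
 * @param list<array{id: int, email: string}> $rows rows sorted by ascending id
 * @return list<array{id: int, email: string}>
 */
function nextPage(array $rows, int $afterId, int $pageSize): array
{
    $matching = array_filter($rows, fn (array $row): bool => $row['id'] > $afterId);

    return array_slice(array_values($matching), 0, $pageSize);
}

$rows = [];
for ($id = 1; $id <= 10; $id++) {
    $rows[] = ['id' => $id, 'email' => "user{$id}@example.com"];
}

$firstPage = nextPage($rows, 0, 3);
// The caller remembers the last identifier it saw and passes it back in.
$lastSeenId = $firstPage[count($firstPage) - 1]['id'];
$secondPage = nextPage($rows, $lastSeenId, 3);
```

The trade-off is that callers can only step forward from a known identifier; jumping straight to an arbitrary page number is no longer possible.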
Jan: asking about a tool/library that implements this
Marco: IMO avoid more tools here, write SQL. Explaining OLTP (OnLine Transaction Processing) vs reporting
Marco: suggesting to do more SQL
Alan: not afraid of writing more SQL
Marco: avoid SQL generators and write SQL by hand; avoiding magic also avoids unpredictable performance
Alan: segmentation is the biggest issue
Schema change -> migration to other stores
Marco: suggesting using different schema for transactional and reporting data.
Sikandar: use a new data store (column storage) for this, but it's in the pipeline and won't happen soon.
Alan: that's also the problem - Doctrine kinda forced us to stick to MySQL
Marco: explaining simple example of ES repository:
Sikandar: what about an entity that has a column with JSON?
Marco: doesn't need to be a repository
Alan: we're looking at a way to get a single source of truth (event-sourcing potentially), and it's managed by the API.
Alan: then we have queries to perform, like segmentation, like "who has visited X in the last Y days"
Alan: we could store in unstructured JSON table, and allow searching
Alan: it's possible to index JSON columns now - Indexing JSON column in MySQL 8
Marco: suggesting splitting two different schemas for reading/writing again
Sikandar: we attempted using replication (1:1 schema too)
Marco: referring to CQRS, avoid it until really necessary
Marco: start with query objects
Queries can then be made swappable (domain has definition, infrastructure has implementation):
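A minimal sketch of such a swappable query object follows, assuming PHP 8. All names are hypothetical. The interface is the domain-side definition, the in-memory class is one infrastructure-side implementation, and a hand-written SQL implementation could replace it without touching callers. The example question is taken from the "who has visited X in the last Y days" case mentioned earlier.

```php
<?php
declare(strict_types=1);

// Domain side: the definition only.
interface FindContactsWhoVisited
{
    /** @return list<int> contact ids */
    public function __invoke(string $page, DateTimeImmutable $since): array;
}

// Infrastructure side: one possible implementation, backed by an array.
final class InMemoryFindContactsWhoVisited implements FindContactsWhoVisited
{
    /** @param list<array{contact: int, page: string, at: DateTimeImmutable}> $visits */
    public function __construct(private array $visits)
    {
    }

    public function __invoke(string $page, DateTimeImmutable $since): array
    {
        $ids = [];
        foreach ($this->visits as $visit) {
            if ($visit['page'] === $page && $visit['at'] >= $since) {
                $ids[$visit['contact']] = true; // keyed by id to de-duplicate
            }
        }

        return array_keys($ids);
    }
}

// Callers depend on the interface only.
function buildSegment(FindContactsWhoVisited $query, string $page, DateTimeImmutable $since): array
{
    return $query($page, $since);
}

$now = new DateTimeImmutable('2021-08-24');
$query = new InMemoryFindContactsWhoVisited([
    ['contact' => 1, 'page' => '/pricing', 'at' => $now->modify('-2 days')],
    ['contact' => 2, 'page' => '/pricing', 'at' => $now->modify('-40 days')],
    ['contact' => 3, 'page' => '/blog', 'at' => $now->modify('-1 day')],
    ['contact' => 1, 'page' => '/pricing', 'at' => $now->modify('-1 day')],
]);
$segment = buildSegment($query, '/pricing', $now->modify('-30 days'));
```

Because buildSegment() only knows the interface, the storage behind the query can change (MySQL today, another store later) without changing the domain code.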
Alan: so suggestion is to move from repositories to more granular queries
Marco: suggesting to use the ORM for storing/modifying information (OLTP), and move to query objects that perhaps avoid the ORM overall for larger batch tasks
Next week
Perf profile - xdebug output
Managing object relationships without enforcing FK constraints
ORM-generated queries vs native SQL queries: will it make any difference in performance?
add link to Zoom call directly to calendar entry