2021-08-24 Repositories, Paginator, Abstraction

Main takeaways:

Get rid of Paginator.

New topics

Alan: get an experience of best practices in terms of ORM performance
Alan: looking to migrate away from Doctrine ORM - more abstracted layer where we hydrate from other stores
Alan: trying Apache Unomi - low priority according to Sikandar
Alan: trying to move away in an iterative approach
Jan: Using repositories as subscriber dependencies, it can slow down the kernel

Jan: Using repositories as subscriber dependencies, it can slow down the kernel

    // Pseudocode: every $dic->get() call builds its service eagerly,
    // even if the listener never fires during the request.
    $doctrine = makeMeADoctrine();

    function makeMeADoctrine() {
        $eventSubscriber->addListener(new MyListener($dic->get(OtherService1::class)));
        $eventSubscriber->addListener(new MyListener($dic->get(OtherService2::class)));
        $eventSubscriber->addListener(new MyListener($dic->get(OtherService3::class)));
        $eventSubscriber->addListener(new MyListener($dic->get(OtherService4::class)));
        $eventSubscriber->addListener(new MyListener($dic->get(OtherService5::class)));
        $eventSubscriber->addListener(new MyListener($dic->get(OtherService6::class)));
        $eventSubscriber->addListener(new MyListener($dic->get(OtherService8::class)));
        $eventSubscriber->addListener(new MyListener($dic->get(OtherService9::class)));
        $eventSubscriber->addListener(new MyListener($dic->get(OtherService10::class)));
        $eventSubscriber->addListener(new MyListener($dic->get(OtherService11::class)));
    }

OtherService* must be lazy (https://symfony.com/doc/current/service_container/lazy_services.html)

Jan: listener fetched at runtime

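    // Sketch only: $inner is assigned at runtime; the remaining
    // EntityManagerInterface methods would delegate the same way.
    $lazyEntityManager = new class implements EntityManagerInterface {
        public ?EntityManagerInterface $inner = null;

        public function flush() {
            $this->inner->flush();
        }
    };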
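The "fetched at runtime" idea can be sketched without a container. This is a hand-rolled illustration, not Symfony's generated lazy proxies: the listener receives a factory and builds its dependency on first use. ExpensiveService and LazyListener are hypothetical names.

```php
<?php

// Hypothetical stand-in for an expensive dependency such as a repository.
final class ExpensiveService
{
    public static int $instances = 0;

    public function __construct()
    {
        self::$instances++;
    }

    public function handle(string $event): string
    {
        return "handled {$event}";
    }
}

// Listener that receives a factory instead of the service itself, so the
// dependency is only built when an event is actually dispatched.
final class LazyListener
{
    /** @var callable(): ExpensiveService */
    private $factory;

    private ?ExpensiveService $service = null;

    public function __construct(callable $factory)
    {
        $this->factory = $factory;
    }

    public function onEvent(string $event): string
    {
        // Build the dependency on first use, then reuse it.
        $this->service ??= ($this->factory)();

        return $this->service->handle($event);
    }
}

// Registering the listener builds nothing yet.
$listener = new LazyListener(static fn (): ExpensiveService => new ExpensiveService());

// The first dispatched event triggers construction; later events reuse the instance.
$result = $listener->onEvent('contact.updated');
```

The cost is not removed, only moved to the first event, which matches the point that laziness in hot paths gains nothing.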

Marco: by upgrading Symfony, you get a lazy EntityManager by default, because Symfony needs
it to reset the service (background workers).
Alan: https://github.com/mautic/mautic/blob/340f3440c23fbd48f34fc26b35e45170ebdfcc87/app/bundles/UserBundle/Config/config.php#L364-L371
Marco: that already breaks laziness, but we can mark the repository lazy. Make mautic.user.repository.user_token
lazy perhaps.
Marco: if you put laziness in hot paths, it won't lead to anything.
Sikandar: laziness will move initialization time into the runtime. Bootstrap not such a big issue, so we
need to be selective.
Marco: we need more information about a performance profile.
Sikandar: problem is not really at application-side (memory/cpu/latency).
Alan: clearly not a major concern. It may help in background processing.
Marco: are the background processes spawned once per task, or kept alive?
Alan: goes back to multi-tenancy.
Marco: maybe we can reboot individual services (EntityManager), worked fine for some integration test suite
in the past.
Marco: https://symfony.com/doc/current/reference/dic_tags.html#kernel-reset
Marco, Sikandar: only about stateful services
Sikandar: are connections pooled?
Marco: no, and resetting services would probably also reset a connection pool, if we had one
Alan: we don't have connection pooling
Marco: Xdebug profiler output (cachegrind.out.* file)
Marco: problem probably not here
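A rough sketch of resetting stateful services between tasks in a kept-alive worker, as discussed above. The Resettable interface, IdentityCache and runWorker are hypothetical; Symfony ships a comparable Symfony\Contracts\Service\ResetInterface for services tagged kernel.reset.

```php
<?php

// Hypothetical interface, defined here to keep the sketch self-contained.
interface Resettable
{
    public function reset(): void;
}

// Stateful service: it accumulates data while a task runs.
final class IdentityCache implements Resettable
{
    /** @var array<int, string> */
    private array $loaded = [];

    public function remember(int $id, string $value): void
    {
        $this->loaded[$id] = $value;
    }

    public function count(): int
    {
        return count($this->loaded);
    }

    public function reset(): void
    {
        $this->loaded = [];
    }
}

/**
 * Runs each task, then resets every stateful service so that the next
 * task starts clean, even when a task throws.
 *
 * @param iterable<callable(): void> $tasks
 * @param array<int, Resettable>     $services
 */
function runWorker(iterable $tasks, array $services): void
{
    foreach ($tasks as $task) {
        try {
            $task();
        } finally {
            foreach ($services as $service) {
                $service->reset();
            }
        }
    }
}
```

Only services that hold state need this treatment; stateless services can simply be reused.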

Alan: trouble with the paginator

Marco: paginator - as soon as you have issues, move away from it
Marco: tells you "how much", "give me a page"
Marco: explaining pagination abstraction - it's high level, work with every page
Marco: move to split methods if you can, write custom SQL/DQL if you have performance problems
Alan: explaining that InnoDB is slow at counting
Marco: pagination works like this

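Query to paginate:

    SELECT a, b
    FROM MyUsers a
    JOIN a.posts b

Count query ("how much"):

    SELECT COUNT(DISTINCT a)
    FROM MyUsers a
    JOIN a.posts b

Identifier query for the requested page (limit/offset are applied here):

    SELECT DISTINCT a.id
    FROM MyUsers a
    JOIN a.posts b

Fetch query for the page ("give me a page"):

    SELECT a, b
    FROM MyUsers a
    JOIN a.posts b
    WHERE a.id IN (:ids)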

Broken query: assume 2 users with 1000 posts each.
The following query gives you 1 user with only 100 posts hydrated: the result is wrong, and the in-memory state is wrong too (the user's posts collection is incomplete).

    SELECT a, b
    FROM MyUsers a
    JOIN a.posts b
    LIMIT 100
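The effect can be reproduced in plain PHP by simulating the joined result set. This is only an illustration: the array stands in for the rows a database would return, and it assumes user 1's rows come first (a real database guarantees no order without ORDER BY).

```php
<?php

/**
 * Groups flat joined rows into users with their posts, roughly what
 * hydration does with a fetch-joined result.
 *
 * @param array<int, array{user_id: int, post_id: int}> $rows
 * @return array<int, array<int, int>> post ids indexed by user id
 */
function hydrate(array $rows): array
{
    $users = [];
    foreach ($rows as $row) {
        $users[$row['user_id']][] = $row['post_id'];
    }

    return $users;
}

// Simulated result of "users JOIN posts": one row per (user, post) pair,
// for two users with 1000 posts each.
$rows = [];
foreach ([1, 2] as $userId) {
    for ($postId = 1; $postId <= 1000; $postId++) {
        $rows[] = ['user_id' => $userId, 'post_id' => $postId];
    }
}

// LIMIT 100 cuts joined rows, not users.
$limited = array_slice($rows, 0, 100);

$users = hydrate($limited);

// prints "1 user(s), first user has 100 posts"
printf("%d user(s), first user has %d posts\n", count($users), count(reset($users)));
```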

Simpler query does not need paginator:

    SELECT a, p
    FROM MyUsers a
    LEFT JOIN a.profile p # this is a *-to-one association

Jan: problem with large numeric offsets - offset seems to become problematic
Marco: could force it to make a range query by using identifiers (find first identifier after X)
Jan: https://www.eversql.com/faster-pagination-in-mysql-why-order-by-with-limit-and-offset-is-slow/
Jan: asking about a tool/library that implements this
Marco: IMO avoid more tools here, write SQL. Explaining OLTP (OnLine Transaction Processing) vs reporting
Marco: suggesting to do more SQL
Alan: not afraid of writing more SQL
Marco: avoid SQL generators, write SQL by hand; avoiding magic also avoids unpredictable performance
Alan: segmentation is the biggest issue
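The range-query idea ("find first identifier after X") could look roughly like the sketch below. The contacts table, its columns and both function names are assumptions made up for illustration; the point is that the page starts after the last identifier seen instead of skipping rows with OFFSET.

```php
<?php

/**
 * Builds a keyset ("seek") pagination query: the page starts right after
 * the last identifier seen on the previous page, so no OFFSET is needed.
 *
 * Table and column names are hypothetical. Both parameters should be
 * bound as integers when the statement is executed.
 *
 * @return array{0: string, 1: array<string, int>} SQL and its parameters
 */
function contactsPageAfter(int $lastSeenId, int $pageSize): array
{
    $sql = 'SELECT id, email FROM contacts WHERE id > :lastSeenId ORDER BY id ASC LIMIT :pageSize';

    return [$sql, ['lastSeenId' => $lastSeenId, 'pageSize' => $pageSize]];
}

/**
 * Returns the cursor for the next page, or null when the page was empty.
 *
 * @param array<int, array{id: int}> $page rows fetched with the query above
 */
function nextCursor(array $page): ?int
{
    if ($page === []) {
        return null;
    }

    return $page[count($page) - 1]['id'];
}
```

The first page would use 0 (or any value below the smallest id) as the cursor; each following page passes the value returned by nextCursor().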

Schema change -> migration to other stores

Marco: suggesting using different schema for transactional and reporting data.
Sikandar: use a new data store (column storage) for this, but it's in pipeline and won't happen soon.
Alan: that's also the problem - Doctrine kinda forced us to stick to MySQL
Marco: explaining a simple example of an event-sourced (ES) repository:

    <?php

    final class ContactInformationRepository
    {
        public function get(ContactId $id): Contact
        {
            // Load the events recorded for this contact (query elided in the notes).
            $events = $this->connection->query('SELECT * FROM EVENTS .... WHERE ...');

            // Start from an empty contact and replay every event onto it.
            $contact = Contact::bare();

            foreach ($events as $e) {
                $contact->applyEvent($e);
            }

            return $contact;
        }
    }
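A self-contained variant of the same idea, with an in-memory event list standing in for the events table. The event types, the array shape of an event and the Contact fields are all assumptions for illustration.

```php
<?php

// Hypothetical aggregate: its state is derived only from applied events.
final class Contact
{
    public ?string $email = null;
    public int $visits = 0;

    public static function bare(): self
    {
        return new self();
    }

    /** @param array{type: string, payload: array<string, mixed>} $event */
    public function applyEvent(array $event): void
    {
        switch ($event['type']) {
            case 'email_changed':
                $this->email = $event['payload']['email'];
                break;
            case 'page_visited':
                $this->visits++;
                break;
            default:
                // Unknown event types are ignored in this sketch.
                break;
        }
    }
}

// In-memory stand-in for the events table.
final class InMemoryContactRepository
{
    /** @var array<string, array<int, array>> events indexed by contact id */
    private array $eventsByContactId;

    public function __construct(array $eventsByContactId)
    {
        $this->eventsByContactId = $eventsByContactId;
    }

    public function get(string $contactId): Contact
    {
        $contact = Contact::bare();

        // Replay the events in the order they were recorded.
        foreach ($this->eventsByContactId[$contactId] ?? [] as $event) {
            $contact->applyEvent($event);
        }

        return $contact;
    }
}
```

Because the contact is rebuilt from events on every get(), the store behind the repository can change without touching the Contact class.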

Sikandar: what about an entity that has a column with JSON?
Marco: doesn't need to be a repository
Alan: we're looking at a way to get a single source of truth (event-sourcing potentially), and it's managed by the API.
Alan: then we have queries to perform, like segmentation, like "who has visited X in the last Y days"
Alan: we could store in unstructured JSON table, and allow searching
Alan: it's possible to index JSON columns now - https://stackoverflow.com/a/61040738
Marco: suggesting splitting two different schemas for reading/writing again
Sikandar: we attempted using replication (1:1 schema too)
Marco: referring to CQRS, avoid it until really necessary
Marco: start with query objects

    <?php

    final class GetCountOfContactsInState
    {
        public function __invoke(ContactState $state): int
        {
            // ...
        }
    }

Queries can then be made swappable (domain has definition, infrastructure has implementation):

    <?php

    namespace Mautic\SomeComponent\Infrastructure;

    final class GetCountOfContactsInSegment implements \Mautic\SomeComponent\Domain\ContactsInSegment
    {
        public function __invoke(SegmentDefinition $segment): int
        {
            // ...
        }
    }
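To show what "swappable" buys, here is a minimal sketch in which a consumer depends only on the query contract. All names (CountContactsInState, InMemoryCountContactsInState, DashboardReport) are hypothetical, and states are plain strings to keep the example self-contained.

```php
<?php

// Domain side: only the contract.
interface CountContactsInState
{
    public function __invoke(string $state): int;
}

// Infrastructure side: an in-memory implementation, handy for tests.
// A SQL-backed implementation would satisfy the same interface.
final class InMemoryCountContactsInState implements CountContactsInState
{
    /** @var array<int, string> one state per contact */
    private array $contactStates;

    public function __construct(array $contactStates)
    {
        $this->contactStates = $contactStates;
    }

    public function __invoke(string $state): int
    {
        // array_keys() with a search value returns only the matching keys.
        return count(array_keys($this->contactStates, $state, true));
    }
}

// The consumer knows the interface only, not how the count is computed.
final class DashboardReport
{
    private CountContactsInState $countContacts;

    public function __construct(CountContactsInState $countContacts)
    {
        $this->countContacts = $countContacts;
    }

    public function summary(): string
    {
        return sprintf(
            '%d active, %d inactive',
            ($this->countContacts)('active'),
            ($this->countContacts)('inactive')
        );
    }
}
```

Swapping the data store then means writing one new class per query, without touching DashboardReport.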

Alan: so suggestion is to move from repositories to more granular queries
Marco: suggesting to use the ORM for storing/modifying information (OLTP), and move to query objects that perhaps
avoid the ORM overall for larger batch tasks

Next week

  • Perf profile - xdebug output

  • Managing obj relationships without enforcing FK constraints

  • ORM-generated queries vs native SQL queries: will it make any difference to performance?

  • add link to Zoom call directly to calendar entry