2021-08-31 ORM Hydration and Relationships

Perf profile: Xdebug output

there's an Xdebug profile dump: cachegrind.out.134
try running with --env=prod
try opcache.validate_timestamps=0: https://www.php.net/manual/en/opcache.configuration.php#ini.opcache.validate-timestamps
tune opcache: https://tideways.com/profiler/blog/fine-tune-your-opcache-configuration-to-avoid-caching-suprises
try composer install --classmap-authoritative
opcache preloading (PHP 7.4+) is potentially interesting
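What `--classmap-authoritative` buys: Composer's autoloader then resolves classes only from the generated classmap. It no longer falls back to probing PSR-4/PSR-0 directories on the filesystem when a class is not in the map. The sketch below imitates that rule with a hand-written autoloader. `ClassmapOnlyAutoloader` is a hypothetical name and not Composer's implementation. The `misses` counter exists only to make the behaviour observable.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical "authoritative" classmap autoloader (not Composer's code):
 * a lookup miss returns immediately instead of probing directories with file_exists().
 */
final class ClassmapOnlyAutoloader
{
    /** @var array<string, string> fully-qualified class name => absolute file path */
    private array $classMap;

    /** Number of lookups that were not in the map (only here to make the behaviour observable). */
    public int $misses = 0;

    /** @param array<string, string> $classMap */
    public function __construct(array $classMap)
    {
        $this->classMap = $classMap;
    }

    public function register(): void
    {
        spl_autoload_register([$this, 'load']);
    }

    public function unregister(): void
    {
        spl_autoload_unregister([$this, 'load']);
    }

    public function load(string $class): void
    {
        if (!isset($this->classMap[$class])) {
            // Authoritative mode: "not in the map" means "does not exist", so no filesystem I/O.
            $this->misses++;
            return;
        }
        require $this->classMap[$class];
    }
}
```

The trade-off is the same as with the real flag. Classes that are not in the dumped classmap, for example code generated after `composer install`, can no longer be autoloaded.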
Mautic\CoreBundle\Templating\TemplateReference->getPath is called a very large number of times; investigate why
Alan: described the fallback system for templates
Marco: perhaps the template system can be replaced with a compiler pass that generates a map of templates upfront
Sikandar: asked whether this affects scalability
Marco: removing these outliers will highlight other problem areas, such as PDO queries and I/O
Marco: maybe we should generate 1M+ leads
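Marco's idea of a template map generated upfront could look roughly like the sketch below. Every template is resolved once at build time, for example from a Symfony compiler pass or cache warmer, into a flat name => path array. A runtime lookup then becomes a single array access instead of repeated `getPath()` resolution. For illustration, the sketch assumes a simple "later directory overrides earlier one" fallback (theme over bundle default). `TemplateMapBuilder`, the directory layout, and that override order are hypothetical and do not describe Mautic's actual fallback rules.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical build-time template map: logical name => absolute path.
 * Directories are scanned lowest priority first, so later ones override earlier ones
 * (assumed here to mimic "theme overrides bundle default").
 */
final class TemplateMapBuilder
{
    /**
     * @param list<string> $dirsInFallbackOrder lowest priority first, highest priority last
     * @return array<string, string>
     */
    public static function build(array $dirsInFallbackOrder): array
    {
        $map = [];
        foreach ($dirsInFallbackOrder as $dir) {
            $files = new RecursiveIteratorIterator(
                new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
            );
            foreach ($files as $file) {
                /** @var SplFileInfo $file */
                $relative = ltrim(substr($file->getPathname(), strlen($dir)), DIRECTORY_SEPARATOR);
                $logicalName = str_replace(DIRECTORY_SEPARATOR, '/', $relative);
                $map[$logicalName] = $file->getPathname(); // later dirs overwrite earlier entries
            }
        }
        ksort($map); // deterministic output for the dumped file
        return $map;
    }

    /** Dumps the map as a plain PHP array file, which opcache can cache like any other PHP file. */
    public static function dump(array $map, string $targetFile): void
    {
        file_put_contents($targetFile, '<?php return ' . var_export($map, true) . ';' . PHP_EOL);
    }
}
```

At runtime the resolver would only `require` the dumped file once and index into the array. Whether this fits Mautic depends on how dynamic its theme selection is, which the notes do not settle.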
Marco: perhaps AbstractHydrator really is the performance bottleneck
Marco: explained the hydrator overhead via https://ocramius.github.io/blog/doctrine-orm-optimization-hydration/
Sikandar: we can just use a DTO with a named constructor (no reflection) and native SQL

final class LeadForSomeSpecificUseCaseDTO
{
    public $id;
    public $name;

    // named constructor, hence static: builds the DTO from one native SQL result row
    public static function fromRow(array $resultSetRow): self { ... }
}

class GimmeContactsForMyUseCase
{
    /** @return iterable<int, LeadForSomeSpecificUseCaseDTO> */
    public function __invoke(...): iterable { ... }
}

Sikandar: we can also run the SQL query and send the result straight to the frontend
Marco: type safety needs to be considered
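Below is a runnable variant of the DTO + native SQL idea. It assumes PDO and a hypothetical `leads(id, firstname)` table; the column names and the `$limit` parameter are illustrative only. Rows come from a native query and are mapped by a static named constructor. No ORM metadata, reflection, or identity map is involved. Typed properties (PHP 7.4+) keep the type-safety point in view, because callers receive DTOs rather than raw row arrays.

```php
<?php
declare(strict_types=1);

final class LeadForSomeSpecificUseCaseDTO
{
    public int $id;
    public string $name;

    private function __construct(int $id, string $name)
    {
        $this->id = $id;
        $this->name = $name;
    }

    /** Named constructor: maps one native SQL result row, no reflection involved. */
    public static function fromRow(array $resultSetRow): self
    {
        return new self((int) $resultSetRow['id'], (string) $resultSetRow['name']);
    }
}

final class GimmeContactsForMyUseCase
{
    private PDO $connection;

    public function __construct(PDO $connection)
    {
        $this->connection = $connection;
    }

    /** @return iterable<int, LeadForSomeSpecificUseCaseDTO> */
    public function __invoke(int $limit): iterable
    {
        // Hypothetical schema: select only the columns this use case needs.
        $statement = $this->connection->prepare(
            'SELECT id, firstname AS name FROM leads ORDER BY id LIMIT :limit'
        );
        $statement->bindValue(':limit', $limit, PDO::PARAM_INT);
        $statement->execute();

        // Generator: each row becomes a DTO only when the caller iterates to it.
        while (($row = $statement->fetch(PDO::FETCH_ASSOC)) !== false) {
            yield (int) $row['id'] => LeadForSomeSpecificUseCaseDTO::fromRow($row);
        }
    }
}
```

Callers iterate the result, e.g. `foreach ((new GimmeContactsForMyUseCase($pdo))(100) as $id => $contact)`. Each such use case gets its own small query class and DTO instead of one generic hydrated entity.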

Managing object relationships without enforcing FK constraints

Sikandar: when we analyze queries from Zabbix/PMM, we see a lot of queries waiting for locks (10% to 30%).
The system is very write-intensive, and referential integrity is enforced.
Reads and writes are probably contending with each other.
Sikandar: we're not using FKs except for cascades.
Alan: cascade operations are in place and used for deletes
Alan: deleting contacts triggers lots of cascade operations
Marco: is there a reproducer?
Alan: for deletes, yes; for FKs, not really
Sikandar: no direct relationship between FK cascades and locks
Marco: was replication attempted?
Sikandar: replication made it worse
Marco: replication shouldn't affect it, as it operates on binlogs?
Marco: is high consistency really needed for reads? For example, for segments?
Sikandar: it depends on the use case, for example seeing a newly created contact
Marco: is high consistency really a business requirement, though?
Alan: replication was abnormally slow: it took minutes to catch up, and at times fell hours behind
Alan: we shifted focus elsewhere
Alan: we don't want duplication/loops where information could be processed twice, such as broadcast emails (e.g. the eligible contacts after a broadcast went out)
Alan, Marco: discussed Doctrine's MasterSlaveConnection
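Doctrine DBAL's MasterSlaveConnection (later renamed PrimaryReadReplicaConnection) works as follows. Reads go to a replica and writes go to the primary. Once the primary has been used, it keeps being used for the rest of the connection's lifetime, so a request can read its own writes. The plain-PDO sketch below only illustrates that routing rule, plus an explicit "does this read need fresh data?" flag matching the consistency question above. `ReadWriteRouter` and its methods are hypothetical and are not the DBAL API.

```php
<?php
declare(strict_types=1);

/** Hypothetical read/write splitter: replica for reads, primary for writes, sticky after the first write. */
final class ReadWriteRouter
{
    private PDO $primary;
    private PDO $replica;
    private bool $primaryUsed = false;

    public function __construct(PDO $primary, PDO $replica)
    {
        $this->primary = $primary;
        $this->replica = $replica;
    }

    /**
     * @param bool $needsLatestData true for reads that must see just-written rows
     *                              (e.g. showing a contact right after it was created)
     */
    public function forRead(bool $needsLatestData = false): PDO
    {
        // After this request has written once, keep reading from the primary to see its own writes.
        return ($needsLatestData || $this->primaryUsed) ? $this->primary : $this->replica;
    }

    public function forWrite(): PDO
    {
        $this->primaryUsed = true;
        return $this->primary;
    }
}
```

The interesting design decision is which call sites may pass `false` and tolerate replica lag. Per the discussion, segment-style reads might, while "see the contact I just created" would not.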
Sikandar: the largest DB is probably 350 GB of leads
Alan: that's mostly metadata; related data around segmentation leads to the large size
Alan: information about what was sent, and similar data, lives in related, denormalized tables
Marco: 350 GB is "tiny", so why is there so much write noise that we get locks?
Alan: creating a contact -> contact is added to segments -> row written to the table that maps contacts to segments -> log entry (immutable)
Alan: the segment mapping table is kept up to date
Alan: the log is an append-only data structure
Alan: campaigns also produce a record for every campaign step, recording the decision on which path a lead has taken
Alan: we generate records when emails go out, when links are clicked, when users are redirected, etc.
Alan: landing pages also lead to writes
Marco: are the writes caused by the frontend (HTTP) or by background processes?
Alan: mostly background processes; they're real-time
Marco: is there a message queue for this?
Alan: we do have one, but community installations use cron jobs
Marco: perhaps use the Symfony Messenger component to abstract this, and send to RabbitMQ (RMQ)?
Sikandar: that just shifts the problem, though
Marco: perhaps it's a good idea to avoid flushing every queue operation to disk?
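One possible reading of "avoid flushing every operation", applied on the database side, is sketched here. The append-only log records produced by background workers are buffered and written in one transaction per batch. The alternative is one INSERT and one commit per event. On InnoDB with the default `innodb_flush_log_at_trx_commit=1`, each commit flushes the redo log, and batching amortizes that flush. This is a minimal sketch; `BufferedEventLogWriter`, the `lead_event_log(lead_id, event)` table, and the default batch size of 500 are hypothetical.

```php
<?php
declare(strict_types=1);

/** Hypothetical buffered writer for an append-only event log: one transaction per batch. */
final class BufferedEventLogWriter
{
    private PDO $connection;
    private int $batchSize;
    /** @var list<array{int, string}> pending (lead id, event name) pairs */
    private array $buffer = [];

    public function __construct(PDO $connection, int $batchSize = 500)
    {
        $this->connection = $connection;
        $this->batchSize = $batchSize;
    }

    public function append(int $leadId, string $event): void
    {
        $this->buffer[] = [$leadId, $event];
        if (count($this->buffer) >= $this->batchSize) {
            $this->flush();
        }
    }

    /** Writes all buffered rows in a single transaction; returns how many rows were written. */
    public function flush(): int
    {
        if ($this->buffer === []) {
            return 0;
        }
        $statement = $this->connection->prepare(
            'INSERT INTO lead_event_log (lead_id, event) VALUES (?, ?)'
        );
        $this->connection->beginTransaction();
        try {
            foreach ($this->buffer as [$leadId, $event]) {
                $statement->execute([$leadId, $event]);
            }
            $this->connection->commit();
        } catch (Throwable $e) {
            // Keep the buffer so the batch can be retried; nothing was committed.
            $this->connection->rollBack();
            throw $e;
        }
        $written = count($this->buffer);
        $this->buffer = [];
        return $written;
    }
}
```

The cost is durability: events still in the buffer are lost if the worker dies before `flush()`. That may be acceptable for warehousing-style logs, but not for rows that drive campaign decisions.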
Marco/Sikandar: discussed a NoSQL use case
Marco: NoSQL is probably relevant if we have frequent ALTER TABLE, but that's probably not the issue here
Marco: are the logs coupled to the active lead data through FKs and transactions?
Alan: no direct business logic; the primary use case is warehousing
Alan: the data is only used for visualization/API
Marco: what if we remove strong integrity on logs? Is there PII in there? Could it go to columnar storage or a time-series DB?
Alan: some info in logs could contain PII
Marco: perhaps we could send the logs out over UDP?
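Below is a minimal sketch of shipping log records out over UDP. Each record is serialized to JSON and sent as a single fire-and-forget datagram, so the request never waits on, or locks, a database table. A collector on the other end could load the records into a columnar or time-series store. Because some log fields may contain PII, the sketch drops keys from a deny-list before sending. `UdpLogEmitter`, the field names, and the collector address are assumptions. UDP gives no delivery guarantee, so this only fits data where occasional loss is acceptable.

```php
<?php
declare(strict_types=1);

/** Hypothetical fire-and-forget log emitter over UDP (one JSON document per datagram). */
final class UdpLogEmitter
{
    /** @var resource */
    private $socket;
    /** @var list<string> */
    private array $piiFields;

    /**
     * @param string       $collectorAddress "host:port" of the hypothetical collector
     * @param list<string> $piiFields        keys removed before a record leaves the process (assumed names)
     */
    public function __construct(string $collectorAddress, array $piiFields = ['email', 'ip_address'])
    {
        // UDP has no handshake, so opening the socket does not wait on the collector.
        $socket = stream_socket_client('udp://' . $collectorAddress, $errno, $errstr);
        if ($socket === false) {
            throw new RuntimeException("Cannot open UDP socket: $errstr ($errno)");
        }
        $this->socket = $socket;
        $this->piiFields = $piiFields;
    }

    /** @param array<string, mixed> $record */
    public function emit(array $record): void
    {
        $withoutPii = array_diff_key($record, array_flip($this->piiFields));
        $payload = json_encode($withoutPii, JSON_THROW_ON_ERROR);
        // Send errors are deliberately ignored: a lost datagram must never fail the caller.
        @fwrite($this->socket, $payload);
    }
}
```

A deny-list is the simplest option to show. An allow-list of known-safe fields would be the more conservative choice if the log schema is not tightly controlled.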

Next meeting:

\Mautic\CoreBundle\Entity\CommonRepository::saveEntities performance

ORM-generated queries vs. native SQL queries: will it make any performance difference?

add the Zoom call link directly to the calendar entry